Plugin System
Opsmate has a handful of built-in tools you can use out of the box; however, there are always customisations and extensions you will want to build for your own use cases.
Opsmate comes with a plugin system that allows you to build your own tools and use them in your automation. In fact it is used by the Opsmate CLI tool and the API/Web service.
In this cookbook we will show you how to author your own plugins and use them in your automation, by writing a Prometheus query plugin.
Prerequisites¶
- You have an OpenAI API key; an Anthropic API key is also supported, as Opsmate is LLM-provider agnostic.
- You have Opsmate installed - see getting started for more details.
- You have kind installed, as this cookbook uses kind to run a local Kubernetes cluster.
- You have kubectl installed for interacting with the local Kubernetes cluster.
Setup¶
First, let's set our API keys.
import getpass
import os


def _set_if_undefined(var: str) -> None:
    if os.environ.get(var):
        return
    os.environ[var] = getpass.getpass(var)
_set_if_undefined("OPENAI_API_KEY") # Feel to comment this out and use Anthropic API key instead
_set_if_undefined("ANTHROPIC_API_KEY")
Then we will spin up a local k8s cluster and install the Prometheus Operator onto it.
! kind create cluster --name opsmate-plugin-test
Creating cluster "opsmate-plugin-test" ...
 ✓ Ensuring node image (kindest/node:v1.31.2) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-opsmate-plugin-test"
You can now use your cluster with:

kubectl cluster-info --context kind-opsmate-plugin-test

Not sure what to do next? 😅 Check out https://kind.sigs.k8s.io/docs/user/quick-start/
# make sure that you are running on the correct cluster
! kubectl config current-context
kind-opsmate-plugin-test
Now we will install the kube-prometheus stack on the cluster.
%%capture
%%bash
(
    rm -rf /tmp/kube-prometheus
    git clone https://github.com/prometheus-operator/kube-prometheus --depth 1 /tmp/kube-prometheus
    cd /tmp/kube-prometheus
    kubectl apply --server-side -f manifests/setup
    kubectl wait \
        --for condition=Established \
        --all CustomResourceDefinition \
        --namespace=monitoring
    kubectl apply -f manifests/
)
The Prometheus server listens on port 9090 of the cluster-local IP address, so it is likely unreachable from the host. To make it available you can run:
kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090
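To sanity-check the port-forward you can hit the Prometheus HTTP API from the host. Here is a minimal sketch using httpx (the same HTTP client the plugin below uses); the /api/v1/query endpoint and the up query are standard Prometheus HTTP API fare, but this snippet is not part of the plugin itself:

import httpx

# Assumes the port-forward above is running in a separate terminal.
resp = httpx.get("http://localhost:9090/api/v1/query", params={"query": "up"})
print(resp.json()["status"])  # expect "success"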
Our First Plugin¶
The ./plugins/prom.py file contains the plugin we wrote for this cookbook. The job it tries to accomplish is reasonably complex:
- It extracts the datetime range from the user's query
- It queries the Prometheus API to get the metrics within the time range
- It converts the returned metrics into a string and represents them as a markdown table
! cat ./plugins/prom.py
from opsmate.dino.types import ToolCall, PresentationMixin
from pydantic import Field, PrivateAttr
from typing import Optional, Any
from httpx import AsyncClient
from opsmate.dino import dino
from opsmate.dino.types import Message
from opsmate.tools.datetime import DatetimeRange, datetime_extraction
from opsmate.plugins import auto_discover
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

DEFAULT_ENDPOINT = "http://localhost:9090"
DEFAULT_PATH = "/api/v1/query_range"


class PromQuery(ToolCall[dict[str, Any]], DatetimeRange, PresentationMixin):
    """
    A tool to query metrics from Prometheus
    """

    query: str = Field(description="The prometheus query")
    step: str = Field(
        description="Query resolution step width in duration format or float number of seconds",
        default="15s",
    )
    y_label: str = Field(
        description="The y-axis label of the time series based on the query",
        default="Value",
    )
    x_label: str = Field(
        description="The x-axis label of the time series based on the query",
        default="Timestamp",
    )
    title: str = Field(
        description="The title of the time series based on the query",
        default="Time Series Data",
    )

    _client: AsyncClient = PrivateAttr(default_factory=AsyncClient)

    @property
    def headers(self):
        return {
            "Content-Type": "application/x-www-form-urlencoded",
            "User-Agent": "opsmate prometheus tool",
        }

    async def __call__(self, context: dict[str, Any] = {}):
        endpoint = context.get("endpoint", DEFAULT_ENDPOINT)
        path = context.get("path", DEFAULT_PATH)
        response = await self._client.post(
            endpoint + path,
            data={
                "query": self.query,
                "start": self.start,
                "end": self.end,
                "step": self.step,
            },
            headers=self.headers,
        )
        return response.json()

    class Config:
        underscore_attrs_are_private = True

    def markdown(self): ...

    def time_series(self):
        values = self.output["data"]["result"][0]["values"]

        timestamps = [datetime.fromtimestamp(ts) for ts, _ in values]
        measurements = [float(val) for _, val in values]

        df = pd.DataFrame({"timestamp": timestamps, "measurement": measurements})

        plt.figure(figsize=(12, 6))
        plt.plot(df["timestamp"], df["measurement"], marker="o")
        plt.grid(True)
        plt.title(f"{self.title} - {self.query}")
        plt.xlabel(self.x_label)
        plt.ylabel(self.y_label)
        plt.xticks(rotation=45)
        plt.tight_layout()
        plt.show()


@auto_discover(
    name="prometheus_query",
    description="A tool to query metrics from Prometheus",
    version="0.0.1",
    author="Micky",
)
@dino(
    model="gpt-4o",
    response_model=PromQuery,
    tools=[datetime_extraction],
)
async def prometheus_query(query: str, extra_context: str = ""):
    """
    You are a world class SRE who excels at querying metrics from Prometheus
    You are given a query in natural language and you need to convert it into a valid Prometheus query
    """
    return [
        Message.user(content=extra_context),
        Message.user(content=query),
    ]
Discovering plugins¶
If you have a keen eye, you have probably already noticed the @auto_discover decorator in the plugin code. This is the key to making the plugin discoverable.
By default all tools are discovered automatically, but an LLM function must be explicitly marked as discoverable via the @auto_discover decorator, as illustrated by the sketch below.
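For illustration, here is a minimal sketch of a discoverable LLM function. The echo function is purely hypothetical and response_model=str is an assumption; it is not part of this cookbook's plugin:

from opsmate.dino import dino
from opsmate.plugins import auto_discover


@auto_discover(
    name="echo",
    description="A toy LLM function that repeats the user's input",
    version="0.0.1",
    author="you",
)
@dino(model="gpt-4o", response_model=str)  # response_model=str is an assumption
async def echo(text: str):
    """Repeat the user's input verbatim."""
    return text

Note that @auto_discover sits outermost, exactly as in prom.py above.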
To discover plugins from wherever your current Python path points, you can execute the discovery via the following snippet:
from opsmate.plugins import PluginRegistry as plugins
plugins.clear()
plugins.discover("./plugins", ignore_conflicts=True)
2025-02-21 17:13:07 [debug ] loading builtin tools
2025-02-21 17:13:07 [debug ] loading builtin tools from builtin_module=opsmate.tools
2025-02-21 17:13:07 [debug ] loading dtool dtool=ACITool
2025-02-21 17:13:07 [debug ] loading dtool dtool=FileAppend
2025-02-21 17:13:07 [debug ] loading dtool dtool=FileDelete
2025-02-21 17:13:07 [debug ] loading dtool dtool=FileRead
2025-02-21 17:13:07 [debug ] loading dtool dtool=FileWrite
2025-02-21 17:13:07 [debug ] loading dtool dtool=FilesFind
2025-02-21 17:13:07 [debug ] loading dtool dtool=FilesList
2025-02-21 17:13:07 [debug ] loading dtool dtool=GithubCloneAndCD
2025-02-21 17:13:07 [debug ] loading dtool dtool=GithubRaisePR
2025-02-21 17:13:07 [debug ] loading dtool dtool=HttpCall
2025-02-21 17:13:07 [debug ] loading dtool dtool=HttpGet
2025-02-21 17:13:07 [debug ] loading dtool dtool=HttpToText
2025-02-21 17:13:07 [debug ] loading dtool dtool=KnowledgeRetrieval
2025-02-21 17:13:07 [debug ] loading dtool dtool=ShellCommand
2025-02-21 17:13:07 [debug ] loading dtool dtool=SysEnv
2025-02-21 17:13:07 [debug ] loading dtool dtool=SysStats
2025-02-21 17:13:07 [debug ] loading dtool dtool=current_time
2025-02-21 17:13:07 [debug ] loading dtool dtool=datetime_extraction
2025-02-21 17:13:07 [info ] adding the plugin directory to the sys path plugin_dir=/home/jingkaihe/workspace/opsmate/docs/cookbooks/plugins
2025-02-21 17:13:07 [info ] loading plugin file plugin_path=./plugins/prom.py
2025-02-21 17:13:07 [info ] Discovered plugin prometheus_query
2025-02-21 17:13:07 [debug ] loading dtool dtool=PromQuery
2025-02-21 17:13:07 [debug ] loading dtool dtool=ToolCall
2025-02-21 17:13:07 [debug ] loading dtool dtool=datetime_extraction
2025-02-21 17:13:07 [warning ] tool already exists conflict_source=/home/jingkaihe/workspace/opsmate/opsmate/tools/__init__.py current_source=/home/jingkaihe/workspace/opsmate/docs/cookbooks/plugins/prom.py tool=datetime_extraction
2025-02-21 17:13:07 [info ] loaded plugin file plugin_path=./plugins/prom.py
With the prometheus_query plugin discovered, we can now use it to query the Prometheus API. Note the "tool already exists" warning above: the plugin file imports the built-in datetime_extraction tool, which is why we pass ignore_conflicts=True to discover.
prom_query_builder = plugins.get_plugin("prometheus_query")
# make sure the plugin is discovered
assert prom_query_builder is not None
# execute the plugin to build a query
query = await prom_query_builder.execute("number of pods within the cluster over the last 10 minutes")
print(query)
2025-02-21 17:13:24 [info ] Tool called tool={"output":"2025-02-21T17:13:24Z"}
2025-02-21 17:13:25 [info ] Tool called tool={"text":"last 10 minutes","output":{"start":"2025-02-21T17:03:24Z","end":"2025-02-21T17:13:24Z"}}
start='2025-02-21T17:03:24Z' end='2025-02-21T17:13:24Z' query='count(kube_pod_info)' step='15s' y_label='Number of Pods' x_label='Timestamp' title='Number of Pods in the Cluster over the Last 10 Minutes' output=None
As you have already seen, the query builder is capable of figuring out the time range from the query, something that is very much the Achilles' heel of LLMs. In this specific example the query builder works out the time range from "over the last 10 minutes" via the datetime_extraction tool, which is part of the built-in tools.
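If you are curious, you can also exercise the built-in tool on its own. Here is a sketch that assumes datetime_extraction is directly awaitable and returns an object with the start/end fields seen in the tool-call logs above:

from opsmate.tools.datetime import datetime_extraction

# Hypothetical direct invocation; the start/end fields match the logs above.
dtr = await datetime_extraction("last 10 minutes")
print(dtr.start, dtr.end)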
With the query built by the query builder, we can now execute it to get the result. The custom time_series() method defined in the plugin class plots the time series data; the shape of the data it consumes is sketched below.
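time_series() assumes the standard Prometheus query_range response shape, roughly the following (the concrete numbers are made up for illustration):

# The query_range response consumed by time_series(), per the Prometheus
# HTTP API: "values" holds [unix_timestamp, "value_as_string"] pairs.
example_output = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"__name__": "kube_pod_info"},
                "values": [[1740157404.0, "42"], [1740157419.0, "42"]],
            }
        ],
    },
}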
await query.run()
query.time_series()
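Because PromQuery.__call__ reads the Prometheus endpoint and path from a context dict, you can also point the tool at a non-default server. A minimal sketch that awaits the tool call directly; note this returns the raw JSON rather than populating query.output, which (as far as we can tell) is what run() does:

# __call__ accepts a context dict (see prom.py above); the endpoint value
# here is illustrative.
raw = await query(context={"endpoint": "http://127.0.0.1:9090"})
print(raw["status"])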
Here we will create an async function that wraps up the above procedure.
async def text_to_time_series(query: str):
    query = await prom_query_builder.execute(query)
    await query.run()
    query.time_series()
await text_to_time_series("CPU utilisation of the nodes between 10 mins ago to 5 mins ago")
2025-02-21 17:15:03 [info ] Tool called tool={"output":"2025-02-21T17:15:03Z"}
2025-02-21 17:15:03 [info ] Tool called tool={"output":"2025-02-21T17:15:03Z"}
2025-02-21 17:15:03 [info ] Tool called tool={"output":"2025-02-21T17:15:03Z"}
2025-02-21 17:15:03 [info ] Tool called tool={"output":"2025-02-21T17:15:03Z"}
2025-02-21 17:15:03 [info ] Tool called tool={"output":"2025-02-21T17:15:03Z"}
2025-02-21 17:15:05 [info ] Tool called tool={"text":"CPU utilisation of the nodes between 10 mins ago to 5 mins ago","output":{"start":"2025-02-21T17:05:03Z","end":"2025-02-21T17:15:03Z"}}
Cleanup¶
You can delete the cluster via running the following command:
! kind delete cluster --name opsmate-plugin-test
Deleting cluster "opsmate-plugin-test" ...
Deleted nodes: ["opsmate-plugin-test-control-plane"]
Known Limitations of the Existing Plugin System¶
There are a few known limitations of the existing plugin system, namely:
1. The plugin system does not support (or at least has no documented support for) installing extra Python dependencies. There are a few possible approaches to this, e.g. the PythonVirtualenvOperator from the Airflow project, or embedding the entire package system into an enterprise-ish bundle the way Chef does.
2. Lack of support for other language runtimes (e.g. Golang, NodeJS).

All things considered, a coherent solution is needed to make the above possible.
Conclusion¶
In this cookbook we have shown you how to author your own Opsmate plugin and use it in your automation, by writing a text-to-time-series-graph plugin.
Obviously the example plugin is very naive, with a few shortcomings, notably:
- It does not retrieve the available metrics from the existing metrics system.
- The prompt is not optimised for PromQL accuracy and precision.
- It only works with a single time series.
That being said, the whole purpose is to show you the ropes ;)