Setting Up AIOps Tools for Server Monitoring
In this tutorial, we will set up an AIOps stack for server monitoring. An AIOps stack is a combination of tools that use AI and automation to monitor, analyze, and manage servers, which makes it well suited for detecting issues, predicting failures, and automating responses. In this guide, we will use the following tools for the AIOps stack:
- Metrics: Prometheus, Node Exporter, and Blackbox Exporter.
- Logs: Loki and Promtail.
- Visualization and alerting UI: Grafana.
- Alert routing: Alertmanager.
- AIOps: A small Python service that fetches Prometheus metrics, detects anomalies, and triggers Alertmanager via webhook.
In this guide, we use an Ubuntu 24.04 server from PerLod Hosting, where you can find support for server monitoring with AIOps.
Requirements for the AIOps Server Monitoring Stack
To complete the guide steps, you need a fresh Ubuntu 24.04 VM with 4 vCPU, 8 GB RAM, and 100 GB disk. Also, you must open the required firewall ports, including:
- Prometheus: 9090
- Alertmanager: 9093
- Loki: 3100
- Grafana: 3000
- SSH: 22
- HTTP/HTTPS: 80/443, if you use a reverse proxy.
Remember to set the correct timezone on your server:
sudo timedatectl set-timezone Asia/Dubai
Once you are done, proceed to the next step to install Docker and Docker Compose, which are used for setting up an AIOps stack.
Install Docker and Docker Compose For Setting up an AIOps Stack
Run the system update and install the required packages with the commands below:
sudo apt update
sudo apt install ca-certificates curl gnupg -y
Add Docker GPG key and repository to your server with the following commands:
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu noble stable" \
| sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
Again, run the system update and install Docker and Docker Compose with:
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y
You can add your user to the Docker group with the command below:
sudo usermod -aG docker $USER
Log out and log in again to apply the changes.
Create AIOps Stack Directory
You must create the AIOps stack directory and a subdirectory for each tool inside it. To do this, run the command below:
sudo mkdir -p /opt/aiops/{prometheus,alertmanager,grafana-provisioning/{datasources,dashboards},loki,promtail,blackbox,anomaly-detector}
Set the correct ownership for the AIOps stack directory with:
sudo chown -R $USER:$USER /opt/aiops
Switch to the AIOps stack directory:
cd /opt/aiops
Prometheus Configuration for AIOps Stack
Prometheus is the main monitoring engine. The first step is to create the Prometheus configuration file for server monitoring with the following command:
sudo nano /opt/aiops/prometheus/prometheus.yml
Add the following configuration to the file:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

rule_files:
  - /etc/prometheus/rules/*.yml

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["prometheus:9090"]

  - job_name: "node-exporters"
    static_configs:
      # Add your monitored servers' IPs (with :9100) here
      - targets: ["10.0.0.11:9100","10.0.0.12:9100"]

  - job_name: "blackbox-http"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com
          - https://your-api.example
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox:9115
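The relabeling in the blackbox-http job rewrites each target so Prometheus actually scrapes the Blackbox Exporter while keeping the probed URL as the instance label. A rough Python sketch of what those three rules do to one target's labels (illustrative only, not Prometheus's real relabeling engine):

```python
def relabel_blackbox(labels):
    """Mimic the three relabel_configs rules of the blackbox-http job."""
    labels = dict(labels)
    # 1) copy the scrape address into the ?target= query parameter
    labels["__param_target"] = labels["__address__"]
    # 2) expose the probed URL as the 'instance' label
    labels["instance"] = labels["__param_target"]
    # 3) point the actual scrape at the Blackbox Exporter container
    labels["__address__"] = "blackbox:9115"
    return labels

result = relabel_blackbox({"__address__": "https://example.com"})
print(result["instance"])     # https://example.com
print(result["__address__"])  # blackbox:9115
```

So Prometheus ends up calling blackbox:9115/probe?target=https://example.com, while alerts and graphs still show the probed URL as the instance.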
Also, you can create a simple alert rule in the Prometheus rule file:
sudo nano /opt/aiops/prometheus/rules/infra.yml
Add the following alert rule to the file:
groups:
  - name: infra
    rules:
      - alert: NodeDown
        expr: up{job="node-exporters"} == 0
        for: 2m
        labels: {severity: critical}
        annotations:
          summary: "Node down: {{ $labels.instance }}"
          description: "No scrape data for 2m."

      - alert: HighCPU
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 5m
        labels: {severity: warning}
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU usage >85% for 5m."

      - alert: DiskFilling
        expr: |
          (node_filesystem_size_bytes{fstype!~"tmpfs|overlay"} - node_filesystem_free_bytes{fstype!~"tmpfs|overlay"})
          / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"} > 0.9
        for: 10m
        labels: {severity: warning}
        annotations:
          summary: "Disk >90% on {{ $labels.instance }}"
          description: "Filesystem filling up."
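The HighCPU expression simply inverts the idle-CPU rate: if cores spend, on average, a fraction idle of each second idle, total usage is 100 minus idle times 100. A quick sanity check of that arithmetic:

```python
def cpu_usage_percent(avg_idle_rate):
    """avg_idle_rate: average per-second idle fraction, i.e. the value of
    avg by (instance)(rate(node_cpu_seconds_total{mode="idle"}[5m]))."""
    return 100 - avg_idle_rate * 100

# 10% idle means 90% busy, which would fire the >85% HighCPU rule after 5m
assert cpu_usage_percent(0.10) == 90.0
# 90% idle means 10% busy, well below the threshold
assert cpu_usage_percent(0.90) == 10.0
print(cpu_usage_percent(0.10))  # 90.0
```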
Tip: To learn more about Prometheus Linux server monitoring, you can check this guide on Monitoring a Linux host using Prometheus.
Alertmanager Configuration for AIOps Stack
The next step is to set up the Alertmanager for server monitoring, which handles notifications from Prometheus. Create the Alertmanager YAML file with:
sudo nano /opt/aiops/alertmanager/alertmanager.yml
Add the following configuration to the file. Replace the SMTP settings with real credentials, or remove the email receiver if you do not need it yet:
route:
  receiver: default
  group_by: ["alertname", "instance"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h

receivers:
  - name: default
    email_configs:
      - to: "[email protected]"
        from: "[email protected]"
        smarthost: "smtp.example.com:587"
        auth_username: "[email protected]"
        auth_identity: "[email protected]"
        auth_password: "CHANGE_ME"
    webhook_configs:
      - url: "http://anomaly-detector:8080/alertmanager" # our AIOps webhook
        send_resolved: true
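With group_by set to alertname and instance, Alertmanager batches alerts that share both labels into a single notification instead of sending one message per alert. A minimal sketch of that grouping idea (illustrative only, not Alertmanager's implementation):

```python
from collections import defaultdict

def group_alerts(alerts, group_by=("alertname", "instance")):
    """Bucket alerts by the values of the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(label, "") for label in group_by)
        groups[key].append(alert)
    return groups

alerts = [
    {"labels": {"alertname": "HighCPU", "instance": "10.0.0.11:9100"}},
    {"labels": {"alertname": "HighCPU", "instance": "10.0.0.11:9100",
                "severity": "warning"}},
    {"labels": {"alertname": "NodeDown", "instance": "10.0.0.12:9100"}},
]
groups = group_alerts(alerts)
print(len(groups))  # 2 -> two notifications instead of three
```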
Loki and Promtail Log Configuration for AIOps Stack
Loki is a log aggregation system, and Promtail is its log collector. Promtail gathers logs from your server and sends them to Loki, which makes it easy to search logs in Grafana alongside metrics.
Create the Loki YAML file with the following command:
sudo nano /opt/aiops/loki/config.yml
Add the following configuration to it:
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: boltdb-shipper
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/boltdb-cache
  filesystem:
    directory: /loki/chunks

limits_config:
  ingestion_burst_size_mb: 64
  ingestion_rate_mb: 32
  max_cache_freshness_per_query: 10m

chunk_store_config:
  max_look_back_period: 720h

table_manager:
  retention_deletes_enabled: true
  retention_period: 720h
Also, create the Promtail file with the command below:
sudo nano /opt/aiops/promtail/config.yml
Add the following config to the file:
server:
  http_listen_port: 9080

positions:
  filename: /positions/positions.yml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system-logs
    static_configs:
      - targets: [localhost]
        labels:
          job: varlogs
          host: ${HOSTNAME}
          __path__: /var/log/*.log
Configure Blackbox Exporter for AIOps Stack
Blackbox Exporter checks external endpoints like APIs or websites using HTTP probes, which helps you monitor the availability and response of your web services.
At this point, you can create the Blackbox exporter YAML file with the following command:
sudo nano /opt/aiops/blackbox/blackbox.yml
Add the following configuration to the file:
modules:
  http_2xx:
    prober: http
    http:
      preferred_ip_protocol: ip4
      follow_redirects: true
      fail_if_not_ssl: false
      fail_if_body_not_matches_regexp:
        - ".+"
Grafana Configuration for AIOps Stack
You can connect Grafana to Prometheus and Loki as data sources, and load dashboards to visualize server performance, logs, and alerts in one place.
To create the Grafana data sources YAML file, run the command below:
sudo nano /opt/aiops/grafana-provisioning/datasources/datasource.yml
Add the following configuration to the file:
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    access: proxy
    isDefault: true

  - name: Loki
    type: loki
    url: http://loki:3100
    access: proxy
    jsonData:
      maxLines: 1000
Optional Note: You can drop JSON dashboards in grafana-provisioning/dashboards/ and add a dashboards.yml provisioning file.
Create the Grafana dashboards YAML file with:
sudo nano /opt/aiops/grafana-provisioning/dashboards/dashboards.yml
Add the following config to the file:
apiVersion: 1

providers:
  - name: 'Default'
    orgId: 1
    folder: 'AIOps'
    type: file
    options:
      path: /etc/grafana/dashboards
AIOps Microservice: Anomaly Detector Configuration
This custom Python microservice analyzes Prometheus metrics in real time using STL decomposition and z-score detection to identify anomalies. When it detects unusual behavior, it sends a synthetic alert to Alertmanager automatically.
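The core idea is simple: after STL removes trend and seasonality, the residuals should hover around zero, so a point whose residual z-score exceeds a threshold (3.5 in the service below) is flagged as anomalous. A dependency-free sketch of just the z-score step, assuming the residuals have already been detrended (the real service uses STL residuals):

```python
def zscore_outlier(residuals, threshold=3.5):
    """Flag the last residual if it sits more than `threshold`
    standard deviations from the mean of all residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    std = (sum((r - mean) ** 2 for r in residuals) / n) ** 0.5 + 1e-9
    z = (residuals[-1] - mean) / std
    return abs(z) > threshold, z

noise = [0.1, -0.1] * 15                      # well-behaved residuals
flagged, _ = zscore_outlier(noise + [5.0])    # sudden spike at the end
ok, _ = zscore_outlier(noise + [0.05])        # normal final point
print(flagged, ok)  # True False
```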
Create the requirements file with the following command:
sudo nano /opt/aiops/anomaly-detector/requirements.txt
Add the following requirements to the file:
flask==3.0.3
requests==2.32.3
pandas==2.2.3
numpy==2.1.1
statsmodels==0.14.3
Then, create the Anomaly detector script file with:
sudo nano /opt/aiops/anomaly-detector/app.py
Add the following script to the file:
from flask import Flask, request, jsonify
import os, time, requests, json
import pandas as pd
import numpy as np
from statsmodels.tsa.seasonal import STL

PROM = os.getenv("PROM_URL", "http://prometheus:9090")
ALERTMAN = os.getenv("ALERTMAN_URL", "http://alertmanager:9093/api/v2/alerts")
QUERY = os.getenv("PROM_QUERY", '100 - (avg by (instance)(rate(node_cpu_seconds_total{mode="idle"}[5m]))*100)')

app = Flask(__name__)

def fetch_series(minutes=120, step="30s"):
    end = int(time.time())
    start = end - minutes * 60
    url = f"{PROM}/api/v1/query_range"
    r = requests.get(url, params={"query": QUERY, "start": start, "end": end, "step": step}, timeout=30)
    r.raise_for_status()
    return r.json()

def stl_anomaly(values):
    # values: list of [timestamp, value]
    if len(values) < 60:
        return None
    ts = pd.Series([float(v[1]) for v in values])
    # STL requires a period; 60 points ~ 30 min if step=30s; tune as needed
    stl = STL(ts, period=60, robust=True).fit()
    resid = stl.resid
    z = (resid - resid.mean()) / (resid.std() + 1e-9)
    # flag the last point if it is an outlier
    if abs(z.iloc[-1]) > 3.5:
        return float(ts.iloc[-1]), float(z.iloc[-1])
    return None

def push_alert(instance, value, zscore):
    payload = [{
        "labels": {
            "alertname": "AIOpsDetectedAnomaly",
            "severity": "warning",
            "instance": instance
        },
        "annotations": {
            "summary": f"AIOps anomaly on {instance}",
            "description": f"Value={value:.2f}, z-score={zscore:.2f} on query: {QUERY}"
        }
    }]
    rr = requests.post(ALERTMAN, data=json.dumps(payload), headers={"Content-Type": "application/json"}, timeout=10)
    rr.raise_for_status()

@app.route("/run", methods=["POST", "GET"])
def run():
    data = fetch_series()
    if data.get("status") != "success":
        return jsonify({"status": "error", "msg": data}), 500
    result = data["data"]["result"]
    anomalies = []
    for series in result:
        metric = series.get("metric", {})
        inst = metric.get("instance", "unknown")
        values = series.get("values", [])
        res = stl_anomaly(values)
        if res:
            val, z = res
            push_alert(inst, val, z)
            anomalies.append({"instance": inst, "value": val, "z": z})
    return jsonify({"status": "ok", "anomalies": anomalies})

# webhook endpoint (optional: receive resolved alerts or enrich)
@app.route("/alertmanager", methods=["POST"])
def inbound():
    _ = request.json
    return jsonify({"status": "received"}), 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
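Before wiring the container into the stack, you can sanity-check the alert payload shape offline. The snippet below mirrors the payload construction in push_alert above and verifies that it serializes to the list-of-alerts JSON that Alertmanager's v2 API expects:

```python
import json

def build_payload(instance, value, zscore, query):
    # Mirrors push_alert() in app.py: Alertmanager v2 expects a JSON array of alerts
    return [{
        "labels": {
            "alertname": "AIOpsDetectedAnomaly",
            "severity": "warning",
            "instance": instance,
        },
        "annotations": {
            "summary": f"AIOps anomaly on {instance}",
            "description": f"Value={value:.2f}, z-score={zscore:.2f} on query: {query}",
        },
    }]

payload = build_payload("10.0.0.11:9100", 93.4, 4.2, "cpu_query")
body = json.dumps(payload)
print(json.loads(body)[0]["labels"]["alertname"])  # AIOpsDetectedAnomaly
```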
Now create the Dockerfile for the anomaly detector:
sudo nano /opt/aiops/anomaly-detector/Dockerfile
Add the following config to the file:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
ENV PROM_URL=http://prometheus:9090
ENV ALERTMAN_URL=http://alertmanager:9093/api/v2/alerts
ENV PROM_QUERY='100 - (avg by (instance)(rate(node_cpu_seconds_total{mode="idle"}[5m]))*100)'
EXPOSE 8080
CMD ["python","/app/app.py"]
Set up AIOps Stack Docker Compose File
At this point, you can easily create the whole stack in a Docker Compose file. Create the file with the command below:
sudo nano /opt/aiops/docker-compose.yml
Add the following config to the file:
services:
  prometheus:
    image: prom/prometheus:v2.55.1
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--web.enable-lifecycle"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/rules:/etc/prometheus/rules:ro
      - prom_data:/prometheus
    ports: ["9090:9090"]
    restart: unless-stopped

  alertmanager:
    image: prom/alertmanager:v0.27.0
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
    ports: ["9093:9093"]
    restart: unless-stopped

  loki:
    image: grafana/loki:2.9.8
    command: ["-config.file=/etc/loki/config.yml"]
    volumes:
      - ./loki/config.yml:/etc/loki/config.yml:ro
      - loki_data:/loki
    ports: ["3100:3100"]
    restart: unless-stopped

  promtail:
    image: grafana/promtail:2.9.8
    command: ["-config.file=/etc/promtail/config.yml", "-config.expand-env=true"]
    volumes:
      - ./promtail/config.yml:/etc/promtail/config.yml:ro
      - /var/log:/var/log:ro
      - promtail_positions:/positions
    restart: unless-stopped

  blackbox:
    image: prom/blackbox-exporter:v0.25.0
    command: ["--config.file=/etc/blackbox/blackbox.yml"]
    volumes:
      - ./blackbox/blackbox.yml:/etc/blackbox/blackbox.yml:ro
    ports: ["9115:9115"]
    restart: unless-stopped

  grafana:
    image: grafana/grafana:11.2.0
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=ChangeMe!
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana-provisioning/datasources:/etc/grafana/provisioning/datasources:ro
      - ./grafana-provisioning/dashboards:/etc/grafana/dashboards
      - ./grafana-provisioning/dashboards/dashboards.yml:/etc/grafana/provisioning/dashboards/dashboards.yml:ro
    ports: ["3000:3000"]
    restart: unless-stopped

  anomaly-detector:
    build: ./anomaly-detector
    environment:
      - PROM_URL=http://prometheus:9090
      - ALERTMAN_URL=http://alertmanager:9093/api/v2/alerts
      # You can override the query to any time series you want to monitor with AIOps
      - PROM_QUERY=100 - (avg by (instance)(rate(node_cpu_seconds_total{mode="idle"}[5m]))*100)
    ports: ["8080:8080"]
    restart: unless-stopped

volumes:
  prom_data:
  grafana_data:
  loki_data:
  promtail_positions:
Once you are done, build the anomaly detector image and start the whole stack with the following commands:
docker compose build anomaly-detector
docker compose up -d
Check if the service is up and running with the command below:
docker compose ps
Install Node Exporter on Each Server To Monitor
At this point, you must set up Node exporter on each server target you want to monitor. It collects CPU, memory, disk, and network metrics and exposes them on port 9100, and then Prometheus scrapes these metrics for analysis and alerting.
To do this, you can run the following commands:
NODE_EXPORTER_VERSION=1.8.2
cd /tmp
curl -LO https://github.com/prometheus/node_exporter/releases/download/v${NODE_EXPORTER_VERSION}/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz
tar xzf node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz
sudo mv node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64/node_exporter /usr/local/bin/
sudo useradd --no-create-home --shell /usr/sbin/nologin nodeexp || true
Then, create the systemd service for Node exporter with the command below:
sudo tee /etc/systemd/system/node_exporter.service >/dev/null <<'EOF'
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
User=nodeexp
Group=nodeexp
Type=simple
ExecStart=/usr/local/bin/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
Enable and start the service with the command below:
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
Verify the 9100 port:
sudo ss -lntp | grep 9100
Make sure the Prometheus server can reach server_ip:9100.
Verifying AIOps Stack and Creating Dashboards
It is recommended to confirm that every component of the AIOps stack is running correctly. To verify your setup, open the Prometheus Web UI by navigating to the URL below:
http://your-server-ip:9090
From there, go to Status and then Targets, and make sure all targets show as UP. Also, check the Blackbox Exporter targets to confirm that your probed pages are listed and show as UP.
For Loki, navigate to the following URL:
http://your-server-ip:3100/ready
It must return ready, which means the service is active.
Then, navigate to the Grafana UI and log in as admin with the password set via GF_SECURITY_ADMIN_PASSWORD in the Compose file:
http://your-server-ip:3000
In Grafana's Explore section, select Prometheus and run the query up to confirm metrics are available, then switch to Loki and run the query {job="varlogs"} to verify logs are being collected correctly.
To quickly visualize your metrics, you can create a dashboard in Grafana.
Open Grafana, go to Dashboards and then Import, and enter the dashboard ID 1860 (Node Exporter Full) from Grafana.com to get a detailed view of your servers’ performance.
Alternatively, you can place your own JSON dashboard files in the /opt/aiops/grafana-provisioning/dashboards/ directory, and they’ll automatically appear under the AIOps folder when Grafana starts.
Test Alerts and Validate AIOps Anomalies
After setting up the AIOps stack, it’s important to test that alerts and anomaly detection are working correctly. You can simulate different scenarios to confirm that Prometheus, Alertmanager, and the AIOps anomaly detector respond as expected.
For example, stop the Node Exporter service on a monitored server to test a NodeDown alert after about two minutes.
You can also manually test the AIOps anomaly detection by running the command below:
curl -X POST http://your-server-ip:8080/run
If the last metric point is an outlier, the detector pushes an AIOpsDetectedAnomaly alert to Alertmanager.
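You can also bypass the detector entirely and post a synthetic alert straight to Alertmanager's v2 API to confirm routing and email/webhook delivery. A minimal sketch using only the standard library; the host your-server-ip is a placeholder you must replace:

```python
import json
import urllib.request

# Placeholder: replace your-server-ip with your Alertmanager address
ALERTMANAGER = "http://your-server-ip:9093/api/v2/alerts"

def build_test_alert(instance="test-host"):
    # Alertmanager's v2 API accepts a JSON array of alert objects
    return [{
        "labels": {"alertname": "ManualTestAlert", "severity": "warning",
                   "instance": instance},
        "annotations": {"summary": "Manual routing test"},
    }]

def post_alerts(url, payload):
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status  # 200 means Alertmanager accepted the alert

# post_alerts(ALERTMANAGER, build_test_alert())  # uncomment to send
```

After sending, the alert should appear in the Alertmanager UI on port 9093 and be delivered to your configured receiver.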
You can automate this process with a cron job, sidecar container, or Grafana alert rule to call /run periodically and keep continuous anomaly checks active.
That’s it, you are done with setting up the AIOps Stack for server monitoring.
FAQs
What is AIOps, and why is it important?
AIOps uses machine learning and data analysis to automatically detect problems, find their causes, and connect related events. It helps reduce unnecessary alerts, fix issues faster, and predict potential system problems before they happen.
How does the AIOps anomaly detector work?
The detector regularly checks Prometheus metrics, uses STL and z-score analysis to find unusual patterns, and flags any outliers. When it detects an anomaly, it automatically sends an alert to Alertmanager, which can notify your team through email, Slack, or a webhook.
How can I automate the AIOps stack setup for multiple environments?
You can turn this stack into Ansible roles to automate deployments, Helm charts to run it on Kubernetes, and Terraform modules to manage infrastructure automatically with code.
Conclusion
Setting up an AIOps stack for server monitoring provides proactive insights, automatic anomaly detection, and clear visibility across your entire infrastructure. By combining Prometheus, Grafana, Loki, Alertmanager, and a lightweight Python anomaly detector, you can monitor system health and performance in real time and receive smart alerts.
We hope you enjoy this guide. Subscribe to our X and Facebook channels to get the latest server monitoring guides and tips.
For optimal performance and reliability of your AIOps monitoring stack, it is recommended to use Reliable Dedicated Servers or Flexible VPS hosting.
For further reading:
Learn Server Memory Disaggregation