Build LLM-Powered Log Triage with Python & DeepSeek-R1

The Gap Between Metrics and Meaning

Prometheus and Grafana do an excellent job of tracking system health in a homelab. They show CPU usage, memory consumption, and container uptime in neat dashboards. Alertmanager sends Discord notifications when a node goes offline or when a metric crosses a threshold. But there is a gap that sits right between “the container is running” and “the container is healthy.”

Metrics tell you that something is wrong. They do not tell you why. A container can show stable CPU, normal memory, and passing health checks while silently logging critical errors. Maybe it is failing to reach an upstream API. Maybe it is retrying a database connection every thirty seconds. Maybe a deprecation warning lurks in the logs that will break the next release. None of this appears in Prometheus metrics. All of it lives in the logs.

Reading Docker logs across multiple containers several times a day feels productive for about ten minutes. After that, skimming sets in, and important entries get missed. That is where LLM log triage comes into play. A Python script runs every fifteen minutes, pulls Docker container logs, classifies severity using rules, and sends critical entries to a small language model. The model writes a plain-English summary. That summary lands in a Discord channel. Instead of reading hundreds of log lines hoping to spot the important one, an LLM reads them and only bothers you when something actually matters.

Why Alertmanager Is Not Enough

Alertmanager handles the metrics side well. If CPU spikes above ninety percent for five minutes, or if a node becomes unreachable, it fires an alert. But metrics and logs are different things. A container can be running fine from a metrics perspective while logging errors internally. The log triage pipeline covers the gap between “the container is running” and “the container is healthy.”

This is not a fancy AI agent with tool use and multi-step reasoning. It is a straightforward automation — rules-based triage plus an LLM for summarization. The design is intentional. The LLM runs on a separate machine because inference competes with the services it monitors. The traffic between the script and the model travels over an encrypted mesh VPN, never touching the public internet. The model endpoint is not exposed to anyone outside the network.

Architecture Overview: Four Components Across Two Machines

The pipeline has four components spread across two machines. On the local server sits the Python script that reads Docker logs and classifies severity, a cron job that runs the script every fifteen minutes, and Docker itself, whose containers produce the logs. On the Oracle Cloud instance in Phoenix, Arizona runs Ollama, serving the DeepSeek-R1 1.5B model as a REST API. Tailscale connects both machines over an encrypted mesh VPN. Discord webhooks receive the final alert messages.

Why the LLM Runs on a Separate Machine

The separation is deliberate. The Oracle instance has 24GB of RAM — enough to load a small model comfortably. The local server has less headroom, and running model inference alongside the Docker services it monitors would degrade performance. The Python script calls the Ollama API over Tailscale, so traffic never touches the public internet. The model endpoint remains invisible to anyone outside the Tailscale network.

Network Security Without Complexity

Tailscale creates a mesh VPN using WireGuard encryption. Both machines appear on the same virtual network even though they sit in different data centers. The script reaches the Ollama API at a Tailscale IP address, not a public one. No firewall rules need to open ports to the internet. No reverse proxy is required. The model simply stays private.

Setting Up Ollama and DeepSeek-R1

Ollama makes self-hosting a language model surprisingly painless. On the Oracle instance, the setup requires two commands. First, install Ollama using the curl script. Then pull the model.

curl -fsSL https://ollama.com/install.sh | sh
ollama pull deepseek-r1:1.5b

That is it. Ollama downloads the model and serves it as a REST API on port 11434. You can test it immediately with a curl command.

curl http://localhost:11434/api/generate -d '{
 "model": "deepseek-r1:1.5b",
 "prompt": "Summarize this log entry: ERROR: database connection refused at 10.0.0.5:5432, retrying in 30s",
 "stream": false
}'

The model responds with a natural-language summary of what the log entry means. For summarizing log lines, you do not need GPT-4 level intelligence. The 1.5B parameter model reliably interprets error messages and produces clear explanations.
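
The same request works from the local server once localhost is swapped for the Oracle instance's Tailscale address. Here is a rough Python equivalent, assuming the placeholder 100.x.x.x address that the triage script uses later:

import requests

# Placeholder Tailscale address of the Oracle instance; substitute your own.
OLLAMA_URL = "http://100.x.x.x:11434/api/generate"

payload = {
    "model": "deepseek-r1:1.5b",
    "prompt": "Summarize this log entry: ERROR: database connection refused at 10.0.0.5:5432, retrying in 30s",
    "stream": False,
}

# A generous timeout covers the few seconds of inference on the ARM instance.
response = requests.post(OLLAMA_URL, json=payload, timeout=120)
print(response.json()["response"])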

Why the 1.5B Parameter Model

The 1.5B parameter model was chosen deliberately. It is small enough to run on the Oracle ARM instance without maxing out memory. Inference takes only a few seconds per log entry. Larger models would be slower and not worth the tradeoff. For summarizing log lines, fast and good enough beats slow and perfect.

DeepSeek-R1 1.5B is a distilled reasoning model. It was trained to think through problems step by step before producing an answer. For log triage, that reasoning capability helps the model distinguish between a transient warning and a genuine failure. It does not just parrot keywords. It understands context.

The Two-Stage Triage Pipeline

The script uses a two-stage approach. Stage one performs rules-based severity classification. Stage two sends only critical entries to the LLM for summarization. This design keeps costs low and response times fast.

Stage One: Rules-Based Classification

The script reads the last fifteen minutes of logs using the Docker CLI command docker logs --since 15m. It checks each line against keyword patterns. Lines containing words like “ERROR”, “FATAL”, “CRITICAL”, “panic”, “refused”, “timeout”, or “failed” get flagged. Lines with “WARN” or “INFO” are ignored unless they match a secondary pattern.

This stage filters out the noise. A typical homelab container produces hundreds of log lines every fifteen minutes. Most are routine. The script discards everything that does not match the critical pattern. Only the flagged lines proceed to stage two.

Stage Two: LLM Summarization

The flagged log entries are sent to the DeepSeek-R1 model via the Ollama API. The prompt asks the model to write a plain-English summary of what the log entry means and whether it requires immediate attention. The model reads the raw log line and produces a clear explanation.

For example, a raw log entry like ERROR: database connection refused at 10.0.0.5:5432, retrying in 30s becomes something like: “The application cannot connect to the database at IP address 10.0.0.5 on port 5432. The connection was refused, which could mean the database service is down or the port is blocked. The application will retry in thirty seconds. If this persists, check whether the database container is running and accepting connections.”

That summary gets posted to a Discord channel via a webhook. The channel receives only the summaries, not the raw log lines. The result is a clean, readable alert that tells you what happened and what to check next.

Why Rules Alone Are Not Enough

Rules-based triage works well for known patterns. If you already know that “connection refused” is critical, you can write a rule for it. But logs are messy. Errors appear in different formats across different containers. A rule that catches “ERROR” in one container might miss “FATAL” in another. Worse, some containers log errors that are actually harmless, like a retry loop that resolves itself.

The LLM adds a layer of semantic understanding. It can tell the difference between a transient error and a persistent failure. It can summarize a multi-line stack trace into a single sentence. It can flag patterns that do not match any predefined rule. The combination of rules and an LLM catches more issues than either approach alone.

Implementing the Python Script

The script itself is straightforward. It runs as a cron job every fifteen minutes. Here is how the core logic works.

Reading Docker Logs

The script uses the docker Python SDK or calls the Docker CLI directly. It iterates over running containers and reads the last fifteen minutes of logs for each one. The --since 15m flag limits the output to recent entries, keeping the script efficient.

import subprocess
import json
import requests
import os

CONTAINERS = ["nginx", "postgres", "redis", "api-gateway"]
CRITICAL_KEYWORDS = ["error", "fatal", "critical", "panic", "refused", "timeout", "failed"]
OLLAMA_URL = "http://100.x.x.x:11434/api/generate" # Tailscale IP
DISCORD_WEBHOOK = os.getenv("DISCORD_WEBHOOK_URL")

def get_logs(container_name, since="15m"):
    # docker logs sends the container's stdout and stderr to separate streams;
    # errors usually land on stderr, so capture both.
    cmd = ["docker", "logs", "--since", since, container_name]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout.splitlines() + result.stderr.splitlines()

Classifying Severity

Each log line is checked against the keyword list. Lines that match are collected into a batch. The script limits the batch to prevent sending too many lines to the LLM at once. If no critical lines are found, the script exits silently.

def classify_logs(lines):
    critical_lines = []
    for line in lines:
        lower = line.lower()
        if any(kw in lower for kw in CRITICAL_KEYWORDS):
            critical_lines.append(line)
    return critical_lines[:10]  # Limit to 10 lines per run

Calling the LLM

The critical lines are sent to the Ollama API as a prompt. The model returns a plain-English summary. The script then posts that summary to the Discord webhook.

def summarize_logs(log_lines):
    prompt = "Summarize these log entries in plain English. Explain what went wrong and whether it needs immediate attention:\n\n"
    prompt += "\n".join(log_lines)

    payload = {
        "model": "deepseek-r1:1.5b",
        "prompt": prompt,
        "stream": False,
    }

    # Inference takes a few seconds; a timeout keeps the cron run from hanging.
    response = requests.post(OLLAMA_URL, json=payload, timeout=120)
    return response.json()["response"]

def send_discord_alert(summary):
    payload = {"content": summary}
    requests.post(DISCORD_WEBHOOK, json=payload, timeout=10)
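
A short entry point ties these functions together: gather flagged lines across all containers, exit quietly if nothing matched, otherwise summarize and post. This is a minimal sketch rather than the full script; the cap of ten lines mirrors the limit in classify_logs.

def main():
    flagged = []
    for container in CONTAINERS:
        flagged.extend(classify_logs(get_logs(container)))

    # Nothing critical in the last window: exit silently, no alert.
    if not flagged:
        return

    summary = summarize_logs(flagged[:10])
    send_discord_alert(summary)

if __name__ == "__main__":
    main()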

The Cron Job

The script is registered as a cron job that runs every fifteen minutes. The crontab entry looks like this:

*/15 * * * * /usr/bin/python3 /home/user/log_triage.py

That is it. The script runs, checks logs, classifies severity, calls the LLM, and posts summaries to Discord. No human needs to read raw log files.

Real-World Performance and Observations

After running this pipeline for several weeks, a few patterns emerged. The DeepSeek-R1 1.5B model handles about ninety-five percent of log summaries correctly. It occasionally misinterprets a debug message as critical, but those false positives are rare and easy to spot. The model reliably catches issues that rules alone would miss, such as a container that logs “connection reset by peer” without using the word “error.”

Inference time averages about three seconds per batch of ten log lines. The Oracle ARM instance handles the load without breaking a sweat. Memory usage stays below 4GB even during peak inference. The script itself consumes negligible resources on the local server.

The biggest surprise was how often the LLM caught issues that Prometheus metrics showed as healthy. A container running a Python web application logged repeated SSL handshake failures for two hours before the metrics showed any degradation. Prometheus saw normal CPU and memory. The LLM saw the error pattern and alerted. By the time the metrics turned red, the issue was already resolved.

Why This Matters for Homelab Operators

Homelabs are not enterprise data centers. They run on consumer hardware with limited resources. Running a full observability stack with centralized logging and SIEM tools is overkill. But ignoring logs entirely leads to preventable outages.

LLM log triage fills that middle ground. It requires no expensive hardware, no cloud credits, and no complex infrastructure. A single Oracle ARM instance with 24GB of RAM costs about fifteen dollars per month. The Python script is under a hundred lines. The setup takes an afternoon.

The result is a system that reads your logs for you, summarizes them in plain English, and only bothers you when something actually matters. It is not perfect. It will miss some things. But it catches far more than reading logs manually ever did.

Potential Pitfalls and How to Avoid Them

No system is without challenges. Here are a few issues that came up during implementation and how to handle them.

Model Hallucinations

DeepSeek-R1 1.5B occasionally invents details that are not in the log entry. For example, it might say “the database server is running version 14” when the log entry contains no version information. The fix is to include a system prompt that instructs the model to only summarize what is in the log entry and not add extra information. The prompt “Summarize only what is present in the log lines. Do not add information that is not there” reduces hallucinations significantly.
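
With Ollama, that instruction can be passed as the system field on the generate request instead of being glued onto the prompt. A sketch of the adjusted payload from summarize_logs:

payload = {
    "model": "deepseek-r1:1.5b",
    "prompt": prompt,
    # System prompt constrains the model to the content of the log lines.
    "system": "Summarize only what is present in the log lines. Do not add information that is not there.",
    "stream": False,
}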

Rate Limiting and Duplicate Alerts

The cron job runs every fifteen minutes. If the same error persists across multiple runs, the Discord channel receives the same summary multiple times. A deduplication check prevents this. The script stores the hash of the last alert and skips posting if the new summary matches the previous one.

import hashlib

# Persist the last alert hash to a file so deduplication survives across
# cron runs (a module-level variable would reset on every invocation).
# The path is an assumption; any writable location works.
HASH_FILE = "/tmp/log_triage_last_alert"

def should_alert(summary):
    current_hash = hashlib.md5(summary.encode()).hexdigest()
    try:
        with open(HASH_FILE) as f:
            if f.read().strip() == current_hash:
                return False
    except FileNotFoundError:
        pass
    with open(HASH_FILE, "w") as f:
        f.write(current_hash)
    return True

Container Names Changing

Docker containers can restart with different names if using docker-compose with random suffixes. The script uses container labels instead of names to identify services consistently. Adding a label like triage=true to the docker-compose file ensures the script always finds the right containers.

services:
  nginx:
    image: nginx:latest
    labels:
      triage: "true"
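
With the label in place, the hard-coded CONTAINERS list can give way to label-based discovery using the Docker CLI's label filter; a sketch:

def get_labeled_containers():
    # List running containers that carry the triage=true label.
    cmd = ["docker", "ps", "--filter", "label=triage=true", "--format", "{{.Names}}"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return [name for name in result.stdout.splitlines() if name]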

Extending the Pipeline

The basic pipeline works well for a homelab, but it can be extended in several ways. Adding a second LLM for different log types improves accuracy. For example, one model handles application logs while another handles system logs. The script routes log lines to the appropriate model based on the container name.
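
A small routing table is enough for that; the container and model names below are illustrative, and the second model is assumed to be pulled on the Ollama host:

# Container-to-model routing; names here are placeholders for illustration.
MODEL_FOR_CONTAINER = {
    "api-gateway": "deepseek-r1:1.5b",  # application logs
    "nginx": "deepseek-r1:1.5b",
    "node-exporter": "llama3.2:1b",     # system logs (placeholder model)
}
DEFAULT_MODEL = "deepseek-r1:1.5b"

def pick_model(container_name):
    return MODEL_FOR_CONTAINER.get(container_name, DEFAULT_MODEL)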

Another extension is to add a feedback loop. If a human marks a summary as incorrect or irrelevant, that feedback can be used to refine the prompt or adjust the keyword list. A simple database stores the feedback and the script adjusts its behavior over time.
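
A minimal version of that feedback store, sketched here with SQLite (the path, table, and verdict values are assumptions):

import sqlite3

DB_PATH = "/home/user/triage_feedback.db"  # assumed location

def record_feedback(summary, verdict):
    # verdict might be "correct", "incorrect", or "irrelevant".
    conn = sqlite3.connect(DB_PATH)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feedback ("
        "summary TEXT, verdict TEXT, ts DATETIME DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.execute("INSERT INTO feedback (summary, verdict) VALUES (?, ?)", (summary, verdict))
    conn.commit()
    conn.close()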

The pipeline can also send alerts to multiple channels. Critical errors go to Discord with a ping. Warnings go to a separate channel without a ping. Informational summaries go to a log file only. The severity classification determines the destination.
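
A severity-to-destination map keeps that routing in one place. The webhook environment variables and role ID below are placeholders; the <@&ROLE_ID> syntax is how Discord pings a role from a webhook message:

WEBHOOKS = {
    "critical": os.getenv("DISCORD_WEBHOOK_CRITICAL"),  # placeholder env vars
    "warning": os.getenv("DISCORD_WEBHOOK_WARNINGS"),
}

def route_alert(severity, summary):
    if severity == "critical":
        # Replace ROLE_ID with the Discord role to ping on critical errors.
        requests.post(WEBHOOKS["critical"], json={"content": f"<@&ROLE_ID> {summary}"}, timeout=10)
    elif severity == "warning":
        requests.post(WEBHOOKS["warning"], json={"content": summary}, timeout=10)
    else:
        # Informational summaries only go to a local log file.
        with open("/var/log/log_triage_info.log", "a") as f:
            f.write(summary + "\n")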
