# guenther

A streaming anomaly detection pipeline for Managed File Transfer (MFT) infrastructure.

guenther ingests system metrics and application logs in real time, extracts structured
feature vectors per time window, and scores them with an ensemble of unsupervised
detectors, without any labelled training data.

---

## How it works

```
┌─────────────────────────────────────────────────────────────┐
│ Ingestion                                                   │
│ MetricCollector (/proc)  LogCollector (inotify + Drain3)    │
│ SystemctlCollector (service states)                         │
└────────────────────┬────────────────────────────────────────┘
                     │ channels (backpressure)
┌────────────────────▼────────────────────────────────────────┐
│ Transformation                                              │
│ TransformEngine – 30 s tumbling windows via DuckDB          │
│ 45 base features + N Drain3 parameter aggregates            │
└────────────────────┬────────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────────┐
│ Detection                                                   │
│ EnsembleDetector (RRCF fast/mid/slow · COPOD · MAD)         │
│ SEAD online weight adaptation · auto-scaling (3 stages)     │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
              anomalies.jsonl
```
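
The stages hand data to each other over channels. The following is a minimal sketch of how bounded Go channels provide backpressure, assuming one goroutine per stage; the `event` type, the stage functions and the buffer sizes are hypothetical and not taken from `cmd/pipeline`:

```go
package main

import (
	"fmt"
	"sync"
)

// event stands in for a collected log line or metric sample (hypothetical type).
type event struct{ value float64 }

// collect produces n events. out has a small buffer, so once it is full the
// send blocks: a slow consumer throttles the producer instead of piling up data.
func collect(n int, out chan<- event) {
	defer close(out)
	for i := 0; i < n; i++ {
		out <- event{value: float64(i)}
	}
}

// transform sums events in fixed-size groups, a stand-in for windowed aggregation.
func transform(in <-chan event, size int, out chan<- float64) {
	defer close(out)
	sum, count := 0.0, 0
	for ev := range in {
		sum += ev.value
		count++
		if count == size {
			out <- sum
			sum, count = 0, 0
		}
	}
	if count > 0 {
		out <- sum // flush the final, partial group
	}
}

// runPipeline wires the stages together and returns what the last stage emitted.
func runPipeline(n, groupSize, buffer int) []float64 {
	events := make(chan event, buffer)
	sums := make(chan float64, buffer)

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); collect(n, events) }()
	go func() { defer wg.Done(); transform(events, groupSize, sums) }()

	var results []float64
	for s := range sums {
		results = append(results, s)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(runPipeline(10, 4, 2)) // [6 22 17]
}
```

Because both channels have a small fixed capacity, a stage that falls behind makes the upstream sends block, so memory use stays bounded instead of growing with the backlog.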

### Packages

| Path | Responsibility |
| -------------------- | -------------------------------------------------------------------------------- |
| `cmd/pipeline` | Entry point, wiring, graceful shutdown |
| `internal/collector` | `MetricCollector` (`/proc`), `LogCollector` (inotify), `SystemctlCollector` |
| `internal/transform` | `TransformEngine`: DuckDB windowed aggregation |
| `internal/detect` | `EnsembleDetector`, RRCF, COPOD, MAD, IsolationForest, SEAD, `ScalingController` |
| `internal/drain3` | Masking / parameter extraction wrapper around Drain3 |
| `internal/config` | YAML config loading and regex compilation |
| `internal/health` | `HealthMonitor`: per-stage counters |
| `pkg/types` | Shared types: `LogEvent`, `MetricSnapshot`, `FeatureVector`, `AnomalyResult` |

---

## Requirements

| Dependency | Notes |
| --------------- | ------------------------------------------------------------ |
| Docker | Required for the containerised build (recommended) |
| Go ≥ 1.25 | Only needed for local builds |
| gcc / libc6-dev | CGO is required by `go-duckdb` |
| Linux | Metric collection reads `/proc`; not supported on other OSes |

---

## Building

### Docker (recommended; no local toolchain needed)

```bash
make build
```

The binary is written to `build/guenther`.

### Local (requires Go + gcc)

```bash
make build-local
```

---

## Running

```bash
./build/guenther -config configs/default.yaml
```

guenther shuts down cleanly on `SIGINT` or `SIGTERM`.

---

## Testing

```bash
make test
```

---

## Configuration

guenther is configured via a single YAML file (default: `configs/default.yaml`).

```yaml
ingestion:
  log_path: "/path/to/log/file/transfer.log"  # file to tail
  net_interface: "ens4"                       # interface for /proc/net/dev
  disk_device: "vda1"                         # device for /proc/diskstats
  systemctl_services:
    - service1.service
    - service2.service

transformation:
  window_size: "30s"                # tumbling window length
  db_path: "data/pipeline.duckdb"   # DuckDB file (use :memory: for ephemeral)

drain:
  depth: 4
  sim_threshold: 0.4
  max_children: 100
  max_clusters: 1000
  masking_patterns:                 # applied in order before template mining
    - name: "uuid"
      pattern: '\b[0-9a-fA-F]{8}-...\b'
      replace: "<UUID>"
      type: "string"
    # ... see configs/default.yaml for the full set

detector:
  method: "ensemble"                # fallback when ensemble.enabled = false
  ensemble:
    enabled: true
    method: "sead"                  # avg | max | median | sead
    contamination: 0.15
    sead:
      eta: 0.1
      lambda: 0.01
  auto_scaling:
    enabled: true
    high_threshold: 75.0            # CPU % → switch to mid detector
    critical_threshold: 90.0        # CPU % → switch to fast detector
    down_threshold: 50.0
    high_duration: 90.0             # seconds load must persist before scaling
    critical_duration: 120.0
    down_duration: 120.0
  rrcf_variants:
    fast: { num_trees: 50,  tree_size: 32,  threshold_percentile: 0.85 }
    mid:  { num_trees: 150, tree_size: 64,  threshold_percentile: 0.85 }
    slow: { num_trees: 200, tree_size: 128, threshold_percentile: 0.85 }
  copod:
    buffer_size: 50
    threshold: 0.3
  mad:
    threshold: 3.5
    calibration_size: 50

output:
  feature_log_path: "logs/features.jsonl"
  anomaly_log_path: "logs/anomalies.jsonl"
```
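
The `auto_scaling` block switches between the fast, mid and slow RRCF variants based on CPU load. The sketch below is one plausible reading of those keys, not guenther's `ScalingController`: it assumes the slow variant is the default, that a stage change requires its condition to hold continuously for the configured duration, and that scaling down moves one stage at a time. All type and method names are hypothetical:

```go
package main

import "fmt"

// Stage is the active RRCF variant; Slow (the largest forest) is assumed to be the default.
type Stage int

const (
	Slow Stage = iota
	Mid
	Fast
)

func (s Stage) String() string { return [...]string{"slow", "mid", "fast"}[s] }

// Config mirrors the auto_scaling keys: thresholds in CPU %, durations in seconds.
type Config struct {
	High, Critical, Down                         float64
	HighDuration, CriticalDuration, DownDuration float64
}

// Controller tracks how long each load condition has held without interruption.
type Controller struct {
	cfg                       Config
	stage                     Stage
	highFor, critFor, downFor float64
}

// accumulate extends a timer while cond holds and resets it as soon as it breaks.
func accumulate(t float64, cond bool, dt float64) float64 {
	if cond {
		return t + dt
	}
	return 0
}

// Observe feeds one CPU sample covering dt seconds and returns the active stage.
func (c *Controller) Observe(cpu, dt float64) Stage {
	c.highFor = accumulate(c.highFor, cpu >= c.cfg.High, dt)
	c.critFor = accumulate(c.critFor, cpu >= c.cfg.Critical, dt)
	c.downFor = accumulate(c.downFor, cpu <= c.cfg.Down, dt)

	switch {
	case c.critFor >= c.cfg.CriticalDuration && c.stage < Fast:
		c.stage = Fast
	case c.highFor >= c.cfg.HighDuration && c.stage < Mid:
		c.stage = Mid
	case c.downFor >= c.cfg.DownDuration && c.stage > Slow:
		c.stage--     // step down one stage at a time
		c.downFor = 0 // require another full down_duration before the next step
	}
	return c.stage
}

// repeat returns n copies of v, used to build a synthetic load trace.
func repeat(v float64, n int) []float64 {
	out := make([]float64, n)
	for i := range out {
		out[i] = v
	}
	return out
}

func main() {
	c := &Controller{cfg: Config{High: 75, Critical: 90, Down: 50,
		HighDuration: 90, CriticalDuration: 120, DownDuration: 120}}
	load := append(repeat(95, 15), repeat(30, 30)...) // one sample every 10 s
	prev := c.stage
	for i, cpu := range load {
		if s := c.Observe(cpu, 10); s != prev {
			fmt.Printf("t=%ds cpu=%.0f%% -> %v\n", (i+1)*10, cpu, s)
			prev = s
		}
	}
}
```

With the example thresholds, 150 s at 95 % CPU followed by 300 s at 30 % prints transitions to `mid` at t=90s, `fast` at t=120s, back to `mid` at t=270s and to `slow` at t=390s. Loads between `down_threshold` and `high_threshold` change nothing, which gives the controller hysteresis.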
|
||
### Masking pattern types
|
||
|
||
Patterns with `type: float` extract a named parameter into `FeatureVector.ParamAvg`;
|
||
patterns with `type: string` replace the match in-place before template mining.
|
||
Named patterns (`name != ""`) are aggregated as features per window.
|
||
|

---

## Output

**`logs/anomalies.jsonl`**: one JSON object per scored window, for example:

```json
{
  "timestamp": "2026-01-15T14:32:00Z",
  "score": 0.8721,
  "is_anomaly": true,
  "confidence": 0.91,
  "method": "sead_ensemble",
  "details": "rrcf_slow=0.91 copod=0.83 mad=0.78"
}
```

**`logs/features.jsonl`**: raw feature vectors for offline analysis (optional).

---

## Project layout

```
guenther/
├── cmd/
│   └── pipeline/
│       └── main.go
├── internal/
│   ├── collector/
│   ├── config/
│   ├── detect/
│   ├── drain3/
│   ├── health/
│   └── transform/
├── pkg/
│   └── types/
├── configs/
│   └── default.yaml
├── build/          # created by `make build`
├── Makefile
└── README.md
```

---

## License

This project was developed as part of a Bachelor's thesis.