DAON Network Monitoring Guide
Overview
This guide covers the monitoring stack for DAON Network, which spans the blockchain validators and the API infrastructure.
Monitoring Stack
Core Components
- Prometheus: Metrics collection and storage
- Grafana: Visualization and dashboards
- Alertmanager: Alert routing and notifications
- Node Exporter: System metrics collection
Architecture
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ Validators  │    │ API Server  │    │  Node Exp.  │
│             │    │             │    │             │
│   :26660    │    │    :3000    │    │    :9100    │
└──────┬──────┘    └──────┬──────┘    └──────┬──────┘
       │                  │                  │
       └──────────────────┼──────────────────┘
                          │
                 ┌────────▼────────┐
                 │   Prometheus    │
                 │      :9090      │
                 └────────┬────────┘
                          │
                 ┌────────▼────────┐
                 │     Grafana     │
                 │      :3000      │
                 └─────────────────┘
Quick Start
1. Start Monitoring Stack
cd daon-core/ccc-core
docker-compose up -d prometheus grafana alertmanager node-exporter
2. Access Dashboards
Local Development:
- Grafana: http://localhost:3000 (admin/ccc-admin)
- Prometheus: http://localhost:9090
- Alertmanager: http://localhost:9093
Production (with Caddy):
- Grafana: https://grafana.yourdomain.com
- Prometheus: https://prometheus.yourdomain.com
- Alertmanager: https://alerts.yourdomain.com
- All-in-one: https://monitoring.yourdomain.com
3. Import Dashboards
Dashboards are auto-imported from grafana/dashboards/:
- DAON API Overview: API performance metrics
- DAON Blockchain Overview: Validator and chain metrics
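Auto-import of this kind is normally driven by a Grafana dashboard provisioning file. The following is a minimal sketch of such a provider, not the project's actual file. The provider name, the folder name, and the container path `/var/lib/grafana/dashboards` are assumptions; the path must match wherever `grafana/dashboards/` is mounted inside the Grafana container.

```yaml
# Hypothetical file: grafana/provisioning/dashboards/daon.yml
apiVersion: 1

providers:
  - name: 'daon-dashboards'        # assumed provider name
    orgId: 1
    folder: 'DAON'                 # assumed Grafana folder for the imported dashboards
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30      # how often Grafana rescans the directory
    allowUiUpdates: false
    options:
      # Assumed mount point of grafana/dashboards/ inside the container
      path: /var/lib/grafana/dashboards
```

With a provider like this, dropping a new dashboard JSON file into the mounted directory is enough for Grafana to pick it up on the next scan.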
Key Metrics
Blockchain Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| cometbft_consensus_height | Current block height | Stale > 2min |
| cometbft_p2p_peers | Connected peers | < 2 peers |
| up{job="daon-validators"} | Validator status | Down > 1min |
| cometbft_consensus_block_interval_seconds | Block time | > 30sec |
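The thresholds in this table could be expressed as Prometheus alerting rules along the following lines. This is a sketch under assumptions: the group name, the `severity` label, the annotation text, and the 5-minute hold on the peer-count rule are illustrative and not taken from the project's rule files.

```yaml
groups:
  - name: daon-blockchain          # assumed group name
    rules:
      - alert: ValidatorDown
        # The scrape target has been unreachable for a full minute
        expr: up{job="daon-validators"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Validator {{ $labels.instance }} is unreachable"

      - alert: BlockHeightStale
        # The height gauge has not changed at all over the last 2 minutes
        expr: changes(cometbft_consensus_height[2m]) == 0
        labels:
          severity: critical
        annotations:
          summary: "No new blocks seen by {{ $labels.instance }} for 2 minutes"

      - alert: LowPeerCount
        expr: cometbft_p2p_peers < 2
        for: 5m                    # assumed hold time to ignore brief reconnects
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }} has fewer than 2 peers"
```

`BlockHeightStale` only fires while the validator is still being scraped; a validator that is completely down produces no height series, which is the case `ValidatorDown` covers.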
API Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| daon_api_http_requests_total | Total HTTP requests | - |
| daon_api_http_request_duration_seconds | Response time | 95th > 1sec |
| daon_api_content_protections_total | Content protections | - |
| daon_api_active_connections | Active connections | > 100 |
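As an illustration, the API thresholds might translate into rules like the ones below. The sketch assumes that `daon_api_http_request_duration_seconds` is exported as a histogram (so a `_bucket` series exists) and that `daon_api_http_requests_total` carries a `status` label with the HTTP status code. Neither is confirmed by this guide, so check the output of the API's `/metrics` endpoint before using these expressions.

```yaml
groups:
  - name: daon-api                 # assumed group name
    rules:
      - alert: APIServerDown
        expr: up{job="daon-api"} == 0
        for: 30s
        labels:
          severity: critical

      - alert: HighResponseTime
        # 95th percentile latency over 5 minutes, assuming a histogram metric
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(daon_api_http_request_duration_seconds_bucket[5m]))
          ) > 1
        for: 5m                    # assumed hold time
        labels:
          severity: warning

      - alert: HighErrorRate
        # Share of 5xx responses; the "status" label name is an assumption
        expr: |
          sum(rate(daon_api_http_requests_total{status=~"5.."}[5m]))
            / sum(rate(daon_api_http_requests_total[5m])) > 0.05
        for: 5m                    # assumed hold time
        labels:
          severity: warning
```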
System Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| node_memory_MemAvailable_bytes | Available memory | < 20% |
| node_filesystem_avail_bytes | Disk space | < 20% |
| node_cpu_seconds_total | CPU usage | > 90% for 5min |
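The Node Exporter metrics are raw byte and second counters, so percentage thresholds have to be derived in PromQL. A possible set of rules is sketched below. The `fstype` filter, the `HighCpuUsage` alert name, and the `severity` labels are assumptions for illustration; the thresholds follow the alert list in this guide.

```yaml
groups:
  - name: daon-system              # assumed group name
    rules:
      - alert: HighMemoryUsage
        # Used memory as a fraction of total memory
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.90
        for: 5m
        labels:
          severity: warning

      - alert: DiskSpaceCritical
        # Free space as a fraction of filesystem size, skipping pseudo filesystems
        expr: |
          node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"} < 0.10
        labels:
          severity: critical

      - alert: HighCpuUsage        # hypothetical name, not in the alert list
        # CPU usage is 1 minus the average idle fraction across cores
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.90
        for: 5m
        labels:
          severity: warning
```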
Alert Rules
Critical Alerts
- ValidatorDown: Validator unreachable > 1min
- BlockHeightStale: No new blocks > 2min
- APIServerDown: API unreachable > 30sec
- DiskSpaceCritical: < 10% disk remaining
Warning Alerts
- HighErrorRate: API errors > 5%
- HighResponseTime: 95th percentile > 1sec
- LowPeerCount: < 2 connected peers
- HighMemoryUsage: > 90% memory for 5min
Configuration
Prometheus Targets
Edit prometheus.yml to add/modify targets:
scrape_configs:
  - job_name: 'daon-validators'
    static_configs:
      - targets: ['validator1:26660', 'validator2:26660']
  - job_name: 'daon-api'
    static_configs:
      - targets: ['api-server:3000']
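Targets alone do not produce alerts or system metrics. For alerts, prometheus.yml also has to load rule files and point at Alertmanager; for system metrics, it needs a scrape job for Node Exporter. The fragment below is a sketch of those extra sections. The `alerts/*.yml` path, the `node-exporter` job name, and the `alertmanager` and `node-exporter` host names are assumptions based on the service names used in the Quick Start.

```yaml
# Load alerting rules (assumed location, relative to prometheus.yml)
rule_files:
  - 'alerts/*.yml'

# Send firing alerts to Alertmanager (assumed service name and default port)
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

scrape_configs:
  # System metrics from Node Exporter (assumed service name)
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
```

If prometheus.yml already has a `scrape_configs` list, append the new job to that list rather than adding a second `scrape_configs` key.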
Alert Notifications
Configure alertmanager.yml for your notification channels:
Slack Setup
receivers:
  - name: 'slack-alerts'
    slack_configs:
      - api_url: 'YOUR_WEBHOOK_URL'
        channel: '#daon-alerts'
        title: 'DAON Alert: '
Email Setup
global:
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alerts@daon.network'
  smtp_auth_username: 'your-email@gmail.com'
  smtp_auth_password: 'your-app-password'
receivers:
  - name: 'email-alerts'
    email_configs:
      - to: 'devops@daon.network'
        # Alertmanager sets the subject through the headers map
        headers:
          Subject: 'DAON Alert: '
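Receivers only take effect once a `route` block refers to them. One way to wire up the two receivers above is sketched here: critical alerts go to email and everything else falls through to Slack. The grouping and timing values are assumptions, and so is the `severity` label, which the alert rules must set for this routing to work.

```yaml
route:
  receiver: 'slack-alerts'         # default receiver for anything not matched below
  group_by: ['alertname', 'job']
  group_wait: 30s                  # assumed timing values
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # Critical alerts are delivered by email instead of Slack
    - matchers:
        - severity="critical"
      receiver: 'email-alerts'
```

The `matchers` syntax requires Alertmanager 0.22 or later; older versions use `match` and `match_re` instead.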
Dashboard Guide
DAON API Overview Dashboard
- Request Rate: Requests per second across all endpoints
- Response Time: 95th percentile response times
- Error Rate: Percentage of 5xx responses
- Active Connections: Current WebSocket/HTTP connections
- Content Protections: Rate of content protection requests
DAON Blockchain Overview Dashboard
- Block Height: Current chain height across validators
- Validator Status: Up/Down status for each validator
- Peer Count: Number of connected peers per validator
- Transaction Rate: Transactions per second
- Resource Usage: Memory and CPU utilization
Troubleshooting
Metrics Not Appearing
- Check target health in Prometheus:
  curl http://localhost:9090/api/v1/targets
- Verify metrics endpoints:
  # API Server metrics
  curl http://localhost:3001/metrics
  # Validator metrics (if exposed)
  curl http://validator1:26660/metrics
- Check container logs:
  docker-compose logs prometheus
  docker-compose logs grafana
Alerts Not Firing
- Verify alert rules:
  # Check rules are loaded
  curl http://localhost:9090/api/v1/rules
- Test alert expressions in the Prometheus query browser
- Check Alertmanager status (older Alertmanager releases expose this under /api/v1/status):
  curl http://localhost:9093/api/v2/status
Dashboard Issues
- Verify Grafana datasource connection to Prometheus
- Check dashboard JSON for query syntax errors
- Confirm time ranges match your data retention
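When the datasource connection is the problem, it helps to compare the configured datasource against a known-good definition. A minimal Grafana datasource provisioning file might look like the sketch below. The file location, the datasource name, and the URL `http://prometheus:9090` (Compose service name plus the Prometheus default port) are assumptions about this deployment.

```yaml
# Hypothetical file: grafana/provisioning/datasources/prometheus.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                  # Grafana's backend queries Prometheus, not the browser
    url: http://prometheus:9090    # assumed service name and port on the Docker network
    isDefault: true
    editable: false
```

Inside the Compose network Grafana must reach Prometheus by service name; `localhost` in the datasource URL would point at the Grafana container itself.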
Public Access Setup
Caddy Reverse Proxy Configuration
For secure public access to monitoring dashboards:
# 1. Configure Caddy for monitoring access
cd daon-core/ccc-core
./setup-monitoring-caddy.sh
# 2. Enable Caddy in docker-compose.yml
# Uncomment the caddy service section
# 3. Start with Caddy
docker-compose up -d caddy
Security Features
- HTTP Basic Authentication on all monitoring endpoints
- HTTPS with automatic Let's Encrypt certificates
- Security headers to prevent common attacks
- Optional IP restrictions for additional security
Access URLs
Once configured, access your monitoring stack at:
- https://grafana.yourdomain.com - Main dashboards
- https://prometheus.yourdomain.com - Raw metrics
- https://alerts.yourdomain.com - Alert manager
- https://monitoring.yourdomain.com - All-in-one (path-based)
Production Deployment
Resource Requirements
- Prometheus: 2GB RAM, 100GB storage
- Grafana: 1GB RAM, 10GB storage
- Alertmanager: 512MB RAM, 1GB storage
Data Retention
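Retention is set with Prometheus command-line flags, not in prometheus.yml. Add the flags to the prometheus service in docker-compose.yml:
services:
  prometheus:
    command:
      # Overriding command replaces the image defaults, so keep the config file flag
      - '--config.file=/etc/prometheus/prometheus.yml'
      # Storage retention
      - '--storage.tsdb.retention.time=30d'
      - '--storage.tsdb.retention.size=50GB'
Prometheus deletes the oldest data as soon as either limit is reached. The scrape interval stays in prometheus.yml:
global:
  scrape_interval: 15s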
High Availability
For production, consider:
- Multiple Prometheus instances with federation
- Grafana clustering with shared database
- Alertmanager clustering for redundancy
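For the federation option, a global Prometheus scrapes the `/federate` endpoint of the per-site instances. The job below is a sketch of that setup. The host names `prometheus-a` and `prometheus-b` and the 30-second interval are hypothetical; the `match[]` selectors reuse the job names defined earlier in this guide.

```yaml
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 30s           # assumed interval
    honor_labels: true             # keep the original job/instance labels
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="daon-validators"}'
        - '{job="daon-api"}'
    static_configs:
      - targets:
          - 'prometheus-a:9090'    # hypothetical downstream Prometheus instances
          - 'prometheus-b:9090'
```

Only series matching the `match[]` selectors are pulled, so the global instance stays small while the downstream instances keep the full data.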
Security
- Enable authentication on all components
- Use TLS/SSL for external access
- Configure firewalls to restrict metric endpoints
- Regular backup of Grafana dashboards and Prometheus data
Links
- Prometheus Docs: https://prometheus.io/docs/
- Grafana Docs: https://grafana.com/docs/
- Cosmos SDK Metrics: https://docs.cosmos.network/main/core/telemetry
- Node Exporter: https://github.com/prometheus/node_exporter
Need help? Check our troubleshooting guide or file an issue.