# Advanced Configuration
Comprehensive guide to fine-tuning **GTS-HolMirDas** for production environments and specific use cases.
## Environment Variables Reference
### Core Configuration
```bash
# GoToSocial Connection (Required)
GTS_SERVER_URL=https://your-gts-instance.tld
GTS_ACCESS_TOKEN=your_access_token_here
# Processing Control
MAX_POSTS_PER_RUN=25 # Posts per feed per run (1-100)
DELAY_BETWEEN_REQUESTS=1 # Seconds between API calls (0.5-5)
LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR
# File Configuration
RSS_URLS_FILE=/app/rss_feeds.txt # Path to RSS feeds file
# Optional Features
HEALTHCHECK_URL= # Healthchecks.io ping URL
USER_AGENT=GTS-HolMirDas/1.1.0 # Custom User-Agent string
```
### Advanced Processing Options
```bash
# Memory Management
DUPLICATE_CACHE_SIZE=10000 # Max URLs to cache (affects memory)
BATCH_SIZE=50 # Posts processed per batch
# Network Configuration
REQUEST_TIMEOUT=30 # Seconds to wait for RSS/API responses
MAX_RETRIES=3 # Retry attempts for failed requests
BACKOFF_FACTOR=2 # Exponential backoff multiplier
# Federation Control
INSTANCE_DISCOVERY=true # Enable automatic instance discovery
MIN_INSTANCE_POSTS=5 # Minimum posts before counting instance
```
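The network settings above can be read straight from the environment. As a sketch (not the project's actual implementation), `MAX_RETRIES`, `BACKOFF_FACTOR`, and `REQUEST_TIMEOUT` might drive a fetch loop like this:

```python
import os
import time
import urllib.request

MAX_RETRIES = int(os.getenv("MAX_RETRIES", "3"))
BACKOFF_FACTOR = float(os.getenv("BACKOFF_FACTOR", "2"))
REQUEST_TIMEOUT = int(os.getenv("REQUEST_TIMEOUT", "30"))

def fetch_with_retries(url: str) -> bytes:
    """Fetch a URL, sleeping 1s, 2s, 4s, ... between failed attempts."""
    for attempt in range(MAX_RETRIES + 1):
        try:
            with urllib.request.urlopen(url, timeout=REQUEST_TIMEOUT) as resp:
                return resp.read()
        except OSError:
            if attempt == MAX_RETRIES:
                raise  # Give up after the configured number of retries
            time.sleep(BACKOFF_FACTOR ** attempt)
```

With the defaults, a failing feed is retried after 1, 2, and 4 seconds before the error propagates.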
### Production Hardening
```bash
# Security
VALIDATE_SSL=true # Enforce SSL certificate validation
ALLOWED_DOMAINS= # Comma-separated list of allowed RSS domains
# Resource Limits
MAX_FEED_SIZE=10MB # Maximum RSS feed size to process
MAX_PROCESSING_TIME=1800 # Kill run after 30 minutes
MEMORY_LIMIT=512MB # Container memory limit (Docker)
# Logging
LOG_FORMAT=json # json, text
LOG_TO_FILE=true # Enable file logging
LOG_RETENTION_DAYS=30 # Days to keep log files
```
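`ALLOWED_DOMAINS` implies a host allow-list for feed URLs. A minimal sketch of such a filter (the "empty list allows everything" semantics is an assumption, not documented behavior):

```python
import os
from urllib.parse import urlparse

# An empty ALLOWED_DOMAINS is assumed to mean "no restriction"
ALLOWED_DOMAINS = {d.strip() for d in os.getenv("ALLOWED_DOMAINS", "").split(",") if d.strip()}

def feed_allowed(url: str) -> bool:
    """Accept a feed URL only if its host is on the allow-list (if one is set)."""
    return not ALLOWED_DOMAINS or urlparse(url).hostname in ALLOWED_DOMAINS
```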
## RSS Feed Strategies
### Feed Selection Methodology
#### High-Quality Instance Selection
**Tech-Focused Instances (Recommended):**
```bash
# Excellent signal-to-noise ratio
https://fosstodon.org/tags/homelab.rss?limit=100
https://infosec.exchange/tags/security.rss?limit=100
https://social.tchncs.de/tags/linux.rss?limit=75
# Specialized communities
https://chaos.social/tags/ccc.rss?limit=50
https://mas.to/tags/privacy.rss?limit=50
```
**Balanced General Instances:**
```bash
# Large instances with moderate limits
https://mastodon.social/tags/technology.rss?limit=50
https://mstdn.social/tags/programming.rss?limit=40
https://hachyderm.io/tags/devops.rss?limit=60
```
#### Hashtag Strategy
**Tier 1: Core Topics (Use high limits)**
```bash
# Your primary interests - use limit=75-100
https://fosstodon.org/tags/homelab.rss?limit=100
https://fosstodon.org/tags/selfhosting.rss?limit=100
https://fosstodon.org/tags/docker.rss?limit=100
```
**Tier 2: Secondary Topics (Moderate limits)**
```bash
# Related interests - use limit=50-75
https://mastodon.social/tags/linux.rss?limit=50
https://social.tchncs.de/tags/privacy.rss?limit=50
https://infosec.exchange/tags/cybersecurity.rss?limit=60
```
**Tier 3: Discovery Topics (Conservative limits)**
```bash
# Exploration areas - use limit=25-40
https://mastodon.social/tags/photography.rss?limit=30
https://pixelfed.social/tags/art.rss?limit=25
```
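Rather than maintaining the tiered URLs by hand, they can be generated from `(instance, hashtag)` pairs keyed by limit. A small helper sketch (the tier table here is illustrative, not a recommended set):

```python
# Tier limit -> (instance, hashtag) pairs; entries are illustrative
TIERS = {
    100: [("fosstodon.org", "homelab"), ("fosstodon.org", "selfhosting")],
    50: [("mastodon.social", "linux"), ("infosec.exchange", "cybersecurity")],
    30: [("mastodon.social", "photography")],
}

def feed_urls() -> list[str]:
    """Expand the tier table into ready-to-use RSS feed URLs."""
    return [
        f"https://{instance}/tags/{tag}.rss?limit={limit}"
        for limit, pairs in TIERS.items()
        for instance, tag in pairs
    ]
```

The output lines can be written directly into `rss_feeds.txt`.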
### Feed Quality Assessment
#### Monitoring Feed Performance
```bash
# Create the timing template first (the curl lines above used it as comments;
# the file itself must contain only the format directives)
cat > curl-format.txt << 'EOF'
     time_namelookup:  %{time_namelookup}\n
        time_connect:  %{time_connect}\n
     time_appconnect:  %{time_appconnect}\n
    time_pretransfer:  %{time_pretransfer}\n
       time_redirect:  %{time_redirect}\n
  time_starttransfer:  %{time_starttransfer}\n
          ----------\n
          time_total:  %{time_total}\n
EOF
# Check feed response times
curl -w "@curl-format.txt" -s -o /dev/null https://fosstodon.org/tags/homelab.rss
```
#### Feed Quality Metrics
**High-Quality Indicators:**
- Response time < 2 seconds
- Consistent content updates
- Low duplicate rate with other feeds
- Active community engagement
**Red Flags:**
- Frequent timeouts or errors
- Very high duplicate rate
- Spam or low-quality content
- Instance frequently down
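These indicators can be checked automatically. A sketch of a probe that flags feeds breaching the 2-second guideline above (the result field names are ad hoc):

```python
import time
import urllib.request

def check_feed(url: str, timeout: int = 10) -> dict:
    """Probe one RSS feed and report HTTP status and response time."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except OSError as exc:
        # Timeouts, DNS failures, refused connections all land here
        return {"url": url, "ok": False, "error": str(exc)}
    elapsed = time.monotonic() - start
    # "ok" mirrors the guideline above: HTTP 200 in under 2 seconds
    return {"url": url, "ok": status == 200 and elapsed < 2.0,
            "status": status, "seconds": round(elapsed, 2)}
```

Running it across `rss_feeds.txt` once a week makes the red flags above measurable instead of anecdotal.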
### Geographic and Language Considerations
```bash
# English-language instances
https://fosstodon.org/tags/homelab.rss?limit=75
https://hachyderm.io/tags/devops.rss?limit=50
# German-language instances
https://social.tchncs.de/tags/homelab.rss?limit=50
https://chaos.social/tags/34c3.rss?limit=25
# Multi-language instances
https://mastodon.social/tags/technology.rss?limit=40
```
## Production Deployment
### Docker Compose Production Configuration
```yaml
version: '3.8'

services:
  gts-holmirdas:
    image: gts-holmirdas:latest
    container_name: gts-holmirdas-prod
    restart: unless-stopped

    # Resource limits
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: '0.5'
        reservations:
          memory: 256M
          cpus: '0.25'

    # Security
    user: "1000:1000"
    read_only: true
    security_opt:
      - no-new-privileges:true

    # Networking
    networks:
      - gts-network

    # Environment
    env_file:
      - .env.production

    # Volumes
    volumes:
      - ./data:/app/data:rw
      - ./rss_feeds.txt:/app/rss_feeds.txt:ro
      - ./logs:/app/logs:rw
      - /tmp:/tmp:rw  # Required for read_only mode

    # Health check
    healthcheck:
      test: ["CMD", "python3", "-c", "import requests; requests.get('http://localhost:8080/health')"]
      interval: 30m
      timeout: 10s
      retries: 3
      start_period: 5m

    # Logging
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"
        compress: "true"

networks:
  gts-network:
    external: true
```
### Production Environment Variables
```bash
# .env.production
GTS_SERVER_URL=https://social.yourdomain.com
GTS_ACCESS_TOKEN=prod_token_here
# Production tuning
MAX_POSTS_PER_RUN=50
DELAY_BETWEEN_REQUESTS=1
LOG_LEVEL=INFO
LOG_FORMAT=json
LOG_TO_FILE=true
# Security hardening
VALIDATE_SSL=true
REQUEST_TIMEOUT=30
MAX_RETRIES=2
MEMORY_LIMIT=512MB
# Monitoring
HEALTHCHECK_URL=https://hc-ping.com/your-production-uuid
```
### Monitoring and Alerting
#### Log Analysis Setup
```bash
# Structured logging for analysis
LOG_FORMAT=json
# Example log analysis with jq
docker logs gts-holmirdas 2>&1 | jq -r 'select(.level=="ERROR") | .message'
# Performance monitoring
docker logs gts-holmirdas 2>&1 | jq -r 'select(.posts_processed) | "\(.timestamp): \(.posts_processed) posts in \(.runtime)"'
```
#### Metrics Collection
```bash
#!/bin/bash
# metrics-collector.sh
# Collect the latest run statistics from the container logs
# (--tail must cover the whole statistics block, not just the last line)
STATS=$(docker logs gts-holmirdas --tail=20 2>&1 | grep "Run Statistics" -A 6)
RUNTIME=$(echo "$STATS" | grep "Runtime" | cut -d':' -f2- | tr -d ' ')
POSTS=$(echo "$STATS" | grep "Total posts" | cut -d':' -f2 | tr -d ' ')
INSTANCES=$(echo "$STATS" | grep "Current known" | cut -d':' -f2 | tr -d ' ')
# Send to monitoring system
curl -X POST "https://monitoring.yourdomain.com/metrics" \
  -H "Content-Type: application/json" \
  -d "{\"runtime\":\"$RUNTIME\",\"posts\":$POSTS,\"instances\":$INSTANCES}"
```
### Backup and Recovery
#### Automated Backup Script
```bash
#!/bin/bash
# backup-gts-holmirdas.sh
BACKUP_DIR="/backups/gts-holmirdas"
DATE=$(date +%Y%m%d_%H%M%S)
CONTAINER="gts-holmirdas"
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Stop container gracefully
docker-compose stop gts-holmirdas
# Backup data directory
tar -czf "$BACKUP_DIR/data_$DATE.tar.gz" ./data
# Backup configuration
cp .env "$BACKUP_DIR/env_$DATE"
cp rss_feeds.txt "$BACKUP_DIR/rss_feeds_$DATE.txt"
cp docker-compose.yml "$BACKUP_DIR/compose_$DATE.yml"
# Restart container
docker-compose start gts-holmirdas
# Cleanup old backups (keep 30 days)
find "$BACKUP_DIR" -name "*.tar.gz" -mtime +30 -delete
find "$BACKUP_DIR" -name "env_*" -mtime +30 -delete
find "$BACKUP_DIR" -name "rss_feeds_*" -mtime +30 -delete
echo "Backup completed: $DATE"
```
#### Recovery Procedure
```bash
#!/bin/bash
# restore-gts-holmirdas.sh
# Full recovery from backup
BACKUP_DATE=$1 # e.g., 20240115_143022
if [ -z "$BACKUP_DATE" ]; then
  echo "Usage: $0 <backup_date>"
  exit 1
fi
# Stop current container
docker-compose down
# Restore data
tar -xzf "/backups/gts-holmirdas/data_$BACKUP_DATE.tar.gz"
# Restore configuration
cp "/backups/gts-holmirdas/env_$BACKUP_DATE" .env
cp "/backups/gts-holmirdas/rss_feeds_$BACKUP_DATE.txt" rss_feeds.txt
cp "/backups/gts-holmirdas/compose_$BACKUP_DATE.yml" docker-compose.yml
# Restart with restored configuration
docker-compose up -d
echo "Recovery completed from backup: $BACKUP_DATE"
```
## Multi-Instance Deployment
### Load Balancing Multiple Instances
For very large deployments, you can run multiple GTS-HolMirDas instances:
```yaml
# docker-compose-multi.yml
version: '3.8'

services:
  gts-holmirdas-1:
    image: gts-holmirdas:latest
    env_file: .env.1
    volumes:
      - ./data1:/app/data
      - ./feeds1.txt:/app/rss_feeds.txt:ro

  gts-holmirdas-2:
    image: gts-holmirdas:latest
    env_file: .env.2
    volumes:
      - ./data2:/app/data
      - ./feeds2.txt:/app/rss_feeds.txt:ro
```
### Feed Distribution Strategy
```bash
# feeds1.txt - Tech focus
https://fosstodon.org/tags/homelab.rss?limit=100
https://fosstodon.org/tags/docker.rss?limit=100
https://infosec.exchange/tags/security.rss?limit=100
# feeds2.txt - General topics
https://mastodon.social/tags/technology.rss?limit=50
https://hachyderm.io/tags/programming.rss?limit=50
https://social.tchncs.de/tags/linux.rss?limit=50
```
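To keep the two feed files disjoint, a master list can be split round-robin across instances. A small sketch (the file names match the compose example; the master list is illustrative):

```python
def split_feeds(feeds: list[str], workers: int) -> list[list[str]]:
    """Deal feeds out round-robin so no two workers fetch the same feed."""
    return [feeds[i::workers] for i in range(workers)]

# Illustrative master list; in practice, read your full rss_feeds.txt
master = [
    "https://fosstodon.org/tags/homelab.rss?limit=100",
    "https://infosec.exchange/tags/security.rss?limit=100",
    "https://mastodon.social/tags/technology.rss?limit=50",
    "https://hachyderm.io/tags/programming.rss?limit=50",
]
for n, chunk in enumerate(split_feeds(master, 2), start=1):
    with open(f"feeds{n}.txt", "w") as fh:
        fh.write("\n".join(chunk) + "\n")
```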
## Integration with External Systems
### Webhook Integration
```bash
# Add to .env
WEBHOOK_URL=https://your-system.com/webhook/gts-holmirdas
WEBHOOK_SECRET=your_webhook_secret
```
Example webhook payload:
```json
{
  "timestamp": "2024-01-15T14:30:22Z",
  "runtime": "0:04:23",
  "posts_processed": 87,
  "instances_discovered": 12,
  "total_instances": 2847,
  "feeds_processed": 45,
  "success": true
}
```
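A receiver will typically want the payload signed with `WEBHOOK_SECRET`. A sketch of a sender; the HMAC-SHA256 scheme and the `X-Signature` header name are assumptions, not a documented GTS-HolMirDas interface:

```python
import hashlib
import hmac
import json
import os
import urllib.request

def send_webhook(payload: dict) -> None:
    """POST run statistics to WEBHOOK_URL, signed with WEBHOOK_SECRET.
    Header name and signing scheme are illustrative assumptions."""
    body = json.dumps(payload).encode()
    secret = os.environ["WEBHOOK_SECRET"].encode()
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    req = urllib.request.Request(
        os.environ["WEBHOOK_URL"], data=body, method="POST",
        headers={"Content-Type": "application/json",
                 "X-Signature": signature})
    urllib.request.urlopen(req, timeout=10)
```

The receiving end recomputes the HMAC over the raw body and compares it (with `hmac.compare_digest`) before trusting the payload.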
### Prometheus Metrics
```bash
#!/bin/bash
# prometheus-metrics.sh
# Custom Prometheus metrics exporter

# Parse the latest run statistics from the container logs
# (--tail must cover the whole statistics block, not just the last line)
STATS=$(docker logs gts-holmirdas --tail=20 2>&1 | grep "Run Statistics" -A 6)
# Extract metrics
POSTS=$(echo "$STATS" | grep "Total posts" | grep -o '[0-9]\+')
INSTANCES=$(echo "$STATS" | grep "Current known" | grep -o '[0-9]\+')
RUNTIME_MIN=$(echo "$STATS" | grep "Runtime" | grep -o '[0-9]\+:[0-9]\+' | cut -d':' -f2)
# Export to Prometheus format
cat > /tmp/gts-holmirdas-metrics.prom << EOF
# HELP gts_holmirdas_posts_processed_total Total posts processed in last run
# TYPE gts_holmirdas_posts_processed_total counter
gts_holmirdas_posts_processed_total $POSTS
# HELP gts_holmirdas_instances_known Total known fediverse instances
# TYPE gts_holmirdas_instances_known gauge
gts_holmirdas_instances_known $INSTANCES
# HELP gts_holmirdas_runtime_minutes Runtime of last processing run in minutes
# TYPE gts_holmirdas_runtime_minutes gauge
gts_holmirdas_runtime_minutes $RUNTIME_MIN
EOF
```
## Advanced Troubleshooting
### Performance Profiling
```bash
# Enable detailed profiling
LOG_LEVEL=DEBUG
PROFILE_ENABLED=true
PROFILE_OUTPUT_DIR=/app/profiles
# Analyze performance bottlenecks
docker-compose exec gts-holmirdas python3 -m cProfile -o /app/profiles/profile.out /app/gts_holmirdas.py
```
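The profile written by `cProfile` can then be summarized with the standard `pstats` module. A self-contained sketch that profiles a dummy workload the same way (for the real run, load the file with `pstats.Stats("/app/profiles/profile.out")` instead):

```python
import cProfile
import io
import pstats

def work() -> int:
    """Stand-in workload for the real feed-processing run."""
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by cumulative time and show the top 5 hotspots
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```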
### Custom User Agent Configuration
```bash
# Avoid rate limiting by customizing User-Agent
USER_AGENT="GTS-HolMirDas/1.1.0 (+https://git.klein.ruhr/matthias/gts-holmirdas)"
```
### Network Optimization
```bash
# DNS caching for better performance
DNS_CACHE_TTL=300
# Connection pooling
CONNECTION_POOL_SIZE=10
CONNECTION_POOL_MAXSIZE=20
```
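Assuming the tool uses `requests` (as the Docker health check suggests), the pool settings above would map onto a shared `Session` roughly like this sketch:

```python
import os

import requests
from requests.adapters import HTTPAdapter

# One shared session so TCP/TLS connections are reused across feeds
session = requests.Session()
adapter = HTTPAdapter(
    pool_connections=int(os.getenv("CONNECTION_POOL_SIZE", "10")),
    pool_maxsize=int(os.getenv("CONNECTION_POOL_MAXSIZE", "20")),
)
session.mount("https://", adapter)
session.mount("http://", adapter)
```

All feed fetches then go through `session.get(...)` instead of module-level `requests.get(...)`.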
With these settings in place, GTS-HolMirDas can be tuned for most production environments and specialized use cases.