diff --git a/Advanced-Configuration.md b/Advanced-Configuration.md
new file mode 100644
index 0000000..5cd179a
--- /dev/null
+++ b/Advanced-Configuration.md
@@ -0,0 +1,475 @@
+# Advanced Configuration
+
+Comprehensive guide to fine-tuning **GTS-HolMirDas** for production environments and specific use cases.
+
+## Environment Variables Reference
+
+### Core Configuration
+
+```bash
+# GoToSocial Connection (Required)
+GTS_SERVER_URL=https://your-gts-instance.tld
+GTS_ACCESS_TOKEN=your_access_token_here
+
+# Processing Control
+MAX_POSTS_PER_RUN=25  # Posts per feed per run (1-100)
+DELAY_BETWEEN_REQUESTS=1  # Seconds between API calls (0.5-5)
+LOG_LEVEL=INFO  # DEBUG, INFO, WARNING, ERROR
+
+# File Configuration
+RSS_URLS_FILE=/app/rss_feeds.txt  # Path to RSS feeds file
+
+# Optional Features
+HEALTHCHECK_URL=  # Healthchecks.io ping URL
+USER_AGENT=GTS-HolMirDas/1.1.0  # Custom User-Agent string
+```
+
+### Advanced Processing Options
+
+```bash
+# Memory Management
+DUPLICATE_CACHE_SIZE=10000  # Max URLs to cache (affects memory)
+BATCH_SIZE=50  # Posts processed per batch
+
+# Network Configuration
+REQUEST_TIMEOUT=30  # Seconds to wait for RSS/API responses
+MAX_RETRIES=3  # Retry attempts for failed requests
+BACKOFF_FACTOR=2  # Exponential backoff multiplier
+
+# Federation Control
+INSTANCE_DISCOVERY=true  # Enable automatic instance discovery
+MIN_INSTANCE_POSTS=5  # Minimum posts before counting instance
+```
+
+### Production Hardening
+
+```bash
+# Security
+VALIDATE_SSL=true  # Enforce SSL certificate validation
+ALLOWED_DOMAINS=  # Comma-separated list of allowed RSS domains
+
+# Resource Limits
+MAX_FEED_SIZE=10MB  # Maximum RSS feed size to process
+MAX_PROCESSING_TIME=1800  # Kill run after 30 minutes
+MEMORY_LIMIT=512MB  # Container memory limit (Docker)
+
+# Logging
+LOG_FORMAT=json  # json, text
+LOG_TO_FILE=true  # Enable file logging
+LOG_RETENTION_DAYS=30  # Days to keep log files
+```
+
+## RSS Feed Strategies
+
+### Feed Selection Methodology
+
+#### High-Quality Instance Selection
+
+**Tech-Focused Instances (Recommended):**
+```bash
+# Excellent signal-to-noise ratio
+https://fosstodon.org/tags/homelab.rss?limit=100
+https://infosec.exchange/tags/security.rss?limit=100
+https://social.tchncs.de/tags/linux.rss?limit=75
+
+# Specialized communities
+https://chaos.social/tags/ccc.rss?limit=50
+https://mas.to/tags/privacy.rss?limit=50
+```
+
+**Balanced General Instances:**
+```bash
+# Large instances with moderate limits
+https://mastodon.social/tags/technology.rss?limit=50
+https://mstdn.social/tags/programming.rss?limit=40
+https://hachyderm.io/tags/devops.rss?limit=60
+```
+
+#### Hashtag Strategy
+
+**Tier 1: Core Topics (Use high limits)**
+```bash
+# Your primary interests - use limit=75-100
+https://fosstodon.org/tags/homelab.rss?limit=100
+https://fosstodon.org/tags/selfhosting.rss?limit=100
+https://fosstodon.org/tags/docker.rss?limit=100
+```
+
+**Tier 2: Secondary Topics (Moderate limits)**
+```bash
+# Related interests - use limit=50-75
+https://mastodon.social/tags/linux.rss?limit=50
+https://social.tchncs.de/tags/privacy.rss?limit=50
+https://infosec.exchange/tags/cybersecurity.rss?limit=60
+```
+
+**Tier 3: Discovery Topics (Conservative limits)**
+```bash
+# Exploration areas - use limit=25-40
+https://mastodon.social/tags/photography.rss?limit=30
+https://pixelfed.social/tags/art.rss?limit=25
+```
+
+### Feed Quality Assessment
+
+#### Monitoring Feed Performance
+
+```bash
+# Check feed response times
+curl -w "@curl-format.txt" -s -o /dev/null https://fosstodon.org/tags/homelab.rss
+
+# curl-format.txt content:
+# time_namelookup: %{time_namelookup}\n
+# time_connect: %{time_connect}\n
+# time_appconnect: %{time_appconnect}\n
+# time_pretransfer: %{time_pretransfer}\n
+# time_redirect: %{time_redirect}\n
+# time_starttransfer: %{time_starttransfer}\n
+# ----------\n
+# time_total: %{time_total}\n
+```
+
+#### Feed Quality Metrics
+
+**High-Quality Indicators:**
+- Response time < 2 seconds
+- Consistent content updates
+- Low duplicate rate with other feeds
+- Active community engagement
+
+**Red Flags:**
+- Frequent timeouts or errors
+- Very high duplicate rate
+- Spam or low-quality content
+- Instance frequently down
+
+### Geographic and Language Considerations
+
+```bash
+# English-language instances
+https://fosstodon.org/tags/homelab.rss?limit=75
+https://hachyderm.io/tags/devops.rss?limit=50
+
+# German-language instances
+https://social.tchncs.de/tags/homelab.rss?limit=50
+https://chaos.social/tags/34c3.rss?limit=25
+
+# Multi-language instances
+https://mastodon.social/tags/technology.rss?limit=40
+```
+
+## Production Deployment
+
+### Docker Compose Production Configuration
+
+```yaml
+version: '3.8'
+
+services:
+  gts-holmirdas:
+    image: gts-holmirdas:latest
+    container_name: gts-holmirdas-prod
+    restart: unless-stopped
+
+    # Resource limits
+    deploy:
+      resources:
+        limits:
+          memory: 512M
+          cpus: '0.5'
+        reservations:
+          memory: 256M
+          cpus: '0.25'
+
+    # Security
+    user: "1000:1000"
+    read_only: true
+    security_opt:
+      - no-new-privileges:true
+
+    # Networking
+    networks:
+      - gts-network
+
+    # Environment
+    env_file:
+      - .env.production
+
+    # Volumes
+    volumes:
+      - ./data:/app/data:rw
+      - ./rss_feeds.txt:/app/rss_feeds.txt:ro
+      - ./logs:/app/logs:rw
+      - /tmp:/tmp:rw  # Required for read_only mode
+
+    # Health check
+    healthcheck:
+      test: ["CMD", "python3", "-c", "import requests; requests.get('http://localhost:8080/health').raise_for_status()"]
+      interval: 30m
+      timeout: 10s
+      retries: 3
+      start_period: 5m
+
+    # Logging
+    logging:
+      driver: "json-file"
+      options:
+        max-size: "10m"
+        max-file: "5"
+        compress: "true"
+
+networks:
+  gts-network:
+    external: true
+```
+
+### Production Environment Variables
+
+```bash
+# .env.production
+GTS_SERVER_URL=https://social.yourdomain.com
+GTS_ACCESS_TOKEN=prod_token_here
+
+# Production tuning
+MAX_POSTS_PER_RUN=50
+DELAY_BETWEEN_REQUESTS=1
+LOG_LEVEL=INFO
+LOG_FORMAT=json
+LOG_TO_FILE=true
+
+# Security hardening
+VALIDATE_SSL=true
+REQUEST_TIMEOUT=30
+MAX_RETRIES=2
+MEMORY_LIMIT=512MB
+
+# Monitoring
+HEALTHCHECK_URL=https://hc-ping.com/your-production-uuid
+```
+
+### Monitoring and Alerting
+
+#### Log Analysis Setup
+
+```bash
+# Structured logging for analysis
+LOG_FORMAT=json
+
+# Example log analysis with jq
+docker logs gts-holmirdas 2>&1 | jq -r 'select(.level=="ERROR") | .message'
+
+# Performance monitoring
+docker logs gts-holmirdas 2>&1 | jq -r 'select(.posts_processed) | "\(.timestamp): \(.posts_processed) posts in \(.runtime)"'
+```
+
+#### Metrics Collection
+
+```bash
+#!/bin/bash
+# Custom metrics script
+# metrics-collector.sh
+
+STATS=$(docker logs gts-holmirdas --tail=50 2>&1 | grep "Run Statistics" -A 6)
+RUNTIME=$(echo "$STATS" | grep "Runtime" | cut -d':' -f2- | tr -d ' ')
+POSTS=$(echo "$STATS" | grep "Total posts" | cut -d':' -f2 | tr -d ' ')
+INSTANCES=$(echo "$STATS" | grep "Current known" | cut -d':' -f2 | tr -d ' ')
+
+# Send to monitoring system
+curl -X POST "https://monitoring.yourdomain.com/metrics" \
+  -H "Content-Type: application/json" \
+  -d "{\"runtime\":\"$RUNTIME\",\"posts\":${POSTS:-0},\"instances\":${INSTANCES:-0}}"
+```
+
+### Backup and Recovery
+
+#### Automated Backup Script
+
+```bash
+#!/bin/bash
+# backup-gts-holmirdas.sh
+
+BACKUP_DIR="/backups/gts-holmirdas"
+DATE=$(date +%Y%m%d_%H%M%S)
+
+# Create backup directory
+mkdir -p "$BACKUP_DIR"
+
+# Stop container gracefully
+docker-compose stop gts-holmirdas
+
+# Backup data directory
+tar -czf "$BACKUP_DIR/data_$DATE.tar.gz" ./data
+
+# Backup configuration
+cp .env "$BACKUP_DIR/env_$DATE"
+cp rss_feeds.txt "$BACKUP_DIR/rss_feeds_$DATE.txt"
+cp docker-compose.yml "$BACKUP_DIR/compose_$DATE.yml"
+
+# Restart container
+docker-compose start gts-holmirdas
+
+# Cleanup old backups (keep 30 days)
+find "$BACKUP_DIR" -name "*.tar.gz" -mtime +30 -delete
+find "$BACKUP_DIR" -name "env_*" -mtime +30 -delete
+find "$BACKUP_DIR" -name "rss_feeds_*" -mtime +30 -delete
+find "$BACKUP_DIR" -name "compose_*" -mtime +30 -delete
+
+echo "Backup completed: $DATE"
+```
+
+#### Recovery Procedure
+
+```bash
+#!/bin/bash
+# Full recovery from backup
+# restore-gts-holmirdas.sh
+
+BACKUP_DATE=$1  # e.g., 20240115_143022
+
+if [ -z "$BACKUP_DATE" ]; then
+  echo "Usage: $0 <backup_date>"
+  exit 1
+fi
+
+# Stop current container
+docker-compose down
+
+# Restore data
+tar -xzf "/backups/gts-holmirdas/data_$BACKUP_DATE.tar.gz"
+
+# Restore configuration
+cp "/backups/gts-holmirdas/env_$BACKUP_DATE" .env
+cp "/backups/gts-holmirdas/rss_feeds_$BACKUP_DATE.txt" rss_feeds.txt
+cp "/backups/gts-holmirdas/compose_$BACKUP_DATE.yml" docker-compose.yml
+
+# Restart with restored configuration
+docker-compose up -d
+
+echo "Recovery completed from backup: $BACKUP_DATE"
+```
+
+## Multi-Instance Deployment
+
+### Load Balancing Multiple Instances
+
+For very large deployments, you can run multiple GTS-HolMirDas instances:
+
+```yaml
+# docker-compose-multi.yml
+version: '3.8'
+
+services:
+  gts-holmirdas-1:
+    image: gts-holmirdas:latest
+    env_file: .env.1
+    volumes:
+      - ./data1:/app/data
+      - ./feeds1.txt:/app/rss_feeds.txt:ro
+
+  gts-holmirdas-2:
+    image: gts-holmirdas:latest
+    env_file: .env.2
+    volumes:
+      - ./data2:/app/data
+      - ./feeds2.txt:/app/rss_feeds.txt:ro
+```
+
+### Feed Distribution Strategy
+
+```bash
+# feeds1.txt - Tech focus
+https://fosstodon.org/tags/homelab.rss?limit=100
+https://fosstodon.org/tags/docker.rss?limit=100
+https://infosec.exchange/tags/security.rss?limit=100
+
+# feeds2.txt - General topics
+https://mastodon.social/tags/technology.rss?limit=50
+https://hachyderm.io/tags/programming.rss?limit=50
+https://social.tchncs.de/tags/linux.rss?limit=50
+```
+
+## Integration with External Systems
+
+### Webhook Integration
+
+```bash
+# Add to .env
+WEBHOOK_URL=https://your-system.com/webhook/gts-holmirdas
+WEBHOOK_SECRET=your_webhook_secret
+
+# Webhook payload example:
+{
+  "timestamp": "2024-01-15T14:30:22Z",
+  "runtime": "0:04:23",
+  "posts_processed": 87,
+  "instances_discovered": 12,
+  "total_instances": 2847,
+  "feeds_processed": 45,
+  "success": true
+}
+```
+
+### Prometheus Metrics
+
+```bash
+#!/bin/bash
+# Custom metrics exporter
+# prometheus-metrics.sh
+
+# Parse latest statistics
+STATS=$(docker logs gts-holmirdas --tail=50 2>&1 | grep "Run Statistics" -A 6)
+
+# Extract metrics
+POSTS=$(echo "$STATS" | grep "Total posts" | grep -o '[0-9]\+')
+INSTANCES=$(echo "$STATS" | grep "Current known" | grep -o '[0-9]\+')
+RUNTIME_MIN=$(echo "$STATS" | grep "Runtime" | grep -o '[0-9]\+:[0-9]\+' | cut -d':' -f2)
+
+# Export to Prometheus format
+cat > /tmp/gts-holmirdas-metrics.prom << EOF
+# HELP gts_holmirdas_posts_processed_total Total posts processed in last run
+# TYPE gts_holmirdas_posts_processed_total counter
+gts_holmirdas_posts_processed_total $POSTS
+
+# HELP gts_holmirdas_instances_known Total known fediverse instances
+# TYPE gts_holmirdas_instances_known gauge
+gts_holmirdas_instances_known $INSTANCES
+
+# HELP gts_holmirdas_runtime_minutes Runtime of last processing run in minutes
+# TYPE gts_holmirdas_runtime_minutes gauge
+gts_holmirdas_runtime_minutes $RUNTIME_MIN
+EOF
+```
+
+## Advanced Troubleshooting
+
+### Performance Profiling
+
+```bash
+# Enable detailed profiling
+LOG_LEVEL=DEBUG
+PROFILE_ENABLED=true
+PROFILE_OUTPUT_DIR=/app/profiles
+
+# Analyze performance bottlenecks
+docker-compose exec gts-holmirdas python3 -m cProfile -o /app/profiles/profile.out /app/gts_holmirdas.py
+```
+
+### Custom User Agent Configuration
+
+```bash
+# Avoid rate limiting by customizing User-Agent
+USER_AGENT="GTS-HolMirDas/1.1.0 (+https://git.klein.ruhr/matthias/gts-holmirdas)"
+```
+
+### Network Optimization
+
+```bash
+# DNS caching for better performance
+DNS_CACHE_TTL=300
+
+# Connection pooling
+CONNECTION_POOL_SIZE=10
+CONNECTION_POOL_MAXSIZE=20
+```
+
+This advanced configuration guide should help optimize GTS-HolMirDas for any production environment or specific use case!
\ No newline at end of file