diff --git a/Advanced-Configuration.md b/Advanced-Configuration.md
index 0fd4e68..556203f 100644
--- a/Advanced-Configuration.md
+++ b/Advanced-Configuration.md
@@ -1,473 +1,75 @@
-# Advanced Configuration
+# 🔧 Advanced Configuration
 
-Comprehensive guide to fine-tuning **GTS-HolMirDas** for production environments and specific use cases.
+Comprehensive guide to advanced GTS-HolMirDas configuration options and production deployment strategies.
 
-## Environment Variables Reference
+## 🏗️ Production Deployment
 
-### Core Configuration
-
-```bash
-# GoToSocial Connection (Required)
-GTS_SERVER_URL=https://your-gts-instance.tld
-GTS_ACCESS_TOKEN=your_access_token_here
-
-# Processing Control
-MAX_POSTS_PER_RUN=25 # Posts per feed per run (1-100)
-DELAY_BETWEEN_REQUESTS=1 # Seconds between API calls (0.5-5)
-LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR
-
-# File Configuration
-RSS_URLS_FILE=/app/rss_feeds.txt # Path to RSS feeds file
-
-# Optional Features
-HEALTHCHECK_URL= # Healthchecks.io ping URL
-USER_AGENT=GTS-HolMirDas/1.1.0 # Custom User-Agent string
-```
-
-### Advanced Processing Options
-
-```bash
-# Memory Management
-DUPLICATE_CACHE_SIZE=10000 # Max URLs to cache (affects memory)
-BATCH_SIZE=50 # Posts processed per batch
-
-# Network Configuration
-REQUEST_TIMEOUT=30 # Seconds to wait for RSS/API responses
-MAX_RETRIES=3 # Retry attempts for failed requests
-BACKOFF_FACTOR=2 # Exponential backoff multiplier
-
-# Federation Control
-INSTANCE_DISCOVERY=true # Enable automatic instance discovery
-MIN_INSTANCE_POSTS=5 # Minimum posts before counting instance
-```
-
-### Production Hardening
-
-```bash
-# Security
-VALIDATE_SSL=true # Enforce SSL certificate validation
-ALLOWED_DOMAINS= # Comma-separated list of allowed RSS domains
-
-# Resource Limits
-MAX_FEED_SIZE=10MB # Maximum RSS feed size to process
-MAX_PROCESSING_TIME=1800 # Kill run after 30 minutes
-MEMORY_LIMIT=512MB # Container memory limit (Docker)
-
-# Logging
-LOG_FORMAT=json # json, text
-LOG_TO_FILE=true # Enable file logging
-LOG_RETENTION_DAYS=30 # Days to keep log files
-```
-
-## RSS Feed Strategies
-
-### Feed Selection Methodology
-
-#### High-Quality Instance Selection
-
-**Tech-Focused Instances (Recommended):**
-```bash
-# Excellent signal-to-noise ratio
-https://fosstodon.org/tags/homelab.rss?limit=100
-https://infosec.exchange/tags/security.rss?limit=100
-https://social.tchncs.de/tags/linux.rss?limit=75
-
-# Specialized communities
-https://chaos.social/tags/ccc.rss?limit=50
-https://mas.to/tags/privacy.rss?limit=50
-```
-
-**Balanced General Instances:**
-```bash
-# Large instances with moderate limits
-https://mastodon.social/tags/technology.rss?limit=50
-https://mstdn.social/tags/programming.rss?limit=40
-https://hachyderm.io/tags/devops.rss?limit=60
-```
-
-#### Hashtag Strategy
-
-**Tier 1: Core Topics (Use high limits)**
-```bash
-# Your primary interests - use limit=75-100
-https://fosstodon.org/tags/homelab.rss?limit=100
-https://fosstodon.org/tags/selfhosting.rss?limit=100
-https://fosstodon.org/tags/docker.rss?limit=100
-```
-
-**Tier 2: Secondary Topics (Moderate limits)**
-```bash
-# Related interests - use limit=50-75
-https://mastodon.social/tags/linux.rss?limit=50
-https://social.tchncs.de/tags/privacy.rss?limit=50
-https://infosec.exchange/tags/cybersecurity.rss?limit=60
-```
-
-**Tier 3: Discovery Topics (Conservative limits)**
-```bash
-# Exploration areas - use limit=25-40
-https://mastodon.social/tags/photography.rss?limit=30
-https://pixelfed.social/tags/art.rss?limit=25
-```
-
-### Feed Quality Assessment
-
-#### Monitoring Feed Performance
-
-```bash
-# Check feed response times
-curl -w "@curl-format.txt" -s -o /dev/null https://fosstodon.org/tags/homelab.rss
-
-# curl-format.txt content:
-# time_namelookup: %{time_namelookup}\n
-# time_connect: %{time_connect}\n
-# time_appconnect: %{time_appconnect}\n
-# time_pretransfer: %{time_pretransfer}\n
-# time_redirect: %{time_redirect}\n
-# time_starttransfer: %{time_starttransfer}\n
-# ----------\n
-# time_total: %{time_total}\n
-```
-
-#### Feed Quality Metrics
-
-**High-Quality Indicators:**
-- Response time < 2 seconds
-- Consistent content updates
-- Low duplicate rate with other feeds
-- Active community engagement
-
-**Red Flags:**
-- Frequent timeouts or errors
-- Very high duplicate rate
-- Spam or low-quality content
-- Instance frequently down
-
-### Geographic and Language Considerations
-
-```bash
-# English-language instances
-https://fosstodon.org/tags/homelab.rss?limit=75
-https://hachyderm.io/tags/devops.rss?limit=50
-
-# German-language instances
-https://social.tchncs.de/tags/homelab.rss?limit=50
-https://chaos.social/tags/34c3.rss?limit=25
-
-# Multi-language instances
-https://mastodon.social/tags/technology.rss?limit=40
-```
-
-## Production Deployment
-
-### Docker Compose Production Configuration
+### Docker Resource Management
 
 ```yaml
+# compose.yml - Resource limits
 services:
   gts-holmirdas:
-    image: gts-holmirdas:latest
-    container_name: gts-holmirdas-prod
-    restart: unless-stopped
-
-    # Resource limits
     deploy:
       resources:
         limits:
           memory: 512M
-          cpus: '0.5'
         reservations:
           memory: 256M
-          cpus: '0.25'
-
-    # Security
-    user: "1000:1000"
-    read_only: true
-    security_opt:
-      - no-new-privileges:true
-
-    # Networking
-    networks:
-      - gts-network
-
-    # Environment
-    env_file:
-      - .env.production
-
-    # Volumes
-    volumes:
-      - ./data:/app/data:rw
-      - ./rss_feeds.txt:/app/rss_feeds.txt:ro
-      - ./logs:/app/logs:rw
-      - /tmp:/tmp:rw # Required for read_only mode
-
-    # Health check
-    healthcheck:
-      test: ["CMD", "python3", "-c", "import requests; requests.get('http://localhost:8080/health')"]
-      interval: 30m
-      timeout: 10s
-      retries: 3
-      start_period: 5m
-
-    # Logging
-    logging:
-      driver: "json-file"
-      options:
-        max-size: "10m"
-        max-file: "5"
-        compress: "true"
-
-networks:
-  gts-network:
-    external: true
+    restart: unless-stopped
 ```
 
-### Production Environment Variables
+### Data Persistence Strategy
+
+**Important data locations:**
+- `data/processed_urls.json` - Processing history (prevents duplicates)
+- `rss_feeds.txt` - RSS feed configuration
+- `.env` - Environment configuration
 
 ```bash
-# .env.production
-GTS_SERVER_URL=https://social.yourdomain.com
-GTS_ACCESS_TOKEN=prod_token_here
-
-# Production tuning
-MAX_POSTS_PER_RUN=50
-DELAY_BETWEEN_REQUESTS=1
-LOG_LEVEL=INFO
-LOG_FORMAT=json
-LOG_TO_FILE=true
-
-# Security hardening
-VALIDATE_SSL=true
-REQUEST_TIMEOUT=30
-MAX_RETRIES=2
-MEMORY_LIMIT=512MB
-
-# Monitoring
-HEALTHCHECK_URL=https://hc-ping.com/your-production-uuid
-```
-
-### Monitoring and Alerting
-
-#### Log Analysis Setup
-
-```bash
-# Structured logging for analysis
-LOG_FORMAT=json
-
-# Example log analysis with jq
-docker logs gts-holmirdas 2>&1 | jq -r 'select(.level=="ERROR") | .message'
-
-# Performance monitoring
-docker logs gts-holmirdas 2>&1 | jq -r 'select(.posts_processed) | "\(.timestamp): \(.posts_processed) posts in \(.runtime)"'
-```
-
-#### Metrics Collection
-
-```bash
-# Custom metrics script
-#!/bin/bash
-# metrics-collector.sh
-
-STATS=$(docker logs gts-holmirdas --tail=1 2>&1 | grep "Run Statistics" -A 6)
-RUNTIME=$(echo "$STATS" | grep "Runtime" | cut -d':' -f2- | tr -d ' ')
-POSTS=$(echo "$STATS" | grep "Total posts" | cut -d':' -f2 | tr -d ' ')
-INSTANCES=$(echo "$STATS" | grep "Current known" | cut -d':' -f2 | tr -d ' ')
-
-# Send to monitoring system
-curl -X POST "https://monitoring.yourdomain.com/metrics" \
-  -H "Content-Type: application/json" \
-  -d "{\"runtime\":\"$RUNTIME\",\"posts\":$POSTS,\"instances\":$INSTANCES}"
-```
-
-### Backup and Recovery
-
-#### Automated Backup Script
-
-```bash
-#!/bin/bash
-# backup-gts-holmirdas.sh
-
-BACKUP_DIR="/backups/gts-holmirdas"
-DATE=$(date +%Y%m%d_%H%M%S)
-CONTAINER="gts-holmirdas"
-
-# Create backup directory
-mkdir -p "$BACKUP_DIR"
-
-# Stop container gracefully
-docker compose stop gts-holmirdas
-
-# Backup data directory
-tar -czf "$BACKUP_DIR/data_$DATE.tar.gz" ./data
-
-# Backup configuration
-cp .env "$BACKUP_DIR/env_$DATE"
-cp rss_feeds.txt "$BACKUP_DIR/rss_feeds_$DATE.txt"
-cp compose.yml "$BACKUP_DIR/compose_$DATE.yml"
-
-# Restart container
-docker compose start gts-holmirdas
-
-# Cleanup old backups (keep 30 days)
-find "$BACKUP_DIR" -name "*.tar.gz" -mtime +30 -delete
-find "$BACKUP_DIR" -name "env_*" -mtime +30 -delete
-find "$BACKUP_DIR" -name "rss_feeds_*" -mtime +30 -delete
-
-echo "Backup completed: $DATE"
-```
-
-#### Recovery Procedure
-
-```bash
-# Full recovery from backup
-#!/bin/bash
-# restore-gts-holmirdas.sh
-
-BACKUP_DATE=$1 # e.g., 20240115_143022
-
-if [ -z "$BACKUP_DATE" ]; then
-    echo "Usage: $0 "
-    exit 1
-fi
-
-# Stop current container
-docker compose down
-
-# Restore data
-tar -xzf "/backups/gts-holmirdas/data_$BACKUP_DATE.tar.gz"
-
-# Restore configuration
-cp "/backups/gts-holmirdas/env_$BACKUP_DATE" .env
-cp "/backups/gts-holmirdas/rss_feeds_$BACKUP_DATE.txt" rss_feeds.txt
-cp "/backups/gts-holmirdas/compose_$BACKUP_DATE.yml" docker-compose.yml
-
-# Restart with restored configuration
-docker compose up -d
-
-echo "Recovery completed from backup: $BACKUP_DATE"
-```
-
-## Multi-Instance Deployment
-
-### Load Balancing Multiple Instances
-
-For very large deployments, you can run multiple GTS-HolMirDas instances:
-
-```yaml
-# docker-compose-multi.yml
-version: '3.8'
-
-services:
-  gts-holmirdas-1:
-    image: gts-holmirdas:latest
-    env_file: .env.1
-    volumes:
-      - ./data1:/app/data
-      - ./feeds1.txt:/app/rss_feeds.txt:ro
+# Backup critical data
+tar -czf gts-holmirdas-backup-$(date +%Y%m%d).tar.gz \
+  data/ rss_feeds.txt .env
 
-  gts-holmirdas-2:
-    image: gts-holmirdas:latest
-    env_file: .env.2
-    volumes:
-      - ./data2:/app/data
-      - ./feeds2.txt:/app/rss_feeds.txt:ro
+# For persistent storage
+mkdir -p ./data
+chown 1000:1000 ./data
 ```
 
-### Feed Distribution Strategy
+## 📡 RSS Feed Management
+
+### Feed Categories & Organization
+
+Organize your RSS feeds by content type:
+
+```txt
+# rss_feeds.txt - Organized by category
+
+# Homelab & Self-hosting
+https://mastodon.social/tags/homelab.rss
+https://fosstodon.org/tags/selfhosting.rss
+
+# Docker & Container Technology
+https://social.tchncs.de/tags/docker.rss
+https://mastodon.social/tags/kubernetes.rss
+
+# Open Source & Development
+https://fosstodon.org/tags/opensource.rss
+https://hachyderm.io/tags/programming.rss
+```
+
+### Feed Quality Assessment
+
+Monitor which feeds provide the best instance discovery:
 
 ```bash
-# feeds1.txt - Tech focus
-https://fosstodon.org/tags/homelab.rss?limit=100
-https://fosstodon.org/tags/docker.rss?limit=100
-https://infosec.exchange/tags/security.rss?limit=100
-
-# feeds2.txt - General topics
-https://mastodon.social/tags/technology.rss?limit=50
-https://hachyderm.io/tags/programming.rss?limit=50
-https://social.tchncs.de/tags/linux.rss?limit=50
+# Check feed performance
+grep "Successfully looked up" logs | \
+  cut -d'/' -f3 | sort | uniq -c | sort -nr
 ```
 
-## Integration with External Systems
-
-### Webhook Integration
-
-```bash
-# Add to .env
-WEBHOOK_URL=https://your-system.com/webhook/gts-holmirdas
-WEBHOOK_SECRET=your_webhook_secret
-
-# Webhook payload example:
-{
-  "timestamp": "2024-01-15T14:30:22Z",
-  "runtime": "0:04:23",
-  "posts_processed": 87,
-  "instances_discovered": 12,
-  "total_instances": 2847,
-  "feeds_processed": 45,
-  "success": true
-}
-```
-
-### Prometheus Metrics
-
-```bash
-# Custom metrics exporter
-#!/bin/bash
-# prometheus-metrics.sh
-
-# Parse latest statistics
-STATS=$(docker logs gts-holmirdas --tail=1 2>&1 | grep "Run Statistics" -A 6)
-
-# Extract metrics
-POSTS=$(echo "$STATS" | grep "Total posts" | grep -o '[0-9]\+')
-INSTANCES=$(echo "$STATS" | grep "Current known" | grep -o '[0-9]\+')
-RUNTIME_MIN=$(echo "$STATS" | grep "Runtime" | grep -o '[0-9]\+:[0-9]\+' | cut -d':' -f2)
-
-# Export to Prometheus format
-cat > /tmp/gts-holmirdas-metrics.prom << EOF
-# HELP gts_holmirdas_posts_processed_total Total posts processed in last run
-# TYPE gts_holmirdas_posts_processed_total counter
-gts_holmirdas_posts_processed_total $POSTS
-
-# HELP gts_holmirdas_instances_known Total known fediverse instances
-# TYPE gts_holmirdas_instances_known gauge
-gts_holmirdas_instances_known $INSTANCES
-
-# HELP gts_holmirdas_runtime_minutes Runtime of last processing run in minutes
-# TYPE gts_holmirdas_runtime_minutes gauge
-gts_holmirdas_runtime_minutes $RUNTIME_MIN
-EOF
-```
-
-## Advanced Troubleshooting
-
-### Performance Profiling
-
-```bash
-# Enable detailed profiling
-LOG_LEVEL=DEBUG
-PROFILE_ENABLED=true
-PROFILE_OUTPUT_DIR=/app/profiles
-
-# Analyze performance bottlenecks
-docker compose exec gts-holmirdas python3 -m cProfile -o /app/profiles/profile.out /app/gts_holmirdas.py
-```
-
-### Custom User Agent Configuration
-
-```bash
-# Avoid rate limiting by customizing User-Agent
-USER_AGENT="GTS-HolMirDas/1.1.0 (+https://git.klein.ruhr/matthias/gts-holmirdas)"
-```
-
-### Network Optimization
-
-```bash
-# DNS caching for better performance
-DNS_CACHE_TTL=300
-
-# Connection pooling
-CONNECTION_POOL_SIZE=10
-CONNECTION_POOL_MAXSIZE=20
-```
-
-This advanced configuration guide should help optimize GTS-HolMirDas for any production environment or specific use case!
\ No newline at end of file
+**High-value feed characteristics:**
+- Active communities (20+ posts/day)
+- Diverse user base (multiple instances)
+- Technical content (better federation)
+- Regular posting schedule
\ No newline at end of file