Update Performance & Scaling

2025-08-03 20:25:36 +00:00 · 2025-08-03 20:25:36 +00:00 · 9a9f8ae63f
commit 9a9f8ae63f
parent cdc82390e8
2 changed files with 323 additions and 1 deletions
--- a/Performance-%26-Scaling.md
+++ b/Performance-%26-Scaling.md
@@ -0,0 +1,323 @@
+# Performance & Scaling
+
+Complete guide to optimizing GTS-HolMirDas for different server configurations and performance requirements.
+
+## 🚀 RSS Feed Optimization (v1.1.0+)
+
+GTS-HolMirDas supports URL parameters on feed URLs, so each feed request can return more posts and increase content discovery without adding feed requests.
+
+### RSS Feed Limits
+
+Most Mastodon-compatible instances support the `?limit=X` parameter:
+
+```bash
+# Default behavior (20 posts per feed)
+https://mastodon.social/tags/homelab.rss
+
+# Increased limits (up to 100 posts per feed)
+https://mastodon.social/tags/homelab.rss?limit=50
+https://fosstodon.org/tags/docker.rss?limit=100
+```
+
+**Supported limits:** 20 (default) up to 100, for example 50 or 75. The maximum an instance honors varies.
+
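Because the honored maximum is instance-dependent, it can help to check how many posts a feed actually returns for a given `?limit=` value. The following is a minimal sketch and not part of GTS-HolMirDas: `count_feed_items` is a hypothetical helper that counts `<item>` elements in RSS XML read from standard input.

```shell
# Hypothetical helper (not part of GTS-HolMirDas).
# Reads RSS XML on stdin and prints the number of <item> elements.
count_feed_items() {
  # grep -o prints each match on its own line, so wc -l counts matches.
  # tr strips the padding some wc implementations add.
  grep -o '<item[ >]' | wc -l | tr -d ' '
}
```

One possible use is `curl -fsS "https://mastodon.social/tags/homelab.rss?limit=50" | count_feed_items`. If the printed count stays at 20 regardless of the requested limit, the instance is probably ignoring the parameter.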
+## 📊 Performance Impact Tables
+
+### Configuration Comparison
+
+| Configuration | Posts/Run | Feeds Fetched | Processing Time | Memory Impact |
+|---------------|-----------|---------------|-----------------|---------------|
+| **Conservative** | ~100 posts | 30+ feeds | 2-5 minutes | +50MB |
+| **Standard (limit=20)** | ~200 posts | 30+ feeds | 3-8 minutes | +100MB |
+| **Optimized (limit=50)** | ~500 posts | 30+ feeds | 5-12 minutes | +200MB |
+| **Aggressive (limit=75)** | ~750 posts | 30+ feeds | 8-15 minutes | +300MB |
+| **Maximum (limit=100)** | ~1000 posts | 30+ feeds | 10-20 minutes | +400MB |
+
+### Real Production Data
+
+Based on actual deployments with 102 RSS feeds:
+
+```
+📊 Conservative Setup (limit=20-50):
+   ⏱️  Runtime: 4:14 | 📄 Posts: 245 | ⚡ 58 posts/minute
+   🌐 New instances: +15 | 💾 Memory: ~350MB total
+
+📊 Balanced Setup (limit=50-75):  
+   ⏱️  Runtime: 6:32 | 📄 Posts: 487 | ⚡ 74 posts/minute
+   🌐 New instances: +28 | 💾 Memory: ~450MB total
+
+📊 Aggressive Setup (limit=75-100):
+   ⏱️  Runtime: 8:42 | 📄 Posts: 1045 | ⚡ 120 posts/minute  
+   🌐 New instances: +45 | 💾 Memory: ~650MB total
+```
+
+## ⚙️ Configuration Tuning
+
+### Environment Variables
+
+```bash
+# Processing Configuration
+MAX_POSTS_PER_RUN=75        # Increase for higher limits
+DELAY_BETWEEN_REQUESTS=1    # Balance speed vs. server load
+RSS_URLS_FILE=/app/rss_feeds.txt
+
+# Recommended combinations by server capacity:
+
+# Small VPS (1GB RAM):
+MAX_POSTS_PER_RUN=40
+DELAY_BETWEEN_REQUESTS=2
+
+# Medium Server (2-4GB RAM): 
+MAX_POSTS_PER_RUN=75
+DELAY_BETWEEN_REQUESTS=1
+
+# Powerful Server (4GB+ RAM):
+MAX_POSTS_PER_RUN=100
+DELAY_BETWEEN_REQUESTS=1
+```
+
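The capacity tiers above can be turned into a small lookup. This is only a sketch: `recommend_settings` is a hypothetical helper, and the cutoffs of 1536 MB and 4096 MB are assumptions chosen because the OS usually reports slightly less memory than the nominal size of the machine. The returned values are the ones from the Environment Variables block.

```shell
# Hypothetical helper: map total RAM in MB to the settings listed above.
# The 1536/4096 MB cutoffs are assumptions, not project defaults.
recommend_settings() {
  ram_mb="$1"
  if [ "$ram_mb" -lt 1536 ]; then
    # Small VPS (about 1GB RAM)
    echo "MAX_POSTS_PER_RUN=40 DELAY_BETWEEN_REQUESTS=2"
  elif [ "$ram_mb" -lt 4096 ]; then
    # Medium server (about 2-4GB RAM)
    echo "MAX_POSTS_PER_RUN=75 DELAY_BETWEEN_REQUESTS=1"
  else
    # Powerful server (4GB+ RAM)
    echo "MAX_POSTS_PER_RUN=100 DELAY_BETWEEN_REQUESTS=1"
  fi
}
```

On Linux, the input could come from `/proc/meminfo`, for example `recommend_settings "$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)"`.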
+### RSS Feed Strategy
+
+#### Progressive Scaling Approach
+
+**Phase 1: Testing (Week 1)**
+```bash
+# Start with mixed limits to test performance
+https://mastodon.social/tags/homelab.rss?limit=30
+https://fosstodon.org/tags/selfhosting.rss?limit=40  
+https://chaos.social/tags/docker.rss?limit=50
+```
+
+**Phase 2: Optimization (Week 2-3)**
+```bash
+# Increase gradually based on server capacity
+https://mastodon.social/tags/homelab.rss?limit=50
+https://fosstodon.org/tags/selfhosting.rss?limit=75
+https://chaos.social/tags/docker.rss?limit=100
+```
+
+**Phase 3: Production (Week 4+)**
+```bash
+# Full optimization based on monitoring results
+https://mastodon.social/tags/homelab.rss?limit=100
+https://fosstodon.org/tags/selfhosting.rss?limit=100  
+https://chaos.social/tags/docker.rss?limit=100
+```
+
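Moving a whole feed list from one phase to the next by hand is error-prone, so a filter like the one below may help. It is a sketch under stated assumptions: `apply_limit` is a hypothetical helper, it expects one feed URL per line, and it only handles URLs whose sole query parameter is `limit`. Blank lines and lines starting with `#` are passed through unchanged in case your feed file contains them.

```shell
# Hypothetical helper: set the same ?limit= value on every feed URL on stdin.
apply_limit() {
  limit="$1"
  # Reject anything that is not a plain positive integer.
  case "$limit" in
    ''|*[!0-9]*) echo "apply_limit: limit must be a number" >&2; return 1 ;;
  esac
  sed -E \
    -e '/^[[:space:]]*(#|$)/b' \
    -e 's/[[:space:]]+$//' \
    -e 's/\?limit=[0-9]+$//' \
    -e "s/\$/?limit=${limit}/"
}
```

For example, `apply_limit 50 < rss_feeds.txt > rss_feeds.phase2.txt` writes a Phase 2 candidate list without touching the original file.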
+#### Instance Quality Assessment
+
+**High-Quality Instances (recommended for aggressive limits):**
+```bash
+# Tech-focused instances (good signal-to-noise ratio)
+https://fosstodon.org/tags/homelab.rss?limit=100
+https://infosec.exchange/tags/security.rss?limit=100
+https://social.tchncs.de/tags/linux.rss?limit=75
+
+# Specialized communities
+https://chaos.social/tags/ccc.rss?limit=50
+https://pixelfed.social/tags/photography.rss?limit=50
+```
+
+**General Instances (moderate limits recommended):**
+```bash
+# Large general instances (more noise, use moderate limits)
+https://mastodon.social/tags/technology.rss?limit=50
+https://mstdn.social/tags/programming.rss?limit=40
+```
+
+## 📈 Monitoring & Optimization
+
+### Performance Metrics
+
+The per-run statistics output shows these performance indicators:
+
+```
+📊 GTS-HolMirDas Run Statistics:
+   ⏱️  Runtime: 0:08:42           # Target: <15 minutes
+   📄 Total posts processed: 487   # Scales with limits
+   🌐 Current known instances: 3150 # Cumulative growth  
+   ➕ New instances discovered: +45 # Per-run discovery
+   📡 RSS feeds processed: 102     # Your feed count
+   ⚡ Posts per minute: 56.0       # Processing efficiency
+```
+
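The posts-per-minute figure in the statistics above follows from the runtime and the post count. As a worked check, this sketch recomputes it; `posts_per_minute` is a hypothetical helper, and the assumption is that the runtime is given as `H:MM:SS` or `M:SS`.

```shell
# Hypothetical helper: posts per minute from a runtime (H:MM:SS or M:SS)
# and a post count, printed with one decimal place.
posts_per_minute() {
  awk -v runtime="$1" -v posts="$2" 'BEGIN {
    n = split(runtime, part, ":")
    seconds = 0
    # Fold the fields left to right: hours -> minutes -> seconds.
    for (i = 1; i <= n; i++) seconds = seconds * 60 + part[i]
    if (seconds <= 0) { print "0.0"; exit }
    printf "%.1f\n", posts * 60 / seconds
  }'
}
```

With the sample run, `posts_per_minute 0:08:42 487` prints `56.0`, which matches the reported value.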
+### Key Performance Indicators
+
+**Runtime Optimization:**
+- **Target:** <15 minutes per run
+- **Good:** 5-10 minutes  
+- **Excellent:** <5 minutes
+
+**Discovery Efficiency:**
+- **New instances per run:** 20-50+ (higher with more aggressive limits)
+- **Posts per minute:** 30-100+ (depends on server and network speed)
+- **Federation growth:** 100-200+ new instances per week
+
+**Resource Utilization:**
+- **Memory growth:** Linear with post count (~0.5MB per 100 posts)
+- **Storage growth:** ~50-100MB per month (processed URLs tracking)
+- **Network usage:** ~1-5MB per run (RSS fetching + API calls)
+
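The runtime bands listed under Runtime Optimization (excellent under 5 minutes, good at 5-10 minutes, target under 15 minutes) can be checked automatically after each run. A minimal sketch follows; `runtime_rating` is a hypothetical helper, and the label "within target" for the 10-15 minute range is a naming choice made here.

```shell
# Hypothetical helper: classify a run's duration (in seconds)
# against the runtime bands listed above.
runtime_rating() {
  seconds="$1"
  if [ "$seconds" -lt 300 ]; then
    echo "excellent"        # under 5 minutes
  elif [ "$seconds" -le 600 ]; then
    echo "good"             # 5-10 minutes
  elif [ "$seconds" -lt 900 ]; then
    echo "within target"    # under 15 minutes
  else
    echo "over target"      # consider lowering limits or feed count
  fi
}
```

The 8:42 sample run corresponds to 522 seconds, which this helper rates as `good`.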
+### Optimization Guidelines
+
+#### Memory Management
+
+**Monitor GoToSocial Memory Usage:**
+```bash
+# Check memory usage during runs
+docker stats gotosocial
+docker stats gts-holmirdas
+
+# Memory impact per configuration:
+# Conservative: +50-100MB during processing
+# Balanced: +100-200MB during processing  
+# Aggressive: +200-400MB during processing
+```
+
+**Memory Optimization Tips:**
+- Each additional 100 posts adds roughly 2-5MB of RAM usage
+- Peak memory usage occurs during duplicate detection
+- Memory returns to baseline after run completion
+- Recommended: 1GB+ total RAM for aggressive configurations
+
+#### Processing Time Optimization
+
+**Processing time scales roughly linearly with:**
+- `MAX_POSTS_PER_RUN × number_of_feeds`
+- Network latency to RSS sources
+- GoToSocial API response times
+
+**Optimization strategies:**
+```bash
+# If processing takes too long:
+MAX_POSTS_PER_RUN=50          # Reduce from 75/100
+DELAY_BETWEEN_REQUESTS=2      # Increase from 1
+
+# If network timeouts occur:
+DELAY_BETWEEN_REQUESTS=3      # More conservative timing
+# Reduce RSS feed count temporarily
+
+# If duplicate detection is slow:
+# Clean processed URLs periodically (monthly):
+docker-compose exec gts-holmirdas rm -f /app/data/processed_urls.json
+```
+
+#### Federation Growth Optimization
+
+**Maximize Instance Discovery:**
+- Higher `?limit=` parameters = more diverse instance discovery
+- Expect 20-50+ new instances per optimized run
+- Specialized hashtags often yield better quality content
+- Mix of instance types (tech, general, niche) provides diversity
+
+**Balance Discovery vs. Storage:**
+- More instances = larger GoToSocial database
+- Monitor database growth: expect roughly 10GB per year on active instances
+- Consider storage capacity when planning aggressive scaling
+
+## 🛠️ Troubleshooting High-Volume Setups
+
+### Common Scaling Issues
+
+#### Issue: Processing Takes Too Long
+```bash
+# Solution 1: Reduce volume
+MAX_POSTS_PER_RUN=50        # Reduce from 75/100
+DELAY_BETWEEN_REQUESTS=2    # Increase from 1
+
+# Solution 2: Optimize feeds
+# Remove low-quality or duplicate feeds
+# Focus on high-signal instances
+```
+
+#### Issue: GoToSocial Uses Too Much Memory
+```bash
+# Solution 1: Reduce processing volume
+# Lower ?limit= parameters to 50 instead of 100
+# Reduce RSS feed count temporarily
+
+# Solution 2: Increase run frequency instead of volume
+# Run every 30 minutes with limit=25 instead of hourly with limit=75
+```
+
+#### Issue: Duplicate Detection Slow
+```bash
+# Solution: Storage cleanup (monthly maintenance)
+docker-compose exec gts-holmirdas rm -f /app/data/processed_urls.json
+
+# Note: This forces fresh state tracking 
+# Posts will be reprocessed once, then normal duplicate detection resumes
+```
+
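Deleting `processed_urls.json` discards the duplicate-tracking state for good. A more cautious variant renames the file to a timestamped backup, so the previous state can be restored if the reset causes trouble. This is a sketch only: `backup_and_reset` is a hypothetical helper, and it assumes the data directory is reachable from where the script runs (for example through a bind mount, which your setup may or may not use).

```shell
# Hypothetical helper: move the state file aside instead of deleting it.
# Prints the backup path on success.
backup_and_reset() {
  file="$1"
  if [ ! -f "$file" ]; then
    echo "nothing to reset: $file" >&2
    return 0
  fi
  backup="$file.$(date +%Y%m%d%H%M%S).bak"
  mv "$file" "$backup" && echo "$backup"
}
```

A call such as `backup_and_reset ./data/processed_urls.json` (path assumed) leaves no state file behind, so the next run starts with fresh tracking exactly as after `rm -f`.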
+#### Issue: Network Timeouts
+```bash
+# Solution: More conservative timing
+DELAY_BETWEEN_REQUESTS=3    # Increase from 1-2
+MAX_POSTS_PER_RUN=40        # Reduce load
+
+# Check network connectivity:
+curl -I https://mastodon.social/tags/test.rss
+```
+
+## 🎯 Best Practices by Server Size
+
+### Small VPS (1GB RAM, 1 CPU)
+```bash
+# Configuration
+MAX_POSTS_PER_RUN=25
+DELAY_BETWEEN_REQUESTS=2
+
+# RSS Strategy  
+# 10-20 feeds with limit=30-50
+# Focus on quality over quantity
+# Monitor memory usage closely
+```
+
+### Medium Server (2-4GB RAM, 2+ CPU)
+```bash
+# Configuration
+MAX_POSTS_PER_RUN=50
+DELAY_BETWEEN_REQUESTS=1
+
+# RSS Strategy
+# 30-50 feeds with limit=50-75  
+# Good balance of discovery and performance
+# Recommended for most deployments
+```
+
+### Powerful Server (4GB+ RAM, 4+ CPU)
+```bash  
+# Configuration
+MAX_POSTS_PER_RUN=100
+DELAY_BETWEEN_REQUESTS=1
+
+# RSS Strategy
+# 50-100+ feeds with limit=75-100
+# Maximum discovery and federation growth
+# Monitor storage growth long-term
+```
+
+## 📋 Performance Checklist
+
+### Pre-Scaling Checklist
+- [ ] Monitor baseline resource usage for 1 week
+- [ ] Verify GoToSocial has adequate RAM (1GB+ recommended)
+- [ ] Test with small feed set before scaling up
+- [ ] Set up monitoring/alerting for resource usage
+- [ ] Plan storage capacity for database growth
+
+### Scaling Process
+- [ ] Increase limits gradually (20→50→75→100)
+- [ ] Monitor each change for 2-3 days
+- [ ] Adjust `MAX_POSTS_PER_RUN` based on processing time
+- [ ] Balance discovery rate with server capacity
+- [ ] Document optimal configuration for your setup
+
+### Post-Scaling Monitoring
+- [ ] Weekly resource usage review
+- [ ] Monthly processed URLs cleanup
+- [ ] Quarterly RSS feed quality assessment
+- [ ] Database growth monitoring
+- [ ] Performance metrics tracking
+
+By following these guidelines, you can optimize GTS-HolMirDas for your specific server configuration and achieve maximum federation efficiency!
--- a/Performance-Scaling.-.md
+++ b/Performance-Scaling.-.md
@@ -1 +0,0 @@
-Welcome to the Wiki.