diff --git a/README.md b/README.md
index 96916f9..b23f68e 100644
--- a/README.md
+++ b/README.md
@@ -1,30 +1,20 @@
 # GTS-HolMirDas 🚀
 
-RSS-based content discovery for **[GoToSocial](https://codeberg.org/superseriousbusiness/gotosocial)** instances.
+RSS-based content discovery for [GoToSocial](https://codeberg.org/superseriousbusiness/gotosocial) instances.
 
 Automatically discovers and federates content from RSS feeds across the Fediverse, helping small GoToSocial instances populate their federated timeline without relying on traditional relays.
 
-*Inspired by the original [HolMirDas](https://github.com/aliceif/HolMirDas) for Misskey by [@aliceif](https://github.com/aliceif) ([@aliceif@mkultra.x27.one](https://mkultra.x27.one/@aliceif)), this GoToSocial adaptation extends the RSS-to-ActivityPub concept with enhanced Docker deployment and multi-instance processing.*
+Inspired by the original [HolMirDas](https://github.com/aliceif/HolMirDas) by [@aliceif](https://mkultra.x27.one/@aliceif), adapted for GoToSocial with enhanced Docker deployment and multi-instance processing.
 
-## Features
+## ✨ Key Features
 
-- 📡 **Multi-Instance RSS Discovery** - Fetches content from configurable RSS feeds across Fediverse instances
-- ⚡ **Efficient Processing** - Configurable rate limiting and duplicate detection
-- 🔧 **Production Ready** - Environment-based config, Docker deployment, health monitoring
-- 📊 **Comprehensive Statistics** - Runtime metrics, content processing, and federation growth tracking
-- 🐳 **Containerized** - Simple Docker Compose deployment
-- 📁 **File-based Configuration** - Easy RSS feed management via text files
+- **📡 Multi-Instance Discovery** - Fetches content from configurable RSS feeds across Fediverse instances
+- **⚡ Performance Scaling** - 20-100 posts per feed with URL parameters (`?limit=100`)
+- **🐳 Production Ready** - Docker deployment, environment-based config, health monitoring
+- **📊 Comprehensive Stats** - Runtime metrics, federation growth, performance tracking
+- **🔧 Zero Maintenance** - Runs automatically every hour with duplicate detection
 
-## How it Works
-
-**GTS-HolMirDas** reads RSS feeds from various Fediverse instances and uses GoToSocial's search API to federate the discovered content. This approach:
-
-- Maintains proper ActivityPub federation (posts remain interactive)
-- Respects rate limits and instance policies
-- Provides better content discovery for small instances
-- Works alongside tools like FediFetcher for comprehensive federation
-
-## Quick Start
+## 🚀 Quick Start
 
 ```bash
 # Clone the repository
@@ -36,8 +26,8 @@ cp .env.example .env
 cp rss_feeds.example.txt rss_feeds.txt
 
 # Edit configuration
-nano .env # Add your GTS credentials
-nano rss_feeds.txt # Customize RSS feeds
+nano .env # Add your GTS credentials
+nano rss_feeds.txt # Customize RSS feeds
 
 # Deploy
 docker compose up -d
@@ -45,261 +35,78 @@ docker compose up -d
 # Monitor
 docker compose logs -f
 ```
 
-# Performance Scaling & Configuration
-## 🚀 RSS Feed Optimization (v1.1.0+)
-
-GTS-HolMirDas supports URL parameters to dramatically increase content discovery without additional API calls.
-
-### RSS Feed Limits
-
-Most Mastodon-compatible instances support the `?limit=X` parameter:
+## 📈 Performance at Scale
 
+**Real Production Data:**
 ```
-# Default behavior (20 posts per feed)
-https://mastodon.social/tags/homelab.rss
-
-# Increased limits (up to 100 posts per feed)
-https://mastodon.social/tags/homelab.rss?limit=50
-https://fosstodon.org/tags/docker.rss?limit=100
+📊 Runtime: 8:42 | 487 posts processed | 3,150+ instances discovered
+⚡ 56 posts/minute | 102 RSS feeds | +45 new instances per run
+💾 Resource usage: ~450MB RAM total (GoToSocial + tools)
 ```
 
-**Supported limits:** 20 (default), 50, 75, 100 (instance-dependent)
+**Scaling Options:**
+- **Conservative:** 20 posts/feed (~100 posts/run)
+- **Balanced:** 50 posts/feed (~300 posts/run)
+- **Aggressive:** 100 posts/feed (~600 posts/run)
 
-### Performance Impact
-
-| Configuration | Posts/Run | API Calls | Processing Time |
-|---------------|-----------|-----------|-----------------|
-| Standard (limit=20) | ~100 posts | 30+ feeds | 2-5 minutes |
-| Optimized (limit=50) | ~300 posts | 30+ feeds | 5-10 minutes |
-| Maximum (limit=100) | ~600 posts | 30+ feeds | 8-15 minutes |
-
-## ⚙️ Configuration Tuning
-
-### Environment Variables
-
-```env
-# Processing Configuration
-MAX_POSTS_PER_RUN=75 # Increase for higher limits
-DELAY_BETWEEN_REQUESTS=1 # Balance speed vs. server load
-RSS_URLS_FILE=/app/rss_feeds.txt
-
-# Recommended combinations:
-# Conservative: MAX_POSTS_PER_RUN=40, limit=50
-# Balanced: MAX_POSTS_PER_RUN=75, limit=100
-# Aggressive: MAX_POSTS_PER_RUN=100, limit=100
-```
-
-### RSS Feed Strategy
-
-```
-# Progressive scaling approach:
-# 1. Start with mixed limits to test performance
-# 2. Increase gradually based on server capacity
-# 3. Monitor GoToSocial memory usage
-
-# Example progression:
-https://mastodon.social/tags/homelab.rss?limit=50
-https://fosstodon.org/tags/selfhosting.rss?limit=75
-https://chaos.social/tags/docker.rss?limit=100
-```
-
-## 📊 Monitoring & Optimization
-
-### Performance Metrics
-
-The statistics output shows real-time performance:
-
-```
-📊 GTS-HolMirDas Run Statistics:
- ⏱️ Runtime: 0:08:42
- 📄 Total posts processed: 487
- 🌐 Current known instances: 3150
- ➕ New instances discovered: +45
- 📡 RSS feeds processed: 102
- ⚡ Posts per minute: 56.0
-```
-
-### Optimization Guidelines
-
-**Memory Usage:**
-- Monitor GoToSocial memory consumption during runs
-- Each 100 additional posts ≈ ~2-5MB additional RAM
-- Recommended: 1GB+ RAM for aggressive configurations
-
-**Processing Time:**
-- Scales linearly with `MAX_POSTS_PER_RUN × number_of_feeds`
-- Duplicate detection becomes more important at scale
-- Consider running frequency vs. content volume
-
-**Federation Growth:**
-- Higher limits = more diverse instance discovery
-- Expect 20-50+ new instances per optimized run
-- Balance discovery rate with storage capacity
-
-### Troubleshooting High-Volume Setups
-
-**If processing takes too long:**
-```env
-MAX_POSTS_PER_RUN=50 # Reduce from 75/100
-DELAY_BETWEEN_REQUESTS=2 # Increase from 1
-```
-
-**If GoToSocial uses too much memory:**
-- Reduce RSS feed count temporarily
-- Lower `?limit=` parameters to 50 instead of 100
-- Increase run frequency instead of volume
-
-**If duplicate detection is slow:**
-- Storage cleanup: `docker-compose exec gts-holmirdas rm -f /app/data/processed_urls.json`
-- This forces fresh state tracking (posts will be reprocessed once)
-
-## 🎯 Best Practices
-
-### Scaling Strategy
-
-1. **Start Conservative:** `limit=50`, `MAX_POSTS_PER_RUN=40`
-2. **Monitor Performance:** Check RAM usage and processing time
-3. **Scale Gradually:** Increase to `limit=75`, then `limit=100`
-4. **Optimize Mix:** Use different limits per instance based on quality
-
-### Instance Selection
-
-**High-quality instances for aggressive limits:**
-```
-# Tech-focused instances (good signal-to-noise ratio)
-https://fosstodon.org/tags/homelab.rss?limit=100
-https://infosec.exchange/tags/security.rss?limit=100
-
-# General instances (moderate limits recommended)
-https://mastodon.social/tags/technology.rss?limit=50
-```
-
-**Performance tip:** Specialized instances often have higher content quality at scale than general-purpose instances.
-
-## Configuration
+## 🛠️ Configuration Essentials
 
 ### Environment Variables (.env)
-
 ```bash
-# GTS Server Configuration
+# Required
 GTS_SERVER_URL=https://your-gts-instance.tld
 GTS_ACCESS_TOKEN=your_gts_access_token
 
-# Processing Configuration
+# Performance Tuning
 MAX_POSTS_PER_RUN=25 # Posts per feed per run
 DELAY_BETWEEN_REQUESTS=1 # Seconds between API calls
-LOG_LEVEL=INFO # Logging verbosity
-
-# RSS Configuration
-RSS_URLS_FILE=/app/rss_feeds.txt # Path to RSS feeds file
-
-# Optional: Monitoring
-HEALTHCHECK_URL=https://hc-ping.com/your-uuid
+LOG_LEVEL=INFO # DEBUG for troubleshooting
 ```
 
 ### RSS Feeds (rss_feeds.txt)
-
-```
-# Example RSS feeds - customize for your interests
-# homelab
-https://mastodon.social/tags/homelab.rss
-https://fosstodon.org/tags/homelab.rss
-
-# selfhosting
-https://mastodon.social/tags/selfhosting.rss
-https://infosec.exchange/tags/selfhosting.rss
-
-# Add your preferred instances and hashtags
+```bash
+# Use URL parameters to scale performance
+https://mastodon.social/tags/homelab.rss?limit=50
+https://fosstodon.org/tags/selfhosting.rss?limit=100
+https://infosec.exchange/tags/security.rss?limit=75
 ```
 
-## Access Token Setup
-
+### GoToSocial Access Token
 1. Login to your GoToSocial instance
-2. Go to Settings → Applications
-3. Create new application with scopes: `read`, `read:search`, `read:statuses`
-4. Copy the access token to your `.env` file
+2. Settings → Applications → Create new application
+3. Required scopes: `read`, `read:search`, `read:statuses`
+4. Copy access token to `.env` file
 
-## Statistics Output
+## 📖 Complete Documentation
 
-```
-📊 GTS-HolMirDas Run Statistics:
- ⏱️ Runtime: 0:04:14
- 📄 Total posts processed: 45
- 🌐 Current known instances: 2519
- ➕ New instances discovered: +3
- 📡 RSS feeds processed: 25
- ⚡ Posts per minute: 10.6
-```
+For detailed information, visit our **[Wiki](https://git.klein.ruhr/matthias/gts-holmirdas/wiki)**:
 
-## Resource Requirements
+- **[📋 Installation Guide](https://git.klein.ruhr/matthias/gts-holmirdas/wiki/Installation-Guide.-)** - Detailed setup, Docker configuration, deployment options
+- **[📈 Performance & Scaling](https://git.klein.ruhr/matthias/gts-holmirdas/wiki/Performance-%26-Scaling)** - Optimization tables, scaling strategies, resource planning
+- **[🛠️ Troubleshooting](https://git.klein.ruhr/matthias/gts-holmirdas/wiki/Troubleshooting)** - Common issues, Docker problems, debugging guide
+- **[⚙️ Advanced Configuration](https://git.klein.ruhr/matthias/gts-holmirdas/wiki/Advanced-Configuration)** - Environment variables, RSS strategies, production tips
+- **[📊 Monitoring & Stats](https://git.klein.ruhr/matthias/gts-holmirdas/wiki/Monitoring-%26-Stats)** - Understanding output, health monitoring, metrics
+- **[❓ FAQ](https://git.klein.ruhr/matthias/gts-holmirdas/wiki/FAQ+-+Frequently+Asked+Questions.-)** - Common questions and answers
 
-- **Memory**: ~200-500MB depending on feed count
-- **CPU**: Minimal (mostly I/O bound)
-- **Storage**: <100MB for application, plus log storage
-- **Network**: Depends on RSS feed count and frequency
+## 🤝 Community & Support
 
-## Deployment Options
+- **[Contributing Guide](Contributing)** - Development setup and contribution guidelines *(coming soon)*
+- **Issues**: [Report bugs or request features](https://git.klein.ruhr/matthias/gts-holmirdas/issues)
+- **Contact**: [@matthias@me.klein.ruhr](https://me.klein.ruhr/@matthias) on the Fediverse
 
-### Docker Compose (Recommended)
-```bash
-docker compose up -d
-```
+## 🔗 Related Projects
 
-### Standalone Docker
-```bash
-docker build -t gts-holmirdas .
-docker run -d --env-file .env \
- -v ./data:/app/data \
- -v ./gts_holmirdas.py:/app/gts_holmirdas.py:ro \
- -v ./rss_feeds.txt:/app/rss_feeds.txt:ro \
- gts-holmirdas
-```
+- **[FediFetcher](https://github.com/nanos/fedifetcher)** - Fetches missing replies and posts
+- **[GoToSocial](https://github.com/superseriousbusiness/gotosocial)** - Lightweight ActivityPub server
+- **[slurp](https://github.com/VyrCossont/slurp)** - Import posts from other instances
 
-## Monitoring
+## 📄 License
 
-- **Logs**: `docker compose logs -f`
-- **Health**: Optional Healthchecks.io integration
-- **Statistics**: Built-in runtime and performance metrics
-- **Resource Usage**: Docker stats or container monitoring tools
+MIT License - see [LICENSE](LICENSE) file for details.
 
-## Troubleshooting
+## 🙏 Acknowledgments
 
-### Common Issues
-
-- **No posts processed**: Check access token permissions and RSS feed URLs
-- **Rate limiting errors**: Increase `DELAY_BETWEEN_REQUESTS` or reduce feed count
-- **High memory usage**: Reduce `MAX_POSTS_PER_RUN` or feed frequency
-- **Container won't start**: Verify `.env` file format and required variables
-
-### Debug Mode
-
-```bash
-# Enable debug logging
-echo "LOG_LEVEL=DEBUG" >> .env
-docker compose restart gts-holmirdas
-```
-
-## Contributing
-
-1. Fork the repository
-2. Create a feature branch
-3. Make your changes
-4. Test thoroughly
-5. Submit a pull request
-
-## Related Projects
-
-- [FediFetcher](https://github.com/nanos/fedifetcher) - Fetches missing replies and posts
-- [GoToSocial](https://github.com/superseriousbusiness/gotosocial) - Lightweight ActivityPub server
-- [slurp](https://github.com/VyrCossont/slurp) - Import posts from other instances
-
-## License
-
-MIT License - see LICENSE file for details.
-
-## Acknowledgments
-
-- Inspired by [HolMirDas](https://github.com/aliceif/HolMirDas) by [@aliceif](https://github.com/aliceif) ([@aliceif@mkultra.x27.one](https://mkultra.x27.one/@aliceif)) - the original RSS-to-ActivityPub concept
+- Inspired by [HolMirDas](https://github.com/aliceif/HolMirDas) by [@aliceif](https://mkultra.x27.one/@aliceif)
 - Built for the GoToSocial community
-- RSS-to-ActivityPub approach inspired by Fediverse discovery challenges
+- RSS-to-ActivityPub federation approach
\ No newline at end of file
diff --git a/gts_holmirdas.py b/gts_holmirdas.py
index 642695e..77d1eaf 100644
--- a/gts_holmirdas.py
+++ b/gts_holmirdas.py
@@ -46,8 +46,8 @@ class GTSHolMirDas:
         try:
             with open(rss_urls_file, 'r') as f:
                 self.config["rss_urls"] = [
-                    line.strip() for line in f
-                    if line.strip() and not line.startswith('#')
+                    line.split('#', 1)[0].strip() for line in f
+                    if line.strip() and not line.strip().startswith('#')
                 ]
             self.logger.info(f"Loaded {len(self.config['rss_urls'])} RSS URLs from file: {rss_urls_file}")
         except Exception as e:
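
Note on the `gts_holmirdas.py` hunk: the new comprehension skips blank lines and comment lines (including indented ones) and strips trailing `# ...` comments from feed lines. The standalone sketch below is meant to behave the same way for those cases so the change can be checked outside the class. The function names `parse_feed_lines` and `load_rss_urls` are hypothetical and not part of the repository. One caveat that follows from splitting on the first `#`: a feed URL that contains a fragment (for example `...rss#top`) would be cut at the fragment.

```python
from pathlib import Path


def parse_feed_lines(lines):
    """Return feed URLs from an iterable of text lines, ignoring comments.

    Hypothetical helper; mirrors the list comprehension in the hunk above.
    """
    urls = []
    for line in lines:
        # Drop everything from the first '#' onward, then trim whitespace.
        # Caveat: this also removes URL fragments such as '...rss#top'.
        url = line.split('#', 1)[0].strip()
        # Blank lines and comment-only lines leave an empty string here.
        if url:
            urls.append(url)
    return urls


def load_rss_urls(path):
    """Read a feeds file (one URL per line) and return the cleaned URLs."""
    with Path(path).open('r', encoding='utf-8') as f:
        return parse_feed_lines(f)


if __name__ == "__main__":
    sample = [
        "# homelab\n",
        "https://mastodon.social/tags/homelab.rss?limit=50  # inline note\n",
        "   # indented comment\n",
        "\n",
        "https://fosstodon.org/tags/selfhosting.rss?limit=100\n",
    ]
    for url in parse_feed_lines(sample):
        print(url)
```

Keeping the parsing in a pure function that takes any iterable of lines makes the comment handling easy to test without touching the filesystem or the GoToSocial API.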