Compare commits

No commits in common. "master" and "v1.1.0" have entirely different histories.

2 changed files with 250 additions and 57 deletions

299
README.md
@@ -1,24 +1,34 @@
# GTS-HolMirDas 🚀 # GTS-HolMirDas 🚀
RSS-based content discovery for [GoToSocial](https://codeberg.org/superseriousbusiness/gotosocial) instances. RSS-based content discovery for **[GoToSocial](https://codeberg.org/superseriousbusiness/gotosocial)** instances.
Automatically discovers and federates content from RSS feeds across the Fediverse, helping small GoToSocial instances populate their federated timeline without relying on traditional relays. Automatically discovers and federates content from RSS feeds across the Fediverse, helping small GoToSocial instances populate their federated timeline without relying on traditional relays.
Inspired by the original [HolMirDas](https://github.com/aliceif/HolMirDas) by [@aliceif](https://mkultra.x27.one/@aliceif), adapted for GoToSocial with enhanced Docker deployment and multi-instance processing. *Inspired by the original [HolMirDas](https://github.com/aliceif/HolMirDas) for Misskey by [@aliceif](https://github.com/aliceif) ([@aliceif@mkultra.x27.one](https://mkultra.x27.one/@aliceif)), this GoToSocial adaptation extends the RSS-to-ActivityPub concept with enhanced Docker deployment and multi-instance processing.*
## ✨ Key Features ## Features
- **📡 Multi-Instance Discovery** - Fetches content from configurable RSS feeds across Fediverse instances - 📡 **Multi-Instance RSS Discovery** - Fetches content from configurable RSS feeds across Fediverse instances
- **⚡ Performance Scaling** - 20-100 posts per feed with URL parameters (`?limit=100`) - ⚡ **Efficient Processing** - Configurable rate limiting and duplicate detection
- **🐳 Production Ready** - Docker deployment, environment-based config, health monitoring - 🔧 **Production Ready** - Environment-based config, Docker deployment, health monitoring
- **📊 Comprehensive Stats** - Runtime metrics, federation growth, performance tracking - 📊 **Comprehensive Statistics** - Runtime metrics, content processing, and federation growth tracking
- **🔧 Zero Maintenance** - Runs automatically every hour with duplicate detection - 🐳 **Containerized** - Simple Docker Compose deployment
- 📁 **File-based Configuration** - Easy RSS feed management via text files
## 🚀 Quick Start ## How it Works
**GTS-HolMirDas** reads RSS feeds from various Fediverse instances and uses GoToSocial's search API to federate the discovered content. This approach:
- Maintains proper ActivityPub federation (posts remain interactive)
- Respects rate limits and instance policies
- Provides better content discovery for small instances
- Works alongside tools like FediFetcher for comprehensive federation
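A minimal sketch of the lookup step described above, assuming the Mastodon-compatible `/api/v2/search` endpoint with `resolve=true` is what triggers federation of a remote post. The endpoint choice, the `limit` parameter, and the helper name `build_search_request` are assumptions for illustration; the script's actual request is not shown in this diff.

```python
from urllib.parse import urlencode


def build_search_request(server_url, access_token, post_url):
    """Return (url, headers) for asking the instance to resolve a remote post.

    Assumption: resolving a post URL through the search endpoint makes the
    instance fetch it over ActivityPub, so the post stays interactive.
    """
    query = urlencode({"q": post_url, "resolve": "true", "limit": 1})
    url = f"{server_url.rstrip('/')}/api/v2/search?{query}"
    headers = {"Authorization": f"Bearer {access_token}"}
    return url, headers


url, headers = build_search_request(
    "https://your-gts-instance.tld/",
    "your_gts_access_token",
    "https://mastodon.social/@user/123",
)
print(url)
```

The function only builds the request, so it can be inspected or logged before anything is sent to the instance.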
## Quick Start
```bash ```bash
# Clone the repository # Clone the repository
git clone https://git.klein.ruhr/matthias/gts-holmirdas git clone https://git.klein.ruhr/user/gts-holmirdas
cd gts-holmirdas cd gts-holmirdas
# Copy configuration templates # Copy configuration templates
@@ -35,78 +45,261 @@ docker compose up -d
# Monitor # Monitor
docker compose logs -f docker compose logs -f
``` ```
# Performance Scaling & Configuration
## 📈 Performance at Scale ## 🚀 RSS Feed Optimization (v1.1.0+)
GTS-HolMirDas supports feed URL parameters that increase the number of posts discovered per run without additional feed requests.
### RSS Feed Limits
Most Mastodon-compatible instances support the `?limit=X` parameter:
**Real Production Data:**
``` ```
📊 Runtime: 8:42 | 487 posts processed | 3,150+ instances discovered # Default behavior (20 posts per feed)
⚡ 56 posts/minute | 102 RSS feeds | +45 new instances per run https://mastodon.social/tags/homelab.rss
💾 Resource usage: ~450MB RAM total (GoToSocial + tools)
# Increased limits (up to 100 posts per feed)
https://mastodon.social/tags/homelab.rss?limit=50
https://fosstodon.org/tags/docker.rss?limit=100
``` ```
**Scaling Options:** **Supported limits:** 20 (default), 50, 75, 100 (instance-dependent)
- **Conservative:** 20 posts/feed (~100 posts/run)
- **Balanced:** 50 posts/feed (~300 posts/run)
- **Aggressive:** 100 posts/feed (~600 posts/run)
## 🛠️ Configuration Essentials ### Performance Impact
| Configuration | Posts/Run | API Calls | Processing Time |
|---------------|-----------|-----------|-----------------|
| Standard (limit=20) | ~100 posts | 30+ feeds | 2-5 minutes |
| Optimized (limit=50) | ~300 posts | 30+ feeds | 5-10 minutes |
| Maximum (limit=100) | ~600 posts | 30+ feeds | 8-15 minutes |
## ⚙️ Configuration Tuning
### Environment Variables
```env
# Processing Configuration
MAX_POSTS_PER_RUN=75 # Increase for higher limits
DELAY_BETWEEN_REQUESTS=1 # Balance speed vs. server load
RSS_URLS_FILE=/app/rss_feeds.txt
# Recommended combinations:
# Conservative: MAX_POSTS_PER_RUN=40, limit=50
# Balanced: MAX_POSTS_PER_RUN=75, limit=100
# Aggressive: MAX_POSTS_PER_RUN=100, limit=100
```
### RSS Feed Strategy
```
# Progressive scaling approach:
# 1. Start with mixed limits to test performance
# 2. Increase gradually based on server capacity
# 3. Monitor GoToSocial memory usage
# Example progression:
https://mastodon.social/tags/homelab.rss?limit=50
https://fosstodon.org/tags/selfhosting.rss?limit=75
https://chaos.social/tags/docker.rss?limit=100
```
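The per-feed limits above could also be applied programmatically when generating `rss_feeds.txt`. The following is a sketch under stated assumptions: `with_limit` is a hypothetical helper, not part of GTS-HolMirDas, and the cap of 100 simply mirrors the "up to 100 posts per feed" note; individual instances may enforce a lower maximum.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_limit(feed_url, limit):
    """Return feed_url with its `limit` query parameter set to `limit`."""
    # Clamp to the range mentioned in the README (assumed upper bound: 100).
    limit = max(1, min(int(limit), 100))
    parts = urlsplit(feed_url)
    # Drop any existing limit so the parameter is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "limit"
    ]
    query.append(("limit", str(limit)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_limit("https://mastodon.social/tags/homelab.rss", 50))
print(with_limit("https://fosstodon.org/tags/docker.rss?limit=100", 75))
```

Replacing an existing `limit` rather than appending a second one keeps the progression (50, then 75, then 100) a one-line change per feed.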
## 📊 Monitoring & Optimization
### Performance Metrics
The statistics output shows real-time performance:
```
📊 GTS-HolMirDas Run Statistics:
⏱️ Runtime: 0:08:42
📄 Total posts processed: 487
🌐 Current known instances: 3150
New instances discovered: +45
📡 RSS feeds processed: 102
⚡ Posts per minute: 56.0
```
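The "Posts per minute" figure can be reproduced from the runtime and post count in the sample output. This is a sketch of the arithmetic only; `posts_per_minute` is a hypothetical helper, and the script may compute the value differently.

```python
def posts_per_minute(runtime, posts):
    """Compute posts per minute from an H:MM:SS runtime string."""
    hours, minutes, seconds = (int(part) for part in runtime.split(":"))
    total_seconds = hours * 3600 + minutes * 60 + seconds
    if total_seconds == 0:
        return 0.0
    return round(posts / total_seconds * 60, 1)


# 487 posts in 0:08:42 (522 seconds), as in the sample output above.
print(posts_per_minute("0:08:42", 487))
```

For the sample run, 487 posts over 522 seconds works out to about 55.98 posts per minute, which rounds to the reported 56.0.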
### Optimization Guidelines
**Memory Usage:**
- Monitor GoToSocial memory consumption during runs
- Each 100 additional posts ≈ 2-5MB additional RAM
- Recommended: 1GB+ RAM for aggressive configurations
**Processing Time:**
- Scales linearly with `MAX_POSTS_PER_RUN × number_of_feeds`
- Duplicate detection becomes more important at scale
- Consider running frequency vs. content volume
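As a rough worked example of the linear scaling noted above, the sketch below computes an upper bound on the time spent waiting between requests. It assumes one API call per post and that `DELAY_BETWEEN_REQUESTS` is applied after every call; neither assumption is confirmed by this diff, and `estimate_delay_seconds` is a hypothetical helper.

```python
def estimate_delay_seconds(feed_count, max_posts_per_run, delay_between_requests):
    """Worst-case seconds spent in delays if every feed yields the maximum
    number of new posts (assumption: one delayed request per post)."""
    return feed_count * max_posts_per_run * delay_between_requests


# 30 feeds x 25 posts per feed x 1 second delay
worst_case = estimate_delay_seconds(30, 25, 1)
print(f"{worst_case} s (~{worst_case / 60:.1f} min)")
```

Real runs are usually much shorter because duplicate detection skips posts that were already processed, so this is a ceiling rather than a prediction.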
**Federation Growth:**
- Higher limits = more diverse instance discovery
- Expect 20-50+ new instances per optimized run
- Balance discovery rate with storage capacity
### Troubleshooting High-Volume Setups
**If processing takes too long:**
```env
MAX_POSTS_PER_RUN=50 # Reduce from 75/100
DELAY_BETWEEN_REQUESTS=2 # Increase from 1
```
**If GoToSocial uses too much memory:**
- Reduce RSS feed count temporarily
- Lower `?limit=` parameters to 50 instead of 100
- Increase run frequency instead of volume
**If duplicate detection is slow:**
- Storage cleanup: `docker compose exec gts-holmirdas rm -f /app/data/processed_urls.json`
- This forces fresh state tracking (posts will be reprocessed once)
## 🎯 Best Practices
### Scaling Strategy
1. **Start Conservative:** `limit=50`, `MAX_POSTS_PER_RUN=40`
2. **Monitor Performance:** Check RAM usage and processing time
3. **Scale Gradually:** Increase to `limit=75`, then `limit=100`
4. **Optimize Mix:** Use different limits per instance based on quality
### Instance Selection
**High-quality instances for aggressive limits:**
```
# Tech-focused instances (good signal-to-noise ratio)
https://fosstodon.org/tags/homelab.rss?limit=100
https://infosec.exchange/tags/security.rss?limit=100
# General instances (moderate limits recommended)
https://mastodon.social/tags/technology.rss?limit=50
```
**Performance tip:** Specialized instances often have higher content quality at scale than general-purpose instances.
## Configuration
### Environment Variables (.env) ### Environment Variables (.env)
```bash ```bash
# Required # GTS Server Configuration
GTS_SERVER_URL=https://your-gts-instance.tld GTS_SERVER_URL=https://your-gts-instance.tld
GTS_ACCESS_TOKEN=your_gts_access_token GTS_ACCESS_TOKEN=your_gts_access_token
# Performance Tuning # Processing Configuration
MAX_POSTS_PER_RUN=25 # Posts per feed per run MAX_POSTS_PER_RUN=25 # Posts per feed per run
DELAY_BETWEEN_REQUESTS=1 # Seconds between API calls DELAY_BETWEEN_REQUESTS=1 # Seconds between API calls
LOG_LEVEL=INFO # DEBUG for troubleshooting LOG_LEVEL=INFO # Logging verbosity
# RSS Configuration
RSS_URLS_FILE=/app/rss_feeds.txt # Path to RSS feeds file
# Optional: Monitoring
HEALTHCHECK_URL=https://hc-ping.com/your-uuid
``` ```
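One way the variables above might be read and validated is sketched below. The `Settings` class and `load_settings` function are hypothetical, and the defaults simply mirror the sample values shown above; the script's real defaults are not visible in this diff.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    server_url: str
    access_token: str
    max_posts_per_run: int = 25
    delay_between_requests: float = 1.0
    log_level: str = "INFO"
    rss_urls_file: str = "/app/rss_feeds.txt"
    healthcheck_url: Optional[str] = None


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from an environment-like mapping (e.g. os.environ)."""
    required = ("GTS_SERVER_URL", "GTS_ACCESS_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"Missing required variables: {', '.join(missing)}")
    return Settings(
        server_url=env["GTS_SERVER_URL"].rstrip("/"),
        access_token=env["GTS_ACCESS_TOKEN"],
        max_posts_per_run=int(env.get("MAX_POSTS_PER_RUN", 25)),
        delay_between_requests=float(env.get("DELAY_BETWEEN_REQUESTS", 1)),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        rss_urls_file=env.get("RSS_URLS_FILE", "/app/rss_feeds.txt"),
        healthcheck_url=env.get("HEALTHCHECK_URL") or None,
    )


settings = load_settings({
    "GTS_SERVER_URL": "https://your-gts-instance.tld",
    "GTS_ACCESS_TOKEN": "your_gts_access_token",
    "MAX_POSTS_PER_RUN": "75",
})
print(settings)
```

Failing early on the two required variables gives a clearer error than a rejected API call later in the run.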
### RSS Feeds (rss_feeds.txt) ### RSS Feeds (rss_feeds.txt)
```bash
# Use URL parameters to scale performance ```
https://mastodon.social/tags/homelab.rss?limit=50 # Example RSS feeds - customize for your interests
https://fosstodon.org/tags/selfhosting.rss?limit=100 # homelab
https://infosec.exchange/tags/security.rss?limit=75 https://mastodon.social/tags/homelab.rss
https://fosstodon.org/tags/homelab.rss
# selfhosting
https://mastodon.social/tags/selfhosting.rss
https://infosec.exchange/tags/selfhosting.rss
# Add your preferred instances and hashtags
``` ```
### GoToSocial Access Token ## Access Token Setup
1. Login to your GoToSocial instance 1. Login to your GoToSocial instance
2. Settings → Applications → Create new application 2. Go to Settings → Applications
3. Required scopes: `read`, `read:search`, `read:statuses` 3. Create new application with scopes: `read`, `read:search`, `read:statuses`
4. Copy access token to `.env` file 4. Copy the access token to your `.env` file
## 📖 Complete Documentation ## Statistics Output
For detailed information, visit our **[Wiki](https://git.klein.ruhr/matthias/gts-holmirdas/wiki)**: ```
📊 GTS-HolMirDas Run Statistics:
⏱️ Runtime: 0:04:14
📄 Total posts processed: 45
🌐 Current known instances: 2519
New instances discovered: +3
📡 RSS feeds processed: 25
⚡ Posts per minute: 10.6
```
- **[📋 Installation Guide](https://git.klein.ruhr/matthias/gts-holmirdas/wiki/Installation-Guide.-)** - Detailed setup, Docker configuration, deployment options ## Resource Requirements
- **[📈 Performance & Scaling](https://git.klein.ruhr/matthias/gts-holmirdas/wiki/Performance-%26-Scaling)** - Optimization tables, scaling strategies, resource planning
- **[🛠️ Troubleshooting](https://git.klein.ruhr/matthias/gts-holmirdas/wiki/Troubleshooting)** - Common issues, Docker problems, debugging guide
- **[⚙️ Advanced Configuration](https://git.klein.ruhr/matthias/gts-holmirdas/wiki/Advanced-Configuration)** - Environment variables, RSS strategies, production tips
- **[📊 Monitoring & Stats](https://git.klein.ruhr/matthias/gts-holmirdas/wiki/Monitoring-%26-Stats)** - Understanding output, health monitoring, metrics
- **[❓ FAQ](https://git.klein.ruhr/matthias/gts-holmirdas/wiki/FAQ+-+Frequently+Asked+Questions.-)** - Common questions and answers
## 🤝 Community & Support - **Memory**: ~200-500MB depending on feed count
- **CPU**: Minimal (mostly I/O bound)
- **Storage**: <100MB for application, plus log storage
- **Network**: Depends on RSS feed count and frequency
- **[Contributing Guide](Contributing)** - Development setup and contribution guidelines *(coming soon)* ## Deployment Options
- **Issues**: [Report bugs or request features](https://git.klein.ruhr/matthias/gts-holmirdas/issues)
- **Contact**: [@matthias@me.klein.ruhr](https://me.klein.ruhr/@matthias) on the Fediverse
## 🔗 Related Projects ### Docker Compose (Recommended)
```bash
docker compose up -d
```
- **[FediFetcher](https://github.com/nanos/fedifetcher)** - Fetches missing replies and posts ### Standalone Docker
- **[GoToSocial](https://github.com/superseriousbusiness/gotosocial)** - Lightweight ActivityPub server ```bash
- **[slurp](https://github.com/VyrCossont/slurp)** - Import posts from other instances docker build -t gts-holmirdas .
docker run -d --env-file .env \
-v ./data:/app/data \
-v ./gts_holmirdas.py:/app/gts_holmirdas.py:ro \
-v ./rss_feeds.txt:/app/rss_feeds.txt:ro \
gts-holmirdas
```
## 📄 License ## Monitoring
MIT License - see [LICENSE](LICENSE) file for details. - **Logs**: `docker compose logs -f`
- **Health**: Optional Healthchecks.io integration
- **Statistics**: Built-in runtime and performance metrics
- **Resource Usage**: Docker stats or container monitoring tools
## 🙏 Acknowledgments ## Troubleshooting
- Inspired by [HolMirDas](https://github.com/aliceif/HolMirDas) by [@aliceif](https://mkultra.x27.one/@aliceif) ### Common Issues
- **No posts processed**: Check access token permissions and RSS feed URLs
- **Rate limiting errors**: Increase `DELAY_BETWEEN_REQUESTS` or reduce feed count
- **High memory usage**: Reduce `MAX_POSTS_PER_RUN` or feed frequency
- **Container won't start**: Verify `.env` file format and required variables
### Debug Mode
```bash
# Enable debug logging
echo "LOG_LEVEL=DEBUG" >> .env
docker compose restart gts-holmirdas
```
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request
## Related Projects
- [FediFetcher](https://github.com/nanos/fedifetcher) - Fetches missing replies and posts
- [GoToSocial](https://github.com/superseriousbusiness/gotosocial) - Lightweight ActivityPub server
- [slurp](https://github.com/VyrCossont/slurp) - Import posts from other instances
## License
MIT License - see LICENSE file for details.
## Acknowledgments
- Inspired by [HolMirDas](https://github.com/aliceif/HolMirDas) by [@aliceif](https://github.com/aliceif) ([@aliceif@mkultra.x27.one](https://mkultra.x27.one/@aliceif)) - the original RSS-to-ActivityPub concept
- Built for the GoToSocial community - Built for the GoToSocial community
- RSS-to-ActivityPub federation approach - RSS-to-ActivityPub approach inspired by Fediverse discovery challenges

@@ -46,8 +46,8 @@ class GTSHolMirDas:
         try:
             with open(rss_urls_file, 'r') as f:
                 self.config["rss_urls"] = [
-                    line.split('#', 1)[0].strip() for line in f
-                    if line.strip() and not line.strip().startswith('#')
+                    line.strip() for line in f
+                    if line.strip() and not line.startswith('#')
                 ]
             self.logger.info(f"Loaded {len(self.config['rss_urls'])} RSS URLs from file: {rss_urls_file}")
         except Exception as e:
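The two list comprehensions in this hunk treat comments differently. The sketch below wraps each variant in a standalone function and runs both on the same sample input; the function names and sample lines are illustrative and do not appear in the repository.

```python
def parse_stripping_inline_comments(lines):
    """Variant that cuts everything after '#' and skips indented comments."""
    return [
        line.split('#', 1)[0].strip() for line in lines
        if line.strip() and not line.strip().startswith('#')
    ]


def parse_whole_lines(lines):
    """Variant that keeps the full line and only skips lines starting with '#'."""
    return [
        line.strip() for line in lines
        if line.strip() and not line.startswith('#')
    ]


sample = [
    "# homelab\n",
    "https://mastodon.social/tags/homelab.rss?limit=50  # busy tag\n",
    "   # indented comment\n",
    "\n",
    "https://fosstodon.org/tags/homelab.rss\n",
]

print(parse_stripping_inline_comments(sample))
print(parse_whole_lines(sample))
```

With the whole-line variant, the trailing `# busy tag` stays attached to the URL and the indented comment is loaded as if it were a feed, so a feeds file written for one variant may need cleanup before use with the other.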