# GTS-HolMirDas 🚀
RSS-based content discovery for **[GoToSocial](https://codeberg.org/superseriousbusiness/gotosocial)** instances.
Automatically discovers and federates content from RSS feeds across the Fediverse, helping small GoToSocial instances populate their federated timeline without relying on traditional relays.
*Inspired by the original [HolMirDas](https://github.com/aliceif/HolMirDas) for Misskey by [@aliceif](https://github.com/aliceif) ([@aliceif@mkultra.x27.one](https://mkultra.x27.one/@aliceif)), this GoToSocial adaptation extends the RSS-to-ActivityPub concept with enhanced Docker deployment and multi-instance processing.*
## Features
- 📡 **Multi-Instance RSS Discovery** - Fetches content from configurable RSS feeds across Fediverse instances
- **Efficient Processing** - Configurable rate limiting and duplicate detection
- 🔧 **Production Ready** - Environment-based config, Docker deployment, health monitoring
- 📊 **Comprehensive Statistics** - Runtime metrics, content processing, and federation growth tracking
- 🐳 **Containerized** - Simple Docker Compose deployment
- 📁 **File-based Configuration** - Easy RSS feed management via text files
## How it Works
**GTS-HolMirDas** reads RSS feeds from various Fediverse instances and uses GoToSocial's search API to federate the discovered content. This approach:
- Maintains proper ActivityPub federation (posts remain interactive)
- Respects rate limits and instance policies
- Provides better content discovery for small instances
- Works alongside tools like FediFetcher for comprehensive federation
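
The lookup step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the Mastodon-compatible `/api/v2/search` endpoint with `resolve=true`, the helper name `build_search_request` is hypothetical, and the post URL is a placeholder. The request is only built here, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(server_url: str, access_token: str, post_url: str) -> Request:
    """Build (but do not send) a search request asking the server to resolve post_url."""
    # Assumption: resolve=true asks the server to fetch the remote post if unknown
    query = urlencode({"q": post_url, "resolve": "true", "type": "statuses"})
    return Request(
        f"{server_url.rstrip('/')}/api/v2/search?{query}",
        headers={"Authorization": f"Bearer {access_token}"},
    )


req = build_search_request(
    "https://your-gts-instance.tld",
    "your_gts_access_token",
    "https://mastodon.social/@user/123456",  # placeholder post URL from an RSS item
)
print(req.full_url)
```

Passing the result to `urllib.request.urlopen` would perform the lookup; it is left out so the sketch runs without network access.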
## Quick Start
```bash
# Clone the repository
git clone https://git.klein.ruhr/matthias/gts-holmirdas
cd gts-holmirdas
# Copy configuration templates
cp .env.example .env
cp rss_feeds.example.txt rss_feeds.txt
# Edit configuration
nano .env # Add your GTS credentials
nano rss_feeds.txt # Customize RSS feeds
# Deploy
docker compose up -d
# Monitor
docker compose logs -f
```
# Performance Scaling & Configuration
## 🚀 RSS Feed Optimization (v1.1.0+)
GTS-HolMirDas supports URL parameters to dramatically increase content discovery without additional API calls.
### RSS Feed Limits
Most Mastodon-compatible instances support the `?limit=X` parameter:
```
# Default behavior (20 posts per feed)
https://mastodon.social/tags/homelab.rss
# Increased limits (up to 100 posts per feed)
https://mastodon.social/tags/homelab.rss?limit=50
https://fosstodon.org/tags/docker.rss?limit=100
```
**Supported limits:** 20 (default), 50, 75, 100 (instance-dependent)
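
If you maintain many feeds, the parameter can be applied programmatically. The helper below is a small sketch using only the standard library; the name `with_limit` is hypothetical and not part of GTS-HolMirDas.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_limit(feed_url: str, limit: int) -> str:
    """Return feed_url with its `limit` query parameter set to `limit`."""
    parts = urlsplit(feed_url)
    # Keep any other query parameters, but drop an existing limit
    params = [(k, v) for k, v in parse_qsl(parts.query) if k != "limit"]
    params.append(("limit", str(limit)))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(with_limit("https://mastodon.social/tags/homelab.rss", 50))
# https://mastodon.social/tags/homelab.rss?limit=50
```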
### Performance Impact
| Configuration | Posts/Run | Feed Requests | Processing Time |
|---------------|-----------|---------------|-----------------|
| Standard (limit=20) | ~100 | 30+ | 2-5 minutes |
| Optimized (limit=50) | ~300 | 30+ | 5-10 minutes |
| Maximum (limit=100) | ~600 | 30+ | 8-15 minutes |
## ⚙️ Configuration Tuning
### Environment Variables
```env
# Processing Configuration
# Increase for higher limits
MAX_POSTS_PER_RUN=75
# Balance speed vs. server load
DELAY_BETWEEN_REQUESTS=1
RSS_URLS_FILE=/app/rss_feeds.txt
# Recommended combinations:
# Conservative: MAX_POSTS_PER_RUN=40, limit=50
# Balanced: MAX_POSTS_PER_RUN=75, limit=100
# Aggressive: MAX_POSTS_PER_RUN=100, limit=100
```
### RSS Feed Strategy
```
# Progressive scaling approach:
# 1. Start with mixed limits to test performance
# 2. Increase gradually based on server capacity
# 3. Monitor GoToSocial memory usage
# Example progression:
https://mastodon.social/tags/homelab.rss?limit=50
https://fosstodon.org/tags/selfhosting.rss?limit=75
https://chaos.social/tags/docker.rss?limit=100
```
## 📊 Monitoring & Optimization
### Performance Metrics
The statistics output shows real-time performance:
```
📊 GTS-HolMirDas Run Statistics:
⏱️ Runtime: 0:08:42
📄 Total posts processed: 487
🌐 Current known instances: 3150
New instances discovered: +45
📡 RSS feeds processed: 102
⚡ Posts per minute: 56.0
```
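
The "Posts per minute" figure follows directly from the other two values. A short sketch of the arithmetic, using the numbers from the sample output (the function name is hypothetical):

```python
from datetime import timedelta


def posts_per_minute(posts: int, runtime: timedelta) -> float:
    """Posts processed per minute of runtime, rounded to one decimal place."""
    minutes = runtime.total_seconds() / 60
    return round(posts / minutes, 1) if minutes > 0 else 0.0


# 487 posts in 0:08:42 (522 seconds)
rate = posts_per_minute(487, timedelta(minutes=8, seconds=42))
print(rate)  # 56.0
```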
### Optimization Guidelines
**Memory Usage:**
- Monitor GoToSocial memory consumption during runs
- Each additional 100 posts adds roughly 2-5 MB of RAM
- Recommended: 1GB+ RAM for aggressive configurations
**Processing Time:**
- Scales linearly with `MAX_POSTS_PER_RUN × number_of_feeds`
- Duplicate detection becomes more important at scale
- Consider running frequency vs. content volume
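
As a rough worked example of this scaling rule, the sketch below computes a ceiling for one run. It assumes one lookup per post followed by one `DELAY_BETWEEN_REQUESTS` pause, and ignores network latency; both function names are hypothetical. Real runs are usually far below this ceiling, because feeds rarely fill their quota and already-processed posts are skipped.

```python
def max_lookups(feed_count: int, max_posts_per_run: int) -> int:
    """Ceiling on posts looked up in one run: every feed yields its full quota."""
    return feed_count * max_posts_per_run


def delay_minutes(lookups: int, delay_between_requests: float) -> float:
    """Minutes spent purely waiting between requests."""
    return lookups * delay_between_requests / 60


# 30 feeds, MAX_POSTS_PER_RUN=75, DELAY_BETWEEN_REQUESTS=1
lookups = max_lookups(feed_count=30, max_posts_per_run=75)
print(lookups, delay_minutes(lookups, delay_between_requests=1))  # 2250 37.5
```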
**Federation Growth:**
- Higher limits = more diverse instance discovery
- Expect 20-50+ new instances per optimized run
- Balance discovery rate with storage capacity
### Troubleshooting High-Volume Setups
**If processing takes too long:**
```env
# Reduce from 75/100
MAX_POSTS_PER_RUN=50
# Increase from 1
DELAY_BETWEEN_REQUESTS=2
```
**If GoToSocial uses too much memory:**
- Reduce RSS feed count temporarily
- Lower `?limit=` parameters to 50 instead of 100
- Increase run frequency instead of volume
**If duplicate detection is slow:**
- Reset the state file: `docker compose exec gts-holmirdas rm -f /app/data/processed_urls.json`
- This forces fresh state tracking (posts will be reprocessed once)
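
To illustrate why deleting the state file causes a one-time reprocessing, here is a minimal sketch of file-based duplicate tracking. It assumes the state file is a plain JSON list of URLs, which may not match the real format of `processed_urls.json`; all function names and example URLs are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_processed(path: Path) -> set[str]:
    """Load processed URLs; a missing file means a fresh, empty state."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def save_processed(path: Path, urls: set[str]) -> None:
    """Persist the processed URLs as a sorted JSON list."""
    path.write_text(json.dumps(sorted(urls)), encoding="utf-8")


def filter_new(candidates: list[str], processed: set[str]) -> list[str]:
    """Keep only URLs not seen before, preserving feed order."""
    return [url for url in candidates if url not in processed]


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "processed_urls.json"
    seen = load_processed(state)  # empty: the file does not exist yet
    new = filter_new(["https://a.example/1", "https://a.example/2"], seen)
    save_processed(state, seen | set(new))
    # Second run: only the post that was not seen before remains
    again = filter_new(
        ["https://a.example/2", "https://a.example/3"], load_processed(state)
    )

print(again)  # ['https://a.example/3']
```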
## 🎯 Best Practices
### Scaling Strategy
1. **Start Conservative:** `limit=50`, `MAX_POSTS_PER_RUN=40`
2. **Monitor Performance:** Check RAM usage and processing time
3. **Scale Gradually:** Increase to `limit=75`, then `limit=100`
4. **Optimize Mix:** Use different limits per instance based on quality
### Instance Selection
**High-quality instances for aggressive limits:**
```
# Tech-focused instances (good signal-to-noise ratio)
https://fosstodon.org/tags/homelab.rss?limit=100
https://infosec.exchange/tags/security.rss?limit=100
# General instances (moderate limits recommended)
https://mastodon.social/tags/technology.rss?limit=50
```
**Performance tip:** Specialized instances often have higher content quality at scale than general-purpose instances.
## Configuration
### Environment Variables (.env)
```bash
# GTS Server Configuration
GTS_SERVER_URL=https://your-gts-instance.tld
GTS_ACCESS_TOKEN=your_gts_access_token
# Processing Configuration
# Posts per feed per run
MAX_POSTS_PER_RUN=25
# Seconds between API calls
DELAY_BETWEEN_REQUESTS=1
# Logging verbosity
LOG_LEVEL=INFO
# RSS Configuration
# Path to RSS feeds file
RSS_URLS_FILE=/app/rss_feeds.txt
# Optional: Monitoring
HEALTHCHECK_URL=https://hc-ping.com/your-uuid
```
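
For orientation, this is one way a script could read these variables. It is a sketch under stated assumptions, not the project's actual loader: the `Settings` class and `load_settings` function are hypothetical, and the fallback defaults simply mirror the example values above.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    server_url: str
    access_token: str
    max_posts_per_run: int
    delay_between_requests: float
    rss_urls_file: str
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from an environment mapping, failing fast on missing credentials."""
    return Settings(
        server_url=env["GTS_SERVER_URL"].rstrip("/"),  # required
        access_token=env["GTS_ACCESS_TOKEN"],          # required
        # Assumed defaults, taken from the example .env above
        max_posts_per_run=int(env.get("MAX_POSTS_PER_RUN", "25")),
        delay_between_requests=float(env.get("DELAY_BETWEEN_REQUESTS", "1")),
        rss_urls_file=env.get("RSS_URLS_FILE", "/app/rss_feeds.txt"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


settings = load_settings({
    "GTS_SERVER_URL": "https://your-gts-instance.tld",
    "GTS_ACCESS_TOKEN": "your_gts_access_token",
    "MAX_POSTS_PER_RUN": "40",
})
print(settings.max_posts_per_run, settings.delay_between_requests)  # 40 1.0
```

Raising `KeyError` for a missing URL or token surfaces a broken `.env` at startup rather than midway through a run.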
### RSS Feeds (rss_feeds.txt)
```
# Example RSS feeds - customize for your interests
# homelab
https://mastodon.social/tags/homelab.rss
https://fosstodon.org/tags/homelab.rss
# selfhosting
https://mastodon.social/tags/selfhosting.rss
https://infosec.exchange/tags/selfhosting.rss
# Add your preferred instances and hashtags
```
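
The file format shown above (one URL per line, `#` for comments) is simple to parse. A minimal sketch, assuming blank lines are ignored; the function name `parse_feed_list` is hypothetical and the duplicate removal is an addition for illustration:

```python
def parse_feed_list(text: str) -> list[str]:
    """Return feed URLs from the file text, skipping comments, blanks, and repeats."""
    feeds: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comment or empty line
        if line not in feeds:
            feeds.append(line)
    return feeds


sample = """\
# homelab
https://mastodon.social/tags/homelab.rss
https://fosstodon.org/tags/homelab.rss?limit=100

# selfhosting
https://mastodon.social/tags/homelab.rss
"""
feeds = parse_feed_list(sample)
print(feeds)
```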
## Access Token Setup
1. Log in to your GoToSocial instance
2. Go to Settings → Applications
3. Create a new application with the scopes `read`, `read:search`, and `read:statuses`
4. Copy the access token to your `.env` file
## Statistics Output
```
📊 GTS-HolMirDas Run Statistics:
⏱️ Runtime: 0:04:14
📄 Total posts processed: 45
🌐 Current known instances: 2519
New instances discovered: +3
📡 RSS feeds processed: 25
⚡ Posts per minute: 10.6
```
## Resource Requirements
- **Memory**: ~200-500MB depending on feed count
- **CPU**: Minimal (mostly I/O bound)
- **Storage**: <100MB for application, plus log storage
- **Network**: Depends on RSS feed count and frequency
## Deployment Options
### Docker Compose (Recommended)
```bash
docker compose up -d
```
### Standalone Docker
```bash
docker build -t gts-holmirdas .
# Absolute host paths work across Docker versions
docker run -d --env-file .env \
  -v "$(pwd)/data:/app/data" \
  -v "$(pwd)/gts_holmirdas.py:/app/gts_holmirdas.py:ro" \
  -v "$(pwd)/rss_feeds.txt:/app/rss_feeds.txt:ro" \
  gts-holmirdas
```
## Monitoring
- **Logs**: `docker compose logs -f`
- **Health**: Optional Healthchecks.io integration
- **Statistics**: Built-in runtime and performance metrics
- **Resource Usage**: Docker stats or container monitoring tools
## Troubleshooting
### Common Issues
- **No posts processed**: Check access token permissions and RSS feed URLs
- **Rate limiting errors**: Increase `DELAY_BETWEEN_REQUESTS` or reduce feed count
- **High memory usage**: Reduce `MAX_POSTS_PER_RUN` or the number of feeds
- **Container won't start**: Verify `.env` file format and required variables
### Debug Mode
```bash
# Enable debug logging
echo "LOG_LEVEL=DEBUG" >> .env
# Recreate the container so the changed environment is picked up
docker compose up -d --force-recreate gts-holmirdas
```
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request
## Related Projects
- [FediFetcher](https://github.com/nanos/fedifetcher) - Fetches missing replies and posts
- [GoToSocial](https://github.com/superseriousbusiness/gotosocial) - Lightweight ActivityPub server
- [slurp](https://github.com/VyrCossont/slurp) - Import posts from other instances
## License
MIT License - see LICENSE file for details.
## Acknowledgments
- Inspired by [HolMirDas](https://github.com/aliceif/HolMirDas) by [@aliceif](https://github.com/aliceif) ([@aliceif@mkultra.x27.one](https://mkultra.x27.one/@aliceif)) - the original RSS-to-ActivityPub concept
- Built for the GoToSocial community
- RSS-to-ActivityPub approach inspired by Fediverse discovery challenges