Compare commits

11 commits

Author SHA1 Message Date
matthias
d2601cd83f Fix inline comment parsing in RSS feeds file
- Fixed control character errors when using inline comments in rss_feeds.txt
- Comments after # are now properly stripped from RSS URLs
- Minimal fix using split('#', 1)[0].strip() approach
2025-08-04 11:19:38 +02:00
matthias
2dd7e9162e Merge branch 'master' of https://git.klein.ruhr/matthias/gts-holmirdas 2025-08-04 10:16:13 +02:00
matthias
750b425e33 Fix environment variable support for GTS_SERVER_URL 2025-08-04 10:15:06 +02:00
0cbac9b31c Streamline README, move detailed docs to Wiki 2025-08-03 21:46:26 +00:00
c8dabb5c0e Streamline README, move detailed docs to Wiki 2025-08-03 21:45:58 +00:00
298d11fc44 Streamline README, move detailed docs to Wiki 2025-08-03 21:44:03 +00:00
8c000eea02 Streamline README, move detailed docs to Wiki 2025-08-03 21:43:23 +00:00
228e3c8d51 Streamline README, move detailed docs to Wiki 2025-08-03 20:49:42 +00:00
c9545735ea Streamline README, move detailed docs to Wiki 2025-08-03 20:47:38 +00:00
4bd1d05d93 Streamline README, move detailed docs to Wiki 2025-08-03 20:46:12 +00:00
798433af07 Streamline README, move detailed docs to Wiki 2025-08-03 20:20:01 +00:00
2 changed files with 56 additions and 249 deletions

297
README.md

@@ -1,30 +1,20 @@
 # GTS-HolMirDas 🚀
-RSS-based content discovery for **[GoToSocial](https://codeberg.org/superseriousbusiness/gotosocial)** instances.
+RSS-based content discovery for [GoToSocial](https://codeberg.org/superseriousbusiness/gotosocial) instances.
 Automatically discovers and federates content from RSS feeds across the Fediverse, helping small GoToSocial instances populate their federated timeline without relying on traditional relays.
-*Inspired by the original [HolMirDas](https://github.com/aliceif/HolMirDas) for Misskey by [@aliceif](https://github.com/aliceif) ([@aliceif@mkultra.x27.one](https://mkultra.x27.one/@aliceif)), this GoToSocial adaptation extends the RSS-to-ActivityPub concept with enhanced Docker deployment and multi-instance processing.*
+Inspired by the original [HolMirDas](https://github.com/aliceif/HolMirDas) by [@aliceif](https://mkultra.x27.one/@aliceif), adapted for GoToSocial with enhanced Docker deployment and multi-instance processing.
-## Features
-- 📡 **Multi-Instance RSS Discovery** - Fetches content from configurable RSS feeds across Fediverse instances
-- ⚡ **Efficient Processing** - Configurable rate limiting and duplicate detection
-- 🔧 **Production Ready** - Environment-based config, Docker deployment, health monitoring
-- 📊 **Comprehensive Statistics** - Runtime metrics, content processing, and federation growth tracking
-- 🐳 **Containerized** - Simple Docker Compose deployment
-- 📁 **File-based Configuration** - Easy RSS feed management via text files
+## ✨ Key Features
+- **📡 Multi-Instance Discovery** - Fetches content from configurable RSS feeds across Fediverse instances
+- **⚡ Performance Scaling** - 20-100 posts per feed with URL parameters (`?limit=100`)
+- **🐳 Production Ready** - Docker deployment, environment-based config, health monitoring
+- **📊 Comprehensive Stats** - Runtime metrics, federation growth, performance tracking
+- **🔧 Zero Maintenance** - Runs automatically every hour with duplicate detection
-## How it Works
-**GTS-HolMirDas** reads RSS feeds from various Fediverse instances and uses GoToSocial's search API to federate the discovered content. This approach:
-- Maintains proper ActivityPub federation (posts remain interactive)
-- Respects rate limits and instance policies
-- Provides better content discovery for small instances
-- Works alongside tools like FediFetcher for comprehensive federation
-## Quick Start
+## 🚀 Quick Start
 ```bash
 # Clone the repository
@@ -45,261 +35,78 @@ docker compose up -d
 # Monitor
 docker compose logs -f
 ```
-# Performance Scaling & Configuration
-## 🚀 RSS Feed Optimization (v1.1.0+)
-GTS-HolMirDas supports URL parameters to dramatically increase content discovery without additional API calls.
-### RSS Feed Limits
-Most Mastodon-compatible instances support the `?limit=X` parameter:
+## 📈 Performance at Scale
+**Real Production Data:**
 ```
-# Default behavior (20 posts per feed)
-https://mastodon.social/tags/homelab.rss
-# Increased limits (up to 100 posts per feed)
-https://mastodon.social/tags/homelab.rss?limit=50
-https://fosstodon.org/tags/docker.rss?limit=100
+📊 Runtime: 8:42 | 487 posts processed | 3,150+ instances discovered
+⚡ 56 posts/minute | 102 RSS feeds | +45 new instances per run
+💾 Resource usage: ~450MB RAM total (GoToSocial + tools)
 ```
-**Supported limits:** 20 (default), 50, 75, 100 (instance-dependent)
+**Scaling Options:**
+- **Conservative:** 20 posts/feed (~100 posts/run)
+- **Balanced:** 50 posts/feed (~300 posts/run)
+- **Aggressive:** 100 posts/feed (~600 posts/run)
-### Performance Impact
+## 🛠️ Configuration Essentials
-| Configuration | Posts/Run | API Calls | Processing Time |
-|---------------|-----------|-----------|-----------------|
-| Standard (limit=20) | ~100 posts | 30+ feeds | 2-5 minutes |
-| Optimized (limit=50) | ~300 posts | 30+ feeds | 5-10 minutes |
-| Maximum (limit=100) | ~600 posts | 30+ feeds | 8-15 minutes |
-## ⚙️ Configuration Tuning
-### Environment Variables
-```env
-# Processing Configuration
-MAX_POSTS_PER_RUN=75 # Increase for higher limits
-DELAY_BETWEEN_REQUESTS=1 # Balance speed vs. server load
-RSS_URLS_FILE=/app/rss_feeds.txt
-# Recommended combinations:
-# Conservative: MAX_POSTS_PER_RUN=40, limit=50
-# Balanced: MAX_POSTS_PER_RUN=75, limit=100
-# Aggressive: MAX_POSTS_PER_RUN=100, limit=100
-```
-### RSS Feed Strategy
-```
-# Progressive scaling approach:
-# 1. Start with mixed limits to test performance
-# 2. Increase gradually based on server capacity
-# 3. Monitor GoToSocial memory usage
-# Example progression:
-https://mastodon.social/tags/homelab.rss?limit=50
-https://fosstodon.org/tags/selfhosting.rss?limit=75
-https://chaos.social/tags/docker.rss?limit=100
-```
-## 📊 Monitoring & Optimization
-### Performance Metrics
-The statistics output shows real-time performance:
-```
-📊 GTS-HolMirDas Run Statistics:
-⏱️ Runtime: 0:08:42
-📄 Total posts processed: 487
-🌐 Current known instances: 3150
-New instances discovered: +45
-📡 RSS feeds processed: 102
-⚡ Posts per minute: 56.0
-```
-### Optimization Guidelines
-**Memory Usage:**
-- Monitor GoToSocial memory consumption during runs
-- Each 100 additional posts ≈ ~2-5MB additional RAM
-- Recommended: 1GB+ RAM for aggressive configurations
-**Processing Time:**
-- Scales linearly with `MAX_POSTS_PER_RUN × number_of_feeds`
-- Duplicate detection becomes more important at scale
-- Consider running frequency vs. content volume
-**Federation Growth:**
-- Higher limits = more diverse instance discovery
-- Expect 20-50+ new instances per optimized run
-- Balance discovery rate with storage capacity
-### Troubleshooting High-Volume Setups
-**If processing takes too long:**
-```env
-MAX_POSTS_PER_RUN=50 # Reduce from 75/100
-DELAY_BETWEEN_REQUESTS=2 # Increase from 1
-```
-**If GoToSocial uses too much memory:**
-- Reduce RSS feed count temporarily
-- Lower `?limit=` parameters to 50 instead of 100
-- Increase run frequency instead of volume
-**If duplicate detection is slow:**
-- Storage cleanup: `docker-compose exec gts-holmirdas rm -f /app/data/processed_urls.json`
-- This forces fresh state tracking (posts will be reprocessed once)
-## 🎯 Best Practices
-### Scaling Strategy
-1. **Start Conservative:** `limit=50`, `MAX_POSTS_PER_RUN=40`
-2. **Monitor Performance:** Check RAM usage and processing time
-3. **Scale Gradually:** Increase to `limit=75`, then `limit=100`
-4. **Optimize Mix:** Use different limits per instance based on quality
-### Instance Selection
-**High-quality instances for aggressive limits:**
-```
-# Tech-focused instances (good signal-to-noise ratio)
-https://fosstodon.org/tags/homelab.rss?limit=100
-https://infosec.exchange/tags/security.rss?limit=100
-# General instances (moderate limits recommended)
-https://mastodon.social/tags/technology.rss?limit=50
-```
-**Performance tip:** Specialized instances often have higher content quality at scale than general-purpose instances.
-## Configuration
 ### Environment Variables (.env)
 ```bash
-# GTS Server Configuration
+# Required
 GTS_SERVER_URL=https://your-gts-instance.tld
 GTS_ACCESS_TOKEN=your_gts_access_token
-# Processing Configuration
+# Performance Tuning
 MAX_POSTS_PER_RUN=25 # Posts per feed per run
 DELAY_BETWEEN_REQUESTS=1 # Seconds between API calls
-LOG_LEVEL=INFO # Logging verbosity
-# RSS Configuration
-RSS_URLS_FILE=/app/rss_feeds.txt # Path to RSS feeds file
-# Optional: Monitoring
-HEALTHCHECK_URL=https://hc-ping.com/your-uuid
+LOG_LEVEL=INFO # DEBUG for troubleshooting
 ```
 ### RSS Feeds (rss_feeds.txt)
-```
-# Example RSS feeds - customize for your interests
-# homelab
-https://mastodon.social/tags/homelab.rss
-https://fosstodon.org/tags/homelab.rss
-# selfhosting
-https://mastodon.social/tags/selfhosting.rss
-https://infosec.exchange/tags/selfhosting.rss
-# Add your preferred instances and hashtags
+```bash
+# Use URL parameters to scale performance
+https://mastodon.social/tags/homelab.rss?limit=50
+https://fosstodon.org/tags/selfhosting.rss?limit=100
+https://infosec.exchange/tags/security.rss?limit=75
 ```
-## Access Token Setup
+### GoToSocial Access Token
 1. Login to your GoToSocial instance
-2. Go to Settings → Applications
-3. Create new application with scopes: `read`, `read:search`, `read:statuses`
-4. Copy the access token to your `.env` file
+2. Settings → Applications → Create new application
+3. Required scopes: `read`, `read:search`, `read:statuses`
+4. Copy access token to `.env` file
-## Statistics Output
-```
-📊 GTS-HolMirDas Run Statistics:
-⏱️ Runtime: 0:04:14
-📄 Total posts processed: 45
-🌐 Current known instances: 2519
-New instances discovered: +3
-📡 RSS feeds processed: 25
-⚡ Posts per minute: 10.6
-```
-## Resource Requirements
-- **Memory**: ~200-500MB depending on feed count
-- **CPU**: Minimal (mostly I/O bound)
-- **Storage**: <100MB for application, plus log storage
-- **Network**: Depends on RSS feed count and frequency
-## Deployment Options
-### Docker Compose (Recommended)
-```bash
-docker compose up -d
-```
-### Standalone Docker
-```bash
-docker build -t gts-holmirdas .
-docker run -d --env-file .env \
-  -v ./data:/app/data \
-  -v ./gts_holmirdas.py:/app/gts_holmirdas.py:ro \
-  -v ./rss_feeds.txt:/app/rss_feeds.txt:ro \
-  gts-holmirdas
-```
-## Monitoring
-- **Logs**: `docker compose logs -f`
-- **Health**: Optional Healthchecks.io integration
-- **Statistics**: Built-in runtime and performance metrics
-- **Resource Usage**: Docker stats or container monitoring tools
-## Troubleshooting
-### Common Issues
-- **No posts processed**: Check access token permissions and RSS feed URLs
-- **Rate limiting errors**: Increase `DELAY_BETWEEN_REQUESTS` or reduce feed count
-- **High memory usage**: Reduce `MAX_POSTS_PER_RUN` or feed frequency
-- **Container won't start**: Verify `.env` file format and required variables
-### Debug Mode
-```bash
-# Enable debug logging
-echo "LOG_LEVEL=DEBUG" >> .env
-docker compose restart gts-holmirdas
-```
-## Contributing
-1. Fork the repository
-2. Create a feature branch
-3. Make your changes
-4. Test thoroughly
-5. Submit a pull request
-## Related Projects
-- [FediFetcher](https://github.com/nanos/fedifetcher) - Fetches missing replies and posts
-- [GoToSocial](https://github.com/superseriousbusiness/gotosocial) - Lightweight ActivityPub server
-- [slurp](https://github.com/VyrCossont/slurp) - Import posts from other instances
-## License
-MIT License - see LICENSE file for details.
-## Acknowledgments
-- Inspired by [HolMirDas](https://github.com/aliceif/HolMirDas) by [@aliceif](https://github.com/aliceif) ([@aliceif@mkultra.x27.one](https://mkultra.x27.one/@aliceif)) - the original RSS-to-ActivityPub concept
+## 📖 Complete Documentation
+For detailed information, visit our **[Wiki](https://git.klein.ruhr/matthias/gts-holmirdas/wiki)**:
+- **[📋 Installation Guide](https://git.klein.ruhr/matthias/gts-holmirdas/wiki/Installation-Guide.-)** - Detailed setup, Docker configuration, deployment options
+- **[📈 Performance & Scaling](https://git.klein.ruhr/matthias/gts-holmirdas/wiki/Performance-%26-Scaling)** - Optimization tables, scaling strategies, resource planning
+- **[🛠️ Troubleshooting](https://git.klein.ruhr/matthias/gts-holmirdas/wiki/Troubleshooting)** - Common issues, Docker problems, debugging guide
+- **[⚙️ Advanced Configuration](https://git.klein.ruhr/matthias/gts-holmirdas/wiki/Advanced-Configuration)** - Environment variables, RSS strategies, production tips
+- **[📊 Monitoring & Stats](https://git.klein.ruhr/matthias/gts-holmirdas/wiki/Monitoring-%26-Stats)** - Understanding output, health monitoring, metrics
+- **[❓ FAQ](https://git.klein.ruhr/matthias/gts-holmirdas/wiki/FAQ+-+Frequently+Asked+Questions.-)** - Common questions and answers
+## 🤝 Community & Support
+- **[Contributing Guide](Contributing)** - Development setup and contribution guidelines *(coming soon)*
+- **Issues**: [Report bugs or request features](https://git.klein.ruhr/matthias/gts-holmirdas/issues)
+- **Contact**: [@matthias@me.klein.ruhr](https://me.klein.ruhr/@matthias) on the Fediverse
+## 🔗 Related Projects
+- **[FediFetcher](https://github.com/nanos/fedifetcher)** - Fetches missing replies and posts
+- **[GoToSocial](https://github.com/superseriousbusiness/gotosocial)** - Lightweight ActivityPub server
+- **[slurp](https://github.com/VyrCossont/slurp)** - Import posts from other instances
+## 📄 License
+MIT License - see [LICENSE](LICENSE) file for details.
+## 🙏 Acknowledgments
+- Inspired by [HolMirDas](https://github.com/aliceif/HolMirDas) by [@aliceif](https://mkultra.x27.one/@aliceif)
 - Built for the GoToSocial community
-- RSS-to-ActivityPub approach inspired by Fediverse discovery challenges
+- RSS-to-ActivityPub federation approach
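
The README changes in this diff rely on the `?limit=` query parameter to scale how many posts each feed returns. As a rough illustration only, the sketch below shows one way to set that parameter on a feed URL with the Python standard library. The helper name `with_limit` is hypothetical and is not part of GTS-HolMirDas; whether a given instance honors the parameter is instance-dependent, as the README notes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_limit(feed_url: str, limit: int) -> str:
    """Return feed_url with its `limit` query parameter set to `limit`.

    Hypothetical helper for illustration; existing query parameters are
    preserved and an existing `limit` value is overwritten.
    """
    parts = urlsplit(feed_url)
    query = dict(parse_qsl(parts.query))
    query["limit"] = str(limit)
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(with_limit("https://mastodon.social/tags/homelab.rss", 50))
    print(with_limit("https://fosstodon.org/tags/selfhosting.rss?limit=20", 100))
```

Editing `rss_feeds.txt` by hand, as the README shows, gives the same result; a helper like this only matters if feed lists are generated by a script.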


@@ -46,8 +46,8 @@ class GTSHolMirDas:
         try:
             with open(rss_urls_file, 'r') as f:
                 self.config["rss_urls"] = [
-                    line.strip() for line in f
-                    if line.strip() and not line.startswith('#')
+                    line.split('#', 1)[0].strip() for line in f
+                    if line.strip() and not line.strip().startswith('#')
                 ]
                 self.logger.info(f"Loaded {len(self.config['rss_urls'])} RSS URLs from file: {rss_urls_file}")
         except Exception as e:
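
To make the effect of this hunk easier to check in isolation, here is a minimal standalone sketch of the same `split('#', 1)[0].strip()` approach. The function name `parse_rss_urls` and the sample lines are assumptions made for illustration; the project itself does this inline in a list comprehension, as shown in the hunk above.

```python
from typing import Iterable, List


def parse_rss_urls(lines: Iterable[str]) -> List[str]:
    """Return feed URLs, dropping blank lines, comment lines and inline comments."""
    urls = []
    for line in lines:
        # Keep only the text before the first '#', then trim whitespace.
        url = line.split('#', 1)[0].strip()
        if url:  # empty for blank lines and comment-only lines
            urls.append(url)
    return urls


# Sample input (hypothetical), shaped like an rss_feeds.txt file.
sample = [
    "# homelab\n",
    "https://mastodon.social/tags/homelab.rss?limit=50  # inline comment\n",
    "\n",
    "   # indented comment\n",
    "https://fosstodon.org/tags/selfhosting.rss\n",
]

if __name__ == "__main__":
    for url in parse_rss_urls(sample):
        print(url)
```

One consequence of splitting on the first `#` is that a feed URL containing a fragment (for example `...rss#section`) would be cut at the fragment. That is harmless for the tag feeds used in this repository, but worth keeping in mind if other URL shapes are added.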