Raw Fuck Boys Scraper
Summary
A Python scraper for rawfuckboys.com with built-in scene code cache for instant lookups. Extracts scene codes from filenames (e.g., rfb0018) and resolves them to full scene metadata.
Source URL
Features
- Scene scraping by URL - Scrape any rawfuckboys.com video URL
- Scene scraping by code - Extract scene codes from filenames (e.g., rfb0018_720p.mp4) with instant cache lookup
- Scene scraping by name - Search scenes by title via sitemap matching
- Performer scraping by URL - Basic performer details
- Built-in code cache - Pre-built mapping of 202 scene codes to URLs (rfb0001-rfb9083)
- Auto-refresh cache - Automatically rebuilds the cache weekly from the sitemap
- Fast lookups - No API calls needed for code-based scraping
What It Does
The Raw Fuck Boys scraper uses a local cache file to instantly resolve scene codes to URLs. Since rawfuckboys.com has no search API, the scraper:
- Maintains a cached mapping of scene codes (like rfb0018) to video URLs
- Extracts scene codes from filenames automatically
- Looks up the scene URL in the cache and scrapes the full metadata
- Rebuilds the cache weekly to keep it up-to-date
Example filename:
rfb0023_720p.mp4
Extracts code rfb0023 and fetches:
- Title: Scene title from page
- Studio: Raw Fuck Boys
- Code: rfb0023
- Description: Scene description
- Performers: Full cast with images
- Cover: High-quality poster image
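The filename-to-code step can be sketched as follows. This is a minimal illustration, not the code shipped in RawFuckBoys.py: the function name `extract_scene_code` and the exact regular expression are assumptions based on the documented pattern (rfb followed by 3-5 digits, lowercased).

```python
import re
from typing import Optional

# Assumed pattern: "rfb" plus 3-5 digits, not followed by another digit.
# The real scraper may use a different expression.
SCENE_CODE_RE = re.compile(r"(rfb\d{3,5})(?!\d)", re.IGNORECASE)


def extract_scene_code(filename: str) -> Optional[str]:
    """Return the lowercase scene code found in a filename, or None."""
    match = SCENE_CODE_RE.search(filename)
    return match.group(1).lower() if match else None


print(extract_scene_code("rfb0023_720p.mp4"))
```

A filename without a recognizable code returns None, in which case scraping by URL or by search is the fallback.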
Installation (Docker)
Files required:
- RawFuckBoys.yml (attached)
- RawFuckBoys.py (attached)
- code_cache.json (attached)
Folder Structure
docker/
├── docker-compose.yml
└── scrapers/
    └── RawFuckBoys/
        ├── RawFuckBoys.yml
        ├── RawFuckBoys.py
        └── code_cache.json
Docker Compose Configuration
Add the following under volumes in your docker-compose.yml:
- ./scrapers/RawFuckBoys:/root/.stash/scrapers/RawFuckBoys
Restart the Stash container:
docker compose up -d
The scraper will appear in Stash’s scraper list.
Usage
Scraping by URL
- Edit a scene in Stash
- Enter a rawfuckboys.com URL in the URL field
- Click “Scrape with… > RawFuckBoys”
Scraping by Scene Code (from filename)
- Your file is named something like rfb0023_720p.mp4
- Click “Scrape with… > RawFuckBoys”
- The scraper extracts rfb0023, looks it up in the cache, and fetches the scene
Scraping by Search
- Edit a scene in Stash
- Click the search icon next to “Scrape with…”
- Search for keywords from the scene title
- Select the correct scene from results
Technical Details
- Language: Python 3
- Dependencies: requests, beautifulsoup4 (auto-installed via py_common)
- Platform: Barebackplus HTML platform (not CarnalPlus)
- Cache file: code_cache.json - maps scene codes to URLs
- Cache refresh: Automatically rebuilds weekly by fetching the sitemap
- Scene code pattern: rfb + 3-5 digits (e.g., rfb0018, rfb9083)
- Total scenes: 202 videos cached
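For illustration, a code-based lookup against the cache might look like the sketch below. It assumes code_cache.json is a flat JSON object mapping each scene code to its URL; the actual file layout may differ, and the helper names `load_cache` and `lookup_url` as well as the example URL are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def load_cache(path: Path) -> Dict[str, str]:
    """Load the code-to-URL mapping; an absent file yields an empty cache."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def lookup_url(cache: Dict[str, str], code: str) -> Optional[str]:
    """Resolve a scene code to its URL, normalizing to lowercase first."""
    return cache.get(code.lower())


# Hypothetical cache content, for demonstration only.
demo_cache = {"rfb0018": "https://example.com/video/some-scene"}
print(lookup_url(demo_cache, "RFB0018"))
```

Because the lookup is a plain dictionary access, no network request is needed until the resolved scene page itself is fetched.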
Cache Management
The scraper automatically manages the cache:
- Loads code_cache.json on startup
- If a scene code isn’t found, forces a cache rebuild
- Rebuilds the cache weekly (7 days since last modification)
- Fetches the sitemap and scrapes each page’s og:image tag to extract scene codes
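The weekly refresh rule can be sketched as a check on the file's modification time. This is a rough outline under the assumption that "7 days since last modification" refers to the mtime of code_cache.json; the names `age_exceeds` and `cache_is_stale` are hypothetical and not taken from RawFuckBoys.py.

```python
import os
import time
from typing import Optional

CACHE_MAX_AGE_SECONDS = 7 * 24 * 60 * 60  # seven days


def age_exceeds(mtime: float, now: float, max_age: float = CACHE_MAX_AGE_SECONDS) -> bool:
    """True when more than max_age seconds separate mtime from now."""
    return now - mtime > max_age


def cache_is_stale(path: str, now: Optional[float] = None) -> bool:
    """A missing cache file, or one older than seven days, needs a rebuild."""
    if not os.path.exists(path):
        return True
    current = time.time() if now is None else now
    return age_exceeds(os.path.getmtime(path), current)
```

Under this scheme a rebuild would be triggered when `cache_is_stale("code_cache.json")` returns True or when a requested scene code is absent from the loaded mapping.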
To manually rebuild the cache:
docker exec stash python /root/.stash/scrapers/RawFuckBoys/RawFuckBoys.py rebuildCache
Notes
- Raw Fuck Boys has no search API, so the cache is essential for code-based lookups
- The cache includes codes from rfb0001 to rfb9083 (with gaps in numbering)
- Performer images are extracted from lazy-loaded data-src attributes
- The scraper automatically converts scene codes to lowercase for consistency
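As an illustration of the lazy-loading note, the sketch below collects image URLs by preferring data-src over src. It uses the standard library's html.parser rather than beautifulsoup4 so that it runs without extra packages; the class and function names are hypothetical, and the real page markup may differ.

```python
from html.parser import HTMLParser
from typing import List


class LazyImageParser(HTMLParser):
    """Collect image URLs, preferring the lazy-load data-src attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.image_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # With lazy loading, src often holds a placeholder; data-src holds the real image.
        url = attr_map.get("data-src") or attr_map.get("src")
        if url:
            self.image_urls.append(url)


def extract_image_urls(html: str) -> List[str]:
    parser = LazyImageParser()
    parser.feed(html)
    return parser.image_urls
```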
Attachments:
Important note! Rename code_cache.json.txt to code_cache.json after downloading and before running the scraper. Discourse does not support uploading .json files.
- code_cache.json.txt (15.2 KB)
- RawFuckBoys.py (12.2 KB)
- RawFuckBoys.yml (643 Bytes)
Enjoy!