RawFuckBoys Scraper

Summary

A Python scraper for rawfuckboys.com with a built-in scene code cache for instant lookups. It extracts scene codes from filenames (e.g., rfb0018) and resolves them to full scene metadata.

Source URL

Features

  • Scene scraping by URL - Scrape any rawfuckboys.com video URL
  • Scene scraping by code - Extract scene codes from filenames (e.g., rfb0018_720p.mp4) with instant cache lookup
  • Scene scraping by name - Search scenes by title via sitemap matching
  • Performer scraping by URL - Scrapes basic performer details
  • Built-in code cache - Pre-built mapping of 202 scene codes to URLs (codes range from rfb0001 to rfb9083, with gaps)
  • Auto-refresh cache - Automatically rebuilds the cache weekly from the sitemap
  • Fast lookups - Code-based scraping needs no search request, because the cache resolves the code to a URL directly

What It Does

The Raw Fuck Boys scraper uses a local cache file to instantly resolve scene codes to URLs. Since rawfuckboys.com has no search API, the scraper:

  1. Maintains a cached mapping of scene codes (like rfb0018) to video URLs
  2. Extracts scene codes from filenames automatically
  3. Looks up the scene URL in the cache and scrapes the full metadata
  4. Rebuilds the cache weekly to keep it up-to-date
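The four steps above boil down to a small lookup routine. Here is a minimal Python sketch of that flow, assuming the cache is a plain dict of lowercase code to URL and that the rebuild step is passed in as a function. resolve_scene_url and the rebuild callback are hypothetical names, not the scraper's actual API.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical helper, not the scraper's actual code.
# Scene codes: "rfb" followed by 3-5 digits, matched case-insensitively.
CODE_RE = re.compile(r"rfb\d{3,5}", re.IGNORECASE)


def resolve_scene_url(
    filename: str,
    cache: Dict[str, str],
    rebuild: Callable[[], Dict[str, str]],
) -> Optional[str]:
    """Map a filename to a scene URL via the code cache.

    `rebuild` stands in for whatever refreshes the cache from the sitemap;
    it is only called when the code is missing from `cache`.
    """
    match = CODE_RE.search(filename)
    if match is None:
        return None  # no scene code in the filename
    code = match.group(0).lower()  # normalise to lowercase before lookup
    url = cache.get(code)
    if url is None:
        # Cache miss: rebuild once, merge the result, and try again.
        cache.update(rebuild())
        url = cache.get(code)
    return url


if __name__ == "__main__":
    # Placeholder URL, not a real scene page.
    cache = {"rfb0023": "https://example.com/videos/scene-23"}
    print(resolve_scene_url("rfb0023_720p.mp4", cache, rebuild=dict))
```

Passing the rebuild step in as a function keeps the lookup testable without network access; a cache miss triggers exactly one rebuild attempt.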

Example filename:

rfb0023_720p.mp4

Extracts code rfb0023 and fetches:

  • Title: Scene title from page
  • Studio: Raw Fuck Boys
  • Code: rfb0023
  • Description: Scene description
  • Performers: Full cast with images
  • Cover: High-quality poster image
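Roughly, those fields end up in the JSON object a Stash script scraper prints. The sketch below is an assumption about that shape (title, details for the description, code, studio, performers, and image for the cover). build_scene and its arguments are hypothetical, and the "images" key for performer pictures in particular should be checked against the Stash scraper documentation.

```python
import json
from typing import Dict, Optional


def build_scene(
    title: str,
    code: str,
    description: str,
    performers: Dict[str, Optional[str]],
    cover_url: str,
) -> dict:
    # Hypothetical helper; the field names follow Stash's scraped-scene JSON
    # as assumed here. `performers` maps performer name -> image URL.
    return {
        "title": title,
        "studio": {"name": "Raw Fuck Boys"},
        "code": code.lower(),
        "details": description,  # the scene description
        "performers": [
            # Using "images" as the performer image key is an assumption.
            {"name": name, "images": [image]} if image else {"name": name}
            for name, image in performers.items()
        ],
        "image": cover_url,  # the poster / cover image
    }


if __name__ == "__main__":
    scene = build_scene(
        "Example title", "RFB0023", "Example description.",
        {"Performer A": "https://example.com/a.jpg"},
        "https://example.com/cover.jpg",
    )
    print(json.dumps(scene, indent=2))
```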

Installation (Docker)

Files required:

  • RawFuckBoys.yml (attached)
  • RawFuckBoys.py (attached)
  • code_cache.json (attached)

Folder Structure

docker/
├── docker-compose.yml
└── scrapers/
    └── RawFuckBoys/
        ├── RawFuckBoys.yml
        ├── RawFuckBoys.py
        └── code_cache.json

Docker Compose Configuration

Add the following line under the volumes: section of the Stash service in your docker-compose.yml:

- ./scrapers/RawFuckBoys:/root/.stash/scrapers/RawFuckBoys

Restart the Stash container:

docker compose up -d

The scraper will appear in Stash’s scraper list.

Usage

Scraping by URL

  1. Edit a scene in Stash
  2. Enter a rawfuckboys.com URL in the URL field
  3. Click “Scrape with… > RawFuckBoys”

Scraping by Scene Code (from filename)

  1. Your file is named something like rfb0023_720p.mp4
  2. Click “Scrape with… > RawFuckBoys”
  3. The scraper extracts rfb0023, looks it up in the cache, and fetches the scene

Scraping by Search

  1. Edit a scene in Stash
  2. Click the search icon next to “Scrape with…”
  3. Search for keywords from the scene title
  4. Select the correct scene from results
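Since the site has no search endpoint, title search works against the sitemap instead. Here is a minimal sketch of keyword matching, assuming scene URLs end in a hyphenated title slug. search_sitemap is a hypothetical helper, and the real scraper may match on page titles rather than URL slugs.

```python
import re
from typing import List, Tuple


def search_sitemap(query: str, urls: List[str], limit: int = 5) -> List[str]:
    """Rank sitemap URLs by how many query words appear in the URL slug.

    Hypothetical helper: assumes each scene URL ends in a title slug.
    """
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    scored: List[Tuple[int, str]] = []
    for url in urls:
        slug = url.rstrip("/").rsplit("/", 1)[-1].lower()
        slug_words = set(re.findall(r"[a-z0-9]+", slug))
        score = len(words & slug_words)
        if score:
            scored.append((score, url))
    # Highest score first; Python's sort is stable, so ties keep sitemap order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored[:limit]]
```

Each returned URL would then be a candidate in the results list you pick from in step 4.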

Technical Details

  • Language: Python 3
  • Dependencies: requests, beautifulsoup4 (auto-installed via py_common)
  • Platform: Barebackplus HTML platform (not CarnalPlus)
  • Cache file: code_cache.json - maps scene codes to URLs
  • Cache refresh: Automatically rebuilds weekly by fetching the sitemap
  • Scene code pattern: rfb + 3-5 digits (e.g., rfb0018, rfb9083)
  • Total scenes: 202 videos cached
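The scene code pattern translates to a short regular expression. This sketch assumes the code can appear anywhere in the filename. The lookbehind and lookahead are added here so that, for example, a six-digit run like rfb123456 is rejected rather than truncated to five digits. extract_scene_code is a hypothetical name.

```python
import re
from typing import Optional

# "rfb" + 3-5 digits, case-insensitive. The lookarounds (an addition for this
# sketch) stop a match from starting mid-word or stopping mid-number.
SCENE_CODE_RE = re.compile(r"(?<![a-z])rfb\d{3,5}(?!\d)", re.IGNORECASE)


def extract_scene_code(filename: str) -> Optional[str]:
    """Return the lowercase scene code found in `filename`, or None."""
    match = SCENE_CODE_RE.search(filename)
    return match.group(0).lower() if match else None
```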

Cache Management

The scraper automatically manages the cache:

  • Loads code_cache.json on startup
  • Forces a cache rebuild if a scene code isn’t found
  • Rebuilds the cache weekly, i.e. when code_cache.json was last modified more than 7 days ago
  • Fetches the sitemap and scrapes each page’s og:image tag to extract scene codes
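The weekly refresh can be keyed off the file's modification time. Here is a minimal sketch, assuming code_cache.json is a flat JSON object of code to URL and that the sitemap crawl is supplied as a rebuild function. load_cache and MAX_AGE_SECONDS are hypothetical names.

```python
import json
import os
import time
from typing import Callable, Dict

MAX_AGE_SECONDS = 7 * 24 * 60 * 60  # one week


def load_cache(path: str, rebuild: Callable[[], Dict[str, str]]) -> Dict[str, str]:
    """Return the code -> URL cache, rebuilding it if missing or over a week old.

    Hypothetical helper: `rebuild` stands in for the sitemap crawl, and the
    flat {"rfb0018": url} JSON layout is an assumption.
    """
    stale = (
        not os.path.exists(path)
        or time.time() - os.path.getmtime(path) > MAX_AGE_SECONDS
    )
    if stale:
        cache = rebuild()
        # Writing the file also resets its mtime, starting a new 7-day window.
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(cache, fh, indent=2, sort_keys=True)
        return cache
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```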

To manually rebuild the cache (replace stash with the name of your Stash container if it differs):

docker exec stash python /root/.stash/scrapers/RawFuckBoys/RawFuckBoys.py rebuildCache
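The rebuildCache command re-crawls the sitemap as described under Cache Management. The parsing side of that crawl might look like the sketch below. It leaves out HTTP so it works on already-downloaded text, uses the standard library's xml.etree and html.parser instead of beautifulsoup4, and assumes the og:image URL contains the scene code. All function and class names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import Dict, List, Optional

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
CODE_RE = re.compile(r"rfb\d{3,5}", re.IGNORECASE)


def sitemap_urls(xml_text: str) -> List[str]:
    """Return every <loc> URL from a standard sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


class _OgImageParser(HTMLParser):
    """Records the content of the first <meta property="og:image"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.og_image: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and attr.get("property") == "og:image" and self.og_image is None:
            self.og_image = attr.get("content")


def code_from_page(html: str) -> Optional[str]:
    """Pull a scene code out of the page's og:image URL (assumed to contain it)."""
    parser = _OgImageParser()
    parser.feed(html)
    if parser.og_image:
        match = CODE_RE.search(parser.og_image)
        if match:
            return match.group(0).lower()
    return None


def build_cache(pages: Dict[str, str]) -> Dict[str, str]:
    """Map scene code -> URL from already-downloaded {url: html} pages."""
    cache: Dict[str, str] = {}
    for url, html in pages.items():
        code = code_from_page(html)
        if code:  # pages without a recognisable code are skipped
            cache[code] = url
    return cache
```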

Notes

  • Raw Fuck Boys has no search API, so the cache is essential for code-based lookups
  • The cache includes codes from rfb0001 to rfb9083 (with gaps in numbering)
  • Performer images are extracted from lazy-loaded data-src attributes
  • The scraper automatically converts scene codes to lowercase for consistency
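Pulling performer images from lazy-loaded markup mostly means reading data-src before src. This sketch uses the standard library's html.parser rather than beautifulsoup4 so it runs without extra packages. LazyImageParser and lazy_image_urls are hypothetical names, and falling back to src when data-src is absent is an assumption.

```python
from html.parser import HTMLParser
from typing import List


class LazyImageParser(HTMLParser):
    """Collects image URLs from <img> tags, preferring data-src over src.

    Lazy-loading pages often put a placeholder in src and the real URL in data-src.
    """

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        url = attr.get("data-src") or attr.get("src")  # fallback is an assumption
        if url:
            self.urls.append(url)


def lazy_image_urls(html: str) -> List[str]:
    """Return the image URLs found in `html`, in document order."""
    parser = LazyImageParser()
    parser.feed(html)
    return parser.urls
```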

Attachments:

Important note! After downloading, rename code_cache.json.txt to code_cache.json before running the scraper. Discourse does not allow uploading .json files.

Enjoy!

Submitted to CommunityScrapers.