Twink Top Scraper
Summary
A Python scraper for twinktop.com with a built-in scene code cache for instant lookups. It extracts scene codes from filenames (e.g., ttp0004) and resolves them to full scene metadata. It also normalizes performer names automatically for clean formatting.
Source URL
Features
- Scene scraping by URL - Scrape any twinktop.com video URL
- Scene scraping by code - Extract scene codes from filenames (e.g., ttp0004_720p.mp4) with instant cache lookup
- Scene scraping by name - Search scenes by title via sitemap matching
- Performer scraping by URL - Full performer details with images
- Performer name normalization - Automatically converts all-caps names to title case (e.g., “COACH SAVAGE” → “Coach Savage”)
- Built-in code cache - Pre-built mapping of 117 scene codes to URLs (ttp0001-ttp0119)
- Auto-refresh cache - Automatically rebuilds the cache weekly from the sitemap
- Fast lookups - No API calls needed for code-based scraping
What It Does
The Twink Top scraper uses a local cache file to instantly resolve scene codes to URLs. Since twinktop.com has no search API, the scraper:
- Maintains a cached mapping of scene codes (like ttp0004) to video URLs
- Extracts scene codes from filenames automatically
- Looks up the scene URL in the cache and scrapes the full metadata
- Rebuilds the cache weekly to keep it up-to-date
- Normalizes performer names from all-caps to title case for consistency
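The name normalization step could look roughly like the sketch below. This is not the code from TwinkTop.py; the function name normalize_performer_name is hypothetical, and the sketch assumes that only names written entirely in capitals should be changed.

```python
def normalize_performer_name(name: str) -> str:
    """Convert an all-caps name to title case; leave other names untouched.

    Hypothetical helper for illustration, not taken from TwinkTop.py.
    """
    stripped = name.strip()
    # str.isupper() is True only when every cased character is uppercase
    # and the string contains at least one cased character.
    if not stripped.isupper():
        return stripped
    # Capitalize word by word; split() also collapses repeated spaces.
    return " ".join(word.capitalize() for word in stripped.split())


if __name__ == "__main__":
    print(normalize_performer_name("COACH SAVAGE"))
```

Names that are already mixed case pass through unchanged, so running the function twice gives the same result. Hyphenated or apostrophe-containing names would need extra handling that this sketch leaves out.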
Example filename:
ttp0004_720p.mp4
Extracts code ttp0004 and fetches:
- Title: Scene title from page
- Studio: Twink Top
- Code: ttp0004
- Description: Scene description
- Performers: Full cast with images (names automatically normalized)
- Cover: High-quality poster image
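As a rough illustration of the filename-to-URL step, the sketch below pulls a scene code out of a filename and looks it up in a mapping. It assumes the cache is a flat dictionary of scene code to URL; the actual layout of code_cache.json may differ. The function names and the example URL are placeholders, not taken from TwinkTop.py.

```python
import re
from pathlib import Path
from typing import Dict, Optional

# "ttp" followed by 3-5 digits, matched case-insensitively.
SCENE_CODE_RE = re.compile(r"ttp\d{3,5}", re.IGNORECASE)


def extract_scene_code(filename: str) -> Optional[str]:
    """Return the lowercased scene code found in the file name, if any."""
    match = SCENE_CODE_RE.search(Path(filename).name)
    return match.group(0).lower() if match else None


def lookup_scene_url(filename: str, cache: Dict[str, str]) -> Optional[str]:
    """Resolve a filename to a scene URL via the (assumed) code -> URL cache."""
    code = extract_scene_code(filename)
    if code is None:
        return None
    return cache.get(code)


if __name__ == "__main__":
    # Placeholder cache entry; the real URLs live in code_cache.json.
    demo_cache = {"ttp0004": "https://example.invalid/scene-for-ttp0004"}
    print(extract_scene_code("ttp0004_720p.mp4"))
    print(lookup_scene_url("ttp0004_720p.mp4", demo_cache))
```

Because the lookup is a plain dictionary access, no network request is needed until the scene page itself is scraped.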
Installation (Docker)
Files required:
- TwinkTop.yml (attached)
- TwinkTop.py (attached)
- code_cache.json (attached)
Folder Structure
docker/
├── docker-compose.yml
└── scrapers/
    └── TwinkTop/
        ├── TwinkTop.yml
        ├── TwinkTop.py
        └── code_cache.json
Docker Compose Configuration
Add the following under volumes in your docker-compose.yml:
- ./scrapers/TwinkTop:/root/.stash/scrapers/TwinkTop
Restart the Stash container:
docker compose up -d
The scraper will appear in Stash’s scraper list.
Usage
Scraping by URL
- Edit a scene in Stash
- Enter a twinktop.com URL in the URL field
- Click “Scrape with… > TwinkTop”
Scraping by Scene Code (from filename)
- Your file is named something like ttp0004_720p.mp4
- Click “Scrape with… > TwinkTop”
- The scraper extracts ttp0004, looks it up in the cache, and fetches the scene
Scraping by Search
- Edit a scene in Stash
- Click the search icon next to “Scrape with…”
- Search for keywords from the scene title
- Select the correct scene from results
Technical Details
- Language: Python 3
- Dependencies: requests, beautifulsoup4 (auto-installed via py_common)
- Platform: CarnalPlus CDN (separate from CarnalPlus network)
- Cache file: code_cache.json - maps scene codes to URLs
- Cache refresh: Automatically rebuilds weekly by fetching the sitemap
- Scene code pattern: ttp + 3-5 digits (e.g., ttp0004, ttp0119)
- Total scenes: 117 videos cached
- Image loading: Uses src attribute directly (not lazy-loaded)
Cache Management
The scraper automatically manages the cache:
- Loads code_cache.json on startup
- If a scene code isn’t found, forces a cache rebuild
- Rebuilds the cache weekly (7 days since last modification)
- Fetches the sitemap and scrapes each page’s og:image tag to extract scene codes
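The weekly refresh rule can be sketched as a check on the cache file's modification time. This is an assumption about how the check might be written, not the code in TwinkTop.py; cache_is_stale and CACHE_MAX_AGE_SECONDS are made-up names.

```python
import os
import time
from typing import Optional

# Seven days, matching the weekly rebuild described above.
CACHE_MAX_AGE_SECONDS = 7 * 24 * 60 * 60


def cache_is_stale(path: str, now: Optional[float] = None) -> bool:
    """Return True if the cache file is missing or older than seven days."""
    if not os.path.exists(path):
        # No cache yet, so a rebuild is needed.
        return True
    current = time.time() if now is None else now
    age_seconds = current - os.path.getmtime(path)
    return age_seconds > CACHE_MAX_AGE_SECONDS


if __name__ == "__main__":
    print(cache_is_stale("code_cache.json"))
```

The optional now argument exists only to make the function easy to test without waiting a week.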
To manually rebuild the cache:
docker exec stash python /root/.stash/scrapers/TwinkTop/TwinkTop.py rebuildCache
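During a rebuild, each scene page's og:image tag is read to recover the scene code. The sketch below shows one way that could work using only the standard library (the scraper itself lists beautifulsoup4 as a dependency, so its real implementation likely differs). The class and function names are hypothetical, and the image URL in the example is a placeholder whose layout is assumed.

```python
import re
from html.parser import HTMLParser
from typing import Optional

SCENE_CODE_RE = re.compile(r"ttp\d{3,5}", re.IGNORECASE)


class OgImageParser(HTMLParser):
    """Collect the content of the first <meta property="og:image"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.og_image: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.og_image is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == "og:image":
            self.og_image = attr_map.get("content")


def scene_code_from_html(html: str) -> Optional[str]:
    """Return the scene code embedded in the page's og:image URL, if any."""
    parser = OgImageParser()
    parser.feed(html)
    if not parser.og_image:
        return None
    match = SCENE_CODE_RE.search(parser.og_image)
    return match.group(0).lower() if match else None


if __name__ == "__main__":
    # Placeholder markup; the real URL layout on the site is not shown here.
    sample = (
        '<html><head><meta property="og:image" '
        'content="https://cdn.example.invalid/contentthumbs/ttp0004/poster.jpg">'
        "</head></html>"
    )
    print(scene_code_from_html(sample))
```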
Notes
- Twink Top has no search API, so the cache is essential for code-based lookups
- The cache includes codes from ttp0001 to ttp0119 (with gaps in numbering)
- Performer images are loaded directly via src attribute (not data-src lazy loading)
- Performer names in all-caps are automatically converted to title case for consistency
- The scraper filters performer images by checking for “contentthumbs” in the URL to exclude button images
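The image filter mentioned in the last note amounts to a substring check on each image URL. A minimal sketch, assuming the image URLs have already been collected into a list (the function name and example URLs are placeholders):

```python
from typing import Iterable, List


def filter_performer_images(urls: Iterable[str]) -> List[str]:
    """Keep only URLs containing "contentthumbs", dropping button images."""
    return [url for url in urls if "contentthumbs" in url]


if __name__ == "__main__":
    candidates = [
        "https://cdn.example.invalid/contentthumbs/12/34/performer.jpg",
        "https://cdn.example.invalid/images/join-button.png",
    ]
    print(filter_performer_images(candidates))
```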
Attachments:
Important note! Rename code_cache.json.txt to code_cache.json after downloading and before running. Discourse does not support uploads of json.
- TwinkTop.yml (619 Bytes)
- TwinkTop.py (12.6 KB)
- code_cache.json.txt (9.8 KB)
Enjoy!