JAV English scraper, how do I get it to work?

godisgonenow · April 8, 2026, 3:10pm

I just spent 16 hours crawling through various pages and discussions, trying everything. The only scrapers that work are StashDB and JAVstash (which, to my understanding, is more like a metadata warehouse), which is not ideal because StashDB has so few JAV titles and mixes English and Japanese. The others are Japanese only. I tried every built-in community scraper and none of them work; they return errors like “dial tcp: lookup www.javlibrary.com: no such host”. Then I tried installing skScraper, but I don’t know exactly how it works, and no new scraper option appeared.
I’m at my wit’s end here.

The very least I want is just a cover image and the title in English.
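
A note on the error quoted above: “dial tcp: lookup … no such host” is a DNS lookup failure, not a Cloudflare block, so it may be worth checking whether the hostname resolves at all from the machine running Stash. A minimal Python sketch of that check follows; the helper name `resolves` is made up for illustration, and this runs outside Stash:

```python
import socket


def resolves(hostname: str) -> bool:
    """Return True if the local resolver can find an address for hostname."""
    try:
        # Port 443 is only used to complete the lookup; no connection is made.
        socket.getaddrinfo(hostname, 443)
        return True
    except socket.gaierror:
        # Raised when the name cannot be resolved (the "no such host" case).
        return False


# Example usage (run on the same machine or container as Stash):
#   print(resolves("www.javlibrary.com"))
```

If this returns False, the problem sits with DNS (resolver, ISP, or container network settings) and no scraper-side bypass will help until the name resolves.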

Sakoto · April 8, 2026, 4:08pm

I think the current architecture for scraping metadata through Stash’s existing methods has run its course and probably needs to be revisited. You are at the mercy of the web host’s or cloud provider’s security, which is a big no-no.

People can suggest StashDB, but I have no interest in using that feature. For users like us, there needs to be a more modular solution. I have gotten around this in the past by using local scripts and passing the browser session key. Perhaps I (or somebody, anyway) could lay the groundwork for a modular plugin that passes this key in to scrape sites. This should bypass Cloudflare.

To be honest, though, I haven’t really sat down and thought about it for a long time. This is merely an idea I’ve thought about in the last hour, but the problem is certainly real and it’s definitely time to rethink a solution.

DogmaDragon · April 8, 2026, 4:55pm

JavLibrary_python works, but it requires setting up a FlareSolverr instance.

JavDB scraper works: https://scrape.feederbox.cc/scene?id=mYpUybMB
JAVDatabase scraper works: https://scrape.feederbox.cc/scene?id=6Rt8QMWh (requires valid User-Agent to be set in Stash)
Not sure which others you tried.
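
To test outside Stash whether a site only wants a browser-like User-Agent, a rough Python sketch like the one below can help. The User-Agent string is only an example (copy the one your own browser sends), and the helper names `build_request` and `fetch` are made up for illustration:

```python
import urllib.request

# Assumption: any current desktop browser User-Agent string will do here.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) "
    "Gecko/20100101 Firefox/128.0"
)


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Build a GET request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, timeout: float = 15.0) -> str:
    """Fetch a page with the User-Agent set and return the decoded body."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

If `fetch` succeeds with the header but the same request fails without it, setting the same User-Agent in Stash’s scraping settings is likely all that scraper needs.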

You can also scrape directly from R18.dev, which offers dumps of its metadata at https://r18.dev/dumps. You can then use the local Stash scraper from JAV Stash admin to scrape against it.

Sakoto · April 8, 2026, 4:59pm

I’m sure he means in the UI, which doesn’t work.

DogmaDragon · April 8, 2026, 5:05pm

More so that scrapers requiring no user input are becoming rarer. Another thing we can thank LLMs for. Before, anti-scraper measures were an afterthought, as hobbyist scrapers weren’t a threat. Now that multiple LLM-powered bots mass-scrape the whole internet on a daily basis, protection can’t be an afterthought: having none can lead to them effectively DDoSing you and, in some cases, to expensive bills.

Sakoto · April 8, 2026, 5:10pm

Yeah, I’m well aware of the cause, but we need a solution. AI and scrapers aren’t going anywhere anytime soon. From some light reading, we might be able to use Playwright as a plugin. But it’s all guesswork for me for now.

DogmaDragon · April 8, 2026, 5:10pm

Looks like it needs a valid User-Agent. Updated my comment.

feederbox826 · April 8, 2026, 5:47pm

I’ve documented it before, but it’s basically a range from

No protections
Basic Referer/User-Agent bypass
IP rep block
TLS impersonation
simple CF challenge (cloudscraper)
advanced CF challenge (flaresolverr)
new advanced CF challenge (byparr)

solutions:

TLS impersonation I plan to get around with “add support for scraping with surf (tls impersonation)”, Pull Request #6806 on stashapp/stash
cloudscraper is dead, so TLS impersonation is hopefully the bypass for that
flaresolverr seems to be unmaintained, but byparr is stepping into its place

CDP lies above TLS impersonation but below byparr/flaresolverr. Automation like CDP, Playwright and Selenium is all detectable; see ultrafunkamsterdam/undetected-chromedriver on GitHub (“Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva/ Datadadome / CloudFlare IUAM)”) for previous attempts at an undetected Selenium webdriver.

The ultimate solution is

residential SOCKS5 UDP proxy ($3/GB)
up-to-date TLS impersonation
fallback on byparr/flaresolverr for challenges

Playwright/Selenium will not work for CF “under attack” Turnstile checks
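
The “try the cheap method first, fall back to a solver” idea above can be sketched roughly as follows in Python. This is not how Stash or the linked pull request implements it. It assumes a FlareSolverr-compatible service (byparr describes itself as compatible) listening on localhost:8191, and the names `solver_payload`, `fetch_via_solver` and `fetch_with_fallback` are made up for illustration:

```python
import json
import urllib.request
from typing import Callable, List, Optional, Tuple

# Assumption: a FlareSolverr-compatible service on its default local port.
SOLVER_URL = "http://localhost:8191/v1"

Fetcher = Callable[[str], Optional[str]]


def solver_payload(url: str, max_timeout_ms: int = 60000) -> dict:
    """Request body for a FlareSolverr-style 'request.get' command."""
    return {"cmd": "request.get", "url": url, "maxTimeout": max_timeout_ms}


def fetch_via_solver(url: str) -> Optional[str]:
    """Ask the solver to load the page; return its HTML, or None on failure."""
    req = urllib.request.Request(
        SOLVER_URL,
        data=json.dumps(solver_payload(url)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=90) as resp:
            body = json.load(resp)
    except (OSError, ValueError):
        # Solver unreachable, HTTP error, or a non-JSON reply.
        return None
    if body.get("status") != "ok":
        return None
    return body.get("solution", {}).get("response")


def fetch_with_fallback(
    url: str, fetchers: List[Tuple[str, Fetcher]]
) -> Optional[Tuple[str, str]]:
    """Try each (name, fetcher) in order; return the first (name, html) hit."""
    for name, fetch in fetchers:
        html = fetch(url)
        if html is not None:
            return name, html
    return None
```

A caller would list the fetchers from lightest to heaviest, for example a plain request, then a TLS-impersonating client, then `fetch_via_solver`, so the 300MB browser-based solver is only started when the lighter methods come back empty.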

Sakoto · April 9, 2026, 1:30am

Nice to see it’s on your radar, but I’m assuming this is mostly with automation in mind? While I know that’s the goal for anything these days, I don’t think people would mind a manual option as long as the clicks are minimal. Like I mentioned before, passing a session key or browser session to impersonate could be a decent workaround.

feederbox826 · April 9, 2026, 3:43pm

Browser sessions/session keys are ephemeral (on the scale of 30 min to 1 h), so yes, complete automation is preferred.

TLS impersonation should be quite light and fast (negligible memory/binary increase) compared to the 300 MB of disk for chromedriver/byparr and 500 MB to 1 GB of memory usage.

Sakoto · April 9, 2026, 5:41pm

I’ll back off for now, then. You seem to have a much more complex understanding than I do. I appreciate you looking into it.