Path restrictions for scrapers

Hello there,

as I have many scenes from different sources which require different metadata scrapers, it would be great to have some option to limit the usage of scrapers to specific paths.

Currently, when tagging new scenes, I select all scenes from a specific site (either manually or with filters on the Scenes page) and run the Identify task with only the fitting scraper(s) enabled. Throwing in all scrapers could lead to wrong identifications if there are matching IDs on different sites, and it would cause unnecessary requests to the wrong sites.

I think a suitable way to manage that would be to add optional “path must contain” and “path must not contain” fields (behaving as either “these paths only” or “all except these paths”) to the scraper options when adding scrapers to the Identify task.


(The field should allow multiple paths to be added, so maybe a new text box could appear once the previous one has been filled in?)
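
To make the intended behaviour concrete, here is a minimal Python sketch of how such fields could be evaluated. Nothing here exists in Stash: `ScraperPathRule`, `must_contain`, `must_not_contain` and the example paths are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ScraperPathRule:
    """Hypothetical per-scraper restriction; not an existing Stash option."""
    scraper_id: str
    must_contain: list[str] = field(default_factory=list)
    must_not_contain: list[str] = field(default_factory=list)

    def allows(self, path: str) -> bool:
        # An empty "must contain" list means there is no allow-list restriction.
        if self.must_contain and not any(p in path for p in self.must_contain):
            return False
        # Any match in "must not contain" excludes the path.
        return not any(p in path for p in self.must_not_contain)


def scrapers_for(path: str, rules: list[ScraperPathRule]) -> list[str]:
    """Return the IDs of the scrapers whose rules allow this path."""
    return [r.scraper_id for r in rules if r.allows(path)]


# Made-up example rules: one "these paths only", one "all except these paths".
rules = [
    ScraperPathRule("site_a", must_contain=["/media/site_a/"]),
    ScraperPathRule("generic", must_not_contain=["/media/site_a/", "/media/private/"]),
]
print(scrapers_for("/media/site_a/clip01.mp4", rules))  # ['site_a']
print(scrapers_for("/media/other/clip02.mp4", rules))   # ['generic']
```

With both lists empty, a scraper would apply everywhere, which matches the current behaviour.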

As users tend to have different paths and ways to manage their scenes, this would be preferable to a field in the scraper definition, which could get overwritten on scraper updates.

That would be a huge quality of life feature in my opinion.

It’s already possible to run the Identify task on selected paths.

  1. Go to Settings > Tasks > Library > Identify….
  2. Look for “Identifying all Scenes” at the top of the page and click to select specific folders.
  3. Select a path and click . Repeat with as many paths as you want to include.
  4. Click Confirm.

I see, I’ve been using the Scenes Page and identifying from there.
But the option you listed would still require selecting the specific scrapers and directories every time, from what I’m seeing, right? So no way to permanently say that scraper x should only be used when the file is in path y?

My goal was to assign scrapers to specific paths and then simply run them on all unidentified scenes, so that each scene gets the proper scraper based on its path without my having to redo the selection manually for every path every time.

Correct. It defaults to all scenes each time.

It could be done with some scripting and the API, but there isn’t a way to permanently assign specific scrapers based on path in the graphical interface.
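
As a rough idea of what the scripting route could look like, the sketch below only builds the request bodies and sends nothing. It assumes the GraphQL API exposes a `metadataIdentify` mutation whose input accepts `paths` and a list of `sources` with a `scraper_id`; please check that against the schema of your own server before relying on it. The folder names and scraper IDs are made up.

```python
import json

# Assumed mutation shape; verify it against your server's GraphQL schema.
IDENTIFY_MUTATION = """
mutation Identify($input: IdentifyMetadataInput!) {
  metadataIdentify(input: $input)
}
"""

# Hypothetical mapping of library folders to the scrapers that fit them.
PATH_TO_SCRAPERS = {
    "/media/site_a": ["site_a"],
    "/media/site_b": ["site_b", "site_b_alt"],
}


def build_identify_requests(path_to_scrapers):
    """Build one request body per folder, limited to that folder's scrapers."""
    requests = []
    for path, scraper_ids in path_to_scrapers.items():
        variables = {
            "input": {
                "paths": [path],
                "sources": [{"source": {"scraper_id": s}} for s in scraper_ids],
            }
        }
        requests.append({"query": IDENTIFY_MUTATION, "variables": variables})
    return requests


# Print the variables of each request so they can be inspected before sending.
for body in build_identify_requests(PATH_TO_SCRAPERS):
    print(json.dumps(body["variables"]))
```

Each body would then be POSTed as JSON to the server's GraphQL endpoint, for example from a scheduled script, so the path-to-scraper mapping lives in one place outside the scraper definitions.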

You could also work around this by using Tagger view and creating one or more filters.

For example, if you have a given path where you download all of your Pornhub clips, you can go to the Scenes page, create a filter for that path (it could be a regex if you have more complex requirements) and then set the Source to the Pornhub scraper. Now you can simply pick this filter whenever you’ve downloaded something new from Pornhub and you’re ready to mass scrape everything at once. Note that with the simple filter I’ve just described you would probably end up re-processing scenes that you’ve already scraped, so you might want to add a condition to the filter that excludes those: something like “URL is null”, “Title is null” or even “Organized is false”, depending on your workflow 🙂
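
To illustrate the logic of such a filter outside the UI, here is a small Python sketch. The scene records and their keys (`path`, `url`, `organized`) are made up for the example and are not the actual data model; it combines the path regex with the “URL is null” and “Organized is false” conditions, which is only one of the possible combinations.

```python
import re


def needs_scraping(scene: dict, path_pattern: str) -> bool:
    """True if the scene lies under the pattern and looks not yet scraped."""
    in_path = re.search(path_pattern, scene["path"]) is not None
    # Treat a missing or empty URL as "URL is null".
    no_url = not scene.get("url")
    not_organized = not scene.get("organized", False)
    return in_path and no_url and not_organized


# Hypothetical scene records.
scenes = [
    {"path": "/media/site_a/new.mp4", "url": None, "organized": False},
    {"path": "/media/site_a/done.mp4", "url": "https://example.com/1", "organized": True},
    {"path": "/media/site_b/new.mp4", "url": None, "organized": False},
]

todo = [s["path"] for s in scenes if needs_scraping(s, r"^/media/site_a/")]
print(todo)  # ['/media/site_a/new.mp4']
```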

