Automated Mass Organization

I have a pretty big collection at this point (~60k scenes), and I’m trying to set up a way to automate the mass organization of scenes. I have about 11k that have been scanned in but not organized. I know there are some built-in tools for this, but they aren’t quite what I want. I have done some mass automation in the past using Selenium scripts that scrape via essentially the UI, but it’s a little weird using Selenium on something that runs more or less locally, and it’s pretty fragile.

I’m comfortable with coding, so I guess my main question is whether there is a way to do what the ‘scrape’ button does programmatically, comparing phashes against stashdb or tpdb using my API key and everything. This way I can just get a response, enter the metadata into my SQL database directly, and be done with it. I’m a little concerned about scene covers using this method, though. Anyway, any guidance would be appreciated.

Everything Stash does on the frontend is done programmatically using GraphQL. In the browser, you can use Developer Tools > Network tab to inspect what it does.

Check out the API documentation/playground for more details.

As @DogmaDragon mentioned, there’s the GraphQL API that gives you a lot more freedom to do what you want.

I’m more partial to SQL though. There’s nothing stopping you from using a SQLite client to connect to the database and bulk update records. It’s a lot faster too. Having said that, if you’re not familiar or comfortable with SQL I’d steer clear – there are absolutely no guardrails, and the potential to corrupt or lose all your (meta)data is high.
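To make the trade-off concrete, here is the kind of bulk update this means, run against an in-memory database with a simplified, made-up schema. The real Stash schema is more complex and the table/column names here (`scenes`, `organized`) are assumptions; inspect the actual schema and back up your database file before touching anything.

```python
import sqlite3

# Demo against an in-memory database with a simplified, assumed schema;
# the real Stash schema differs, so inspect it first and always work on
# a backup copy of the database file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scenes (id INTEGER PRIMARY KEY, title TEXT, organized BOOLEAN)"
)
conn.executemany(
    "INSERT INTO scenes (title, organized) VALUES (?, ?)",
    [("Scene A", False), ("Scene B", False), ("Scene C", True)],
)

# Bulk update: mark every titled, unorganized scene as organized in one
# statement. This is the speed win over going through the API -- and also
# where the danger lies, since there is no validation and no undo.
updated = conn.execute(
    "UPDATE scenes SET organized = 1 WHERE title IS NOT NULL AND organized = 0"
).rowcount
conn.commit()
```

One statement touches every matching row at once, which is why it outruns per-scene API calls – and why a wrong `WHERE` clause is catastrophic.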


To add to that, the Stash API supports an execSQL mutation which can be used to run raw SQL too.
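For reference, a call to that mutation might be shaped like this. The mutation name comes from the post above, but the argument names (`sql`, `args`) and the result field are assumptions to confirm in your instance’s GraphQL playground.

```python
import json

# Shape of an execSQL call as a GraphQL payload. The mutation name comes
# from the post above; the argument names and result field are
# assumptions -- confirm them in your instance's GraphQL playground.
EXEC_SQL = """
mutation ExecSQL($sql: String!, $args: [Any]) {
  execSQL(sql: $sql, args: $args) {
    rows_affected
  }
}
"""

def exec_sql_payload(sql: str, args: list) -> bytes:
    """Serialize the mutation and its variables for an HTTP POST body."""
    return json.dumps({
        "query": EXEC_SQL,
        "variables": {"sql": sql, "args": args},
    }).encode()

# Parameterized statement keeps values out of the SQL string itself.
body = exec_sql_payload(
    "UPDATE scenes SET organized = ? WHERE studio_id = ?",
    [1, 42],
)
```

This gets you SQL-speed bulk changes while still going through Stash, rather than opening the database file directly.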

Dogma kindly re-opened this topic. There are ways to bulk tag and organize at scale. I’m talking about methods that scale above 100,000 scenes. The methods involve coding and data management. However, mostly it requires discipline.

I’m back and I can provide more guidance on these methods. However, you may not like the answers. The methods will not help you organize your existing stash. The methods work for downloading and tagging content.

You may be better off deleting everything you have and starting over using new methods. This is where things get tough. The coding and technical stuff is actually easier. The hard part is overcoming the sunk cost fallacy.

I suggest you read this post from Ronnie711.

Note the part about nuking his 32,000 scene stash and starting again.
If you read on you will note his point on tagging performers.

I have bad news. There is no good automated way to reliably tag performers at scale. Simple methods that work by matching names as ASCII strings have limitations. All scrapers have this limitation.

In order to tag performers I had to start my own project to catalog and manage performers from all my studios (and others). This involved creating my own performer database. The initial cost was around $1000 on AWS and I need to spend another $1000 to handle the indexing.

I decided to manage performers using facial recognition. It’s not perfect but it can handle issues which are hard using ASCII.

I’m surprised by the amount of your AWS costs. Would you mind sharing the breakdown?

Thanks for all your responses! I haven’t used the Identify feature basically at all because I was uncomfortable with the accuracy. My current process, after implementing some code thanks to Dogma’s answers, is this:

  • Run my code, which fetches data from stashdb if available, falls back to TPDB if not, and runs some basic cleanup of titles and tags to my preferences.
  • The code logs any problems, e.g. no data on either source, missing performers, missing studios, that sort of thing.
  • I resolve errors entirely manually. The biggest offender is usually Bang scenes that come from old movie studios. I use iafd to hopefully get some semblance of ok data on them, and adultdvdempire for movie box art. Otherwise I just try my best to research when and where the scene comes from.
  • Then I go through the non errored scenes one by one and make sure what I’m looking at makes sense. Right studio, right performers, etc, just in case there was a mismatch.
  • Then and only then I mark them as organized and move on.
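The fallback part of the flow above can be sketched roughly like this. The scraper functions here are hypothetical stand-ins (real ones would call stashdb/TPDB); only the try-in-order logic and the cleanup hook are the point.

```python
# Sketch of the source-fallback step described above. The fetchers in
# `sources` are hypothetical stand-ins for real stashdb/TPDB scraper
# calls; each returns a metadata dict, or None when it has no match.

def clean(data: dict) -> dict:
    """Basic cleanup to taste -- here, just strip whitespace from titles."""
    if isinstance(data.get("title"), str):
        data["title"] = data["title"].strip()
    return data

def identify(scene_id: str, sources: list) -> tuple:
    """Try each metadata source in order; unmatched scenes fall through
    to the manual-resolution pile as (None, None)."""
    for name, fetch in sources:
        data = fetch(scene_id)
        if data is not None:
            return name, clean(data)
    return None, None

# Hypothetical sources: "stashdb" knows scene "a", "tpdb" knows scene "b".
sources = [
    ("stashdb", lambda sid: {"title": " Scene A "} if sid == "a" else None),
    ("tpdb", lambda sid: {"title": "Scene B"} if sid == "b" else None),
]
```

Everything that comes back `(None, None)` is exactly the error log from the second bullet, ready for the manual pass.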

Compared to before, when I manually went through and scraped each scene, this is much faster. For some reason I’m more disciplined about my porn collection than half the other aspects of my life, but here we are.
That said there are still plenty of errors I’ve found, but I’m one human doing manual processes, there will always be errors. I just try to correct them and move on.
I basically have four pieces of data I require of a scene for me to consider it done and organized: Title, Performer(s), Studio, and Date. I also try to grab the movie if it had a DVD release. The only one I truly have trouble with is Date. The sheer amount of bad data out there is mind-boggling.

I guess what I was mainly looking for was to make the starting point of the manual review as high as possible, so that the manual part is as quick and painless as I can make it.

The biggest offender is usually Bang scenes that come from old movie studios

Bang can be completely scraped and easily downloaded via script. It is possible to directly load metadata into groups (fka movies) from Bang metadata and automatically associate the movies with the scenes. You do not need, nor should you use, AdultDVDEmpire for Bang-distributed scenes.

AdultDVDEmpire can also be scraped, and you can keep your own database of all its metadata, but you need to know a few tricks. It’s been done. It is also possible to mass download from ADE, but how you do it depends on your sub and access.

But Bang is easy. I have posted scrapes of their metadata and downloaded their content and assets.

I may have posted the breakdown before on Discord, but here are rough estimates:

  • It costs 1/10 cent to add a photo to the database and include it in an index of face vectors.
  • It costs 1/10 cent to search an indexed face against all other faces.

That means for 100,000 images of one performer each, I have to pay 2 × 1/10 cent per image.
You can work out the cost for a few hundred thousand performers.
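Using just the per-image figures above, the arithmetic works out like this (in integer tenths of a cent to avoid float rounding):

```python
# Cost arithmetic from the figures above: 1/10 cent to index a face,
# plus 1/10 cent to search it, per image.
INDEX_TENTH_CENTS = 1
SEARCH_TENTH_CENTS = 1

def catalog_cost_dollars(images: int) -> float:
    """Total cost in dollars to index and search each image once."""
    tenth_cents = images * (INDEX_TENTH_CENTS + SEARCH_TENTH_CENTS)
    return tenth_cents / 1000  # 1,000 tenth-cents per dollar

# 100,000 images -> $200; scale linearly for larger catalogs.
```

So a few hundred thousand performers at one image each lands in the high hundreds of dollars, consistent with the ~$1000 figures mentioned earlier.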

The issue isn’t getting the Bang metadata, it’s that Bang metadata is terrible. If you’re lucky it’s something like Red Light District and just the date is way off, and it’s pretty easy to find a decent date on it. Sometimes it’s from a redistribution of a redistribution and I try to track down the source of the scene manually, to wildly varying degrees of success (There’s a redistribution studio, Defiant Films, that is the bane of my existence. There’s a Dillion Harper scene that I think was literally birthed from the ether fully formed for all I can find out about it). I know iafd and ade have their issues with metadata as well, but Bang is almost guaranteed to be wrong, as all their dates are just when Bang acquired the scene.