StashDB: The Overly Picky Metadata Police

StashDB, the community-driven database for porn metadata, has a peculiar obsession with minutiae that makes it a nightmare for contributors. While image boorus allow you to find anything you’ve seen before thanks to the tags, StashDB is always incomplete. Many scenes aren’t even tagged, and nobody seems to care, even though tags are most likely what you are going to use to filter and find the scenes in StashApp. I only want to add performers, studios, and tags to scenes, but StashDB insists on studio links, descriptions, and other trivial details. This doesn’t help anyone find scenes—who cares about all of it?

There are many, many scenes I haven’t submitted because I couldn’t find the studio link. Having to search for the studio link in the first place, when I don’t even care about having it in my scenes’ metadata, is really annoying. Since StashDB is so incomplete, I have to search for metadata on other websites. Why should StashDB users have to browse IAFD, Data18, and other adult metadata providers when StashDB is supposed to be a metadata provider itself?

The rejection rate for submissions is astronomical due to the gatekeepers nitpicking over the smallest details. This discourages many users from contributing, as getting your submissions rejected multiple times can be really bothersome. Imagine if you could submit a scene with minimal metadata, and then the community could polish it later. StashDB would be much more complete and useful by now as a result. Instead, the gatekeeping culture drives away potential contributors, leaving the database incomplete and less valuable for everyone.

Disclaimer: I’m part of the StashDB Admins group, but I don’t actively deal with day-to-day operations.
Below are just my thoughts, not representative of the StashDB team.

Tags are optional. Most initial submissions only include the tags provided by the studio. But tags are the primary area where the collaborative effort of people who actually watched the content can have the best results.

Most of the time, all the required details are directly available from the studio link, and the studio link is required to validate the accuracy of that metadata. The community also has stash scrapers for most of these studios, so you don’t even need to add the details manually.

StashDB is a metadata aggregator, not a provider. Studios are the providers of both content and the original metadata.

I wouldn’t call that astronomical.

First, others need to know there is an issue to fix. Second, correcting something takes more time than submitting something new. Third, requiring original metadata provided directly by the studio that released the content is a bare minimum.

@DogmaDragon makes some really good points in his retort. However, I “feel” the exact same way as you. I understand the need to gate additions to the database; otherwise the metadata would be worthless.

In an effort to get over this hump (specifically for just grabbing metadata), I actively add a stashID as well as a theporndbID to every scene. I have found that many times, if StashDB has trouble finding metadata for a scene, then theporndb almost certainly will have it. And a moderate percentage also come with markers, which is helpful.

It takes a little extra time to add both IDs, but it seems very clear that their databases are vastly different. If it is older or more obscure (like some early bangbus or brazzers content), then theporndb seems to have them all. The same holds true on the flip side with a stashID. I am definitely not trying to say one is better than the other, but at the end of the day, if I drop 200 videos and 200 galleries into stashapp, having both IDs available is the difference between 10 minutes of work and 3-4 hours. Even if theporndb tags are not at the same standard as a stashID, it is a great place to start, especially if you have no idea who the performers or studio are.
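
For anyone who wants the fallback idea spelled out, here is a minimal sketch in plain Python. It is not how stashapp itself queries these services: the dictionaries, the fingerprints, and the `find_metadata` helper are all made up for illustration, and a real setup would go through whatever sources you have configured in the app.

```python
from typing import Callable, Optional

# Hypothetical stand-ins for the two databases. Each maps a scene
# fingerprint to a metadata dict; the keys and values are invented.
FAKE_STASHDB = {"hash-a": {"title": "Scene A", "tags": ["tag1"]}}
FAKE_TPDB = {"hash-b": {"title": "Scene B", "tags": []}}

Lookup = Callable[[str], Optional[dict]]


def find_metadata(
    fingerprint: str, sources: list[tuple[str, Lookup]]
) -> Optional[tuple[str, dict]]:
    """Try each source in order and return the first hit with its name."""
    for name, lookup in sources:
        result = lookup(fingerprint)
        if result is not None:
            return name, result
    return None  # no source knows this scene; tag it by hand


# Order expresses preference: the first source wins when both match.
SOURCES: list[tuple[str, Lookup]] = [
    ("stashdb", FAKE_STASHDB.get),
    ("tpdb", FAKE_TPDB.get),
]

if __name__ == "__main__":
    for fp in ("hash-a", "hash-b", "hash-c"):
        print(fp, find_metadata(fp, SOURCES))
```

The ordering of `SOURCES` is the only design choice here: put whichever database you trust more first, and the second one only fills in scenes the first one misses.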

Got a little off your direct topic but it is all connected with the submitting and retrieval of the metadata that makes this app so amazing.

… well the porn helps too.

You’re very presumptuous about how and why people use stashapp. Just because you personally use it that way doesn’t mean everyone does.

If you think it’s not worth your time or beneath you to do it, why would you expect someone else to do it?

I disagree with this. By lowering the standards, the general quality of the scenes would go down. I would much rather have a high bar from the start, as this creates the highest likelihood that a scene will have fleshed-out metadata to begin with. A vast majority of users are going to just match by phash and then move on. I’d be surprised if all but the tiniest fraction of the scenes ever get touched again after initial creation. I know that I am rarely compelled to go and push changes to scenes if they’re matched, especially if it’s a minor addition. Studio links also allow data to be validated by other contributors.

Our experiences are obviously different. A vast, vast majority of the scenes that I have are found in the database. I’m not sure if you have very niche interests, but the database seems quite complete.

Your philosophy seems to be that the community should pick up the slack you leave as it’s beneath you. If you don’t want to contribute, don’t.

Any data person will tell you that an incomplete database is much preferred to one with bad information. The former you can easily remediate, the latter not so much.

For a couple years, I used and improved a couple community scrapers (hanime.tv & hstream.moe) to get metadata and submit new hentai scenes.

The scrapers provide everything but a studio link, which wasn’t a problem for the first couple years.

A few months ago, all my submissions started being rejected because someone decided to start gatekeeping these niche scenes for not having a studio link.

Most of these scenes are released on DVD, which means there’s not much of a studio page to begin with, and the ones that do have a page usually don’t have English metadata or tags, just a cover and a Japanese summary. Tracking down each studio link did not feel worth the effort, not when I was scraping otherwise perfect metadata.

I’m pretty sure I was carrying the torch on hentai scenes, but since I’ve stopped contributing I don’t think there’s been much activity.

It’s unfortunate since I have 1500 scenes with metadata that I never got to submit.

I don’t think this is gatekeeping. I checked your rejected edits and it looks like @Lanthe is just enforcing the guidelines - assuming he’s correct. From his comment,

commenting here but this applies to most of your edits

  • while editing scene please also remove the old tube link, not allowed by guidelines
  • anidb is not a studio link, but it has the correct scene studio link under Resources, just copy/paste it
  • scene cover by guidelines should be the one provided by studio, in this case the original dvd cover from studio page, not the tube site thumbnail even if higher res

@Lanthe also provided the correct URL for your submissions in some cases, but your edits stalled without revision or appeal on your part.

I suspect it feels like gatekeeping because your earlier submissions of the same kind slipped through before anyone familiar with the content noticed. Based on his comments, it doesn’t seem like much work to have those submissions comply with the rules. Perhaps you can discuss it with him on the aforementioned StashDB thread or start a new discussion on the appropriate channels here.

This is basically why I have not applied to be a contributor.
I appreciate the data provided by StashDB and it is good to have an accurate source of metadata.
But I don’t have the kind of time required to complete all the metadata on all of my unmatched scenes.
Only ~55% of my scenes have StashIDs, compared with 97% of my performers.
If I don’t get a match from StashDB, TPDB or FansDB, I just give it a title, some tags, add performers I recognize and call it a day.

Regarding the nitpicking and metadata accuracy, I had assumed that there is a legal requirement for that data. Studios are required to keep accurate metadata on performers and dates of production for obvious legal reasons, and I assume it’s StashDB’s goal to have the same level of accuracy.