Scene groups and recordings

Infinite · December 14, 2024, 12:27pm

A frequent request for stash-box is groups.[1][2] They are an obvious solution for movies, lists, and series. They are also frequently suggested as a solution for tracking rereleases.

It’s no secret that a lot of the thought behind stash-box is cribbed from MusicBrainz. The proposed scene group would be analogous to MB’s release groups. However, before we go down that path I think it’s useful to go through how groups fit into MB’s entity hierarchy.

The hierarchy is roughly Work → Recording → Track → Release → Release Group.

Works are compositions, which are shared by all recordings of a song. An important concept in music, not so much in porn.
Recordings are specific edits of a sound recording. All releases of “Yellow Submarine” (if we ignore radio edits, etc.) share the same recording, so you only need to document performer/producer/etc. once, and can relate all releases to the one recording.
Tracks are titled recordings on a release. In a sense they’re virtual, since all relationships go on the recording, not the track.
Releases are any packaged grouping of tracks. Each has a release date, a title, a release type, etc.
Release groups contain the concept of a release. Any regional dupes or remasters of a release go in the same release group.

All these concepts don’t map cleanly to stash-box, but the general consensus seems to be that scenes are analogous to releases, which means groups would be used to group rereleases of scenes.

However, this means if we wanted to use groups for movies or lists, we now have groups of groups, which from a database perspective would be quite awkward and require an entirely separate scene-group group type.

My proposal is that we instead adopt the recording concept to link rereleases. A recording represents the physical shoot of a scene, the performers in it, the production date, and any other credited entities like director or production studio. A scene can have its own studio, release date and title, but the recording data would be shared with other rereleases of that recording.

This frees up groups to be used purely as a grouping of scenes. A movie is a group of scenes, which are instances of a recording. If it’s an original movie, all those recordings are unique to the movie, but in the case of a compilation the scenes would be rereleases and could all be linked to the original recordings to avoid having to enter performer data.
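A minimal sketch of how this could be modeled, purely for illustration. The class and field names (`Recording`, `Scene`, `Group`) are assumptions and not existing stash-box schema:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Recording:
    """The physical shoot: data shared by every rerelease (hypothetical)."""
    performers: list
    production_date: Optional[date] = None
    director: Optional[str] = None
    production_studio: Optional[str] = None


@dataclass
class Scene:
    """One release of a recording, with its own title, studio and date."""
    title: str
    studio: str
    release_date: date
    recording: Recording


@dataclass
class Group:
    """A plain grouping of scenes, e.g. a movie or a compilation."""
    name: str
    scenes: list = field(default_factory=list)


shoot = Recording(performers=["Performer A", "Performer B"],
                  production_date=date(2020, 1, 5))
original = Scene("Original Title", "Studio X", date(2020, 3, 1), shoot)
rerelease = Scene("Retitled", "Studio Y", date(2023, 6, 1), shoot)
compilation = Group("Best Of", scenes=[rerelease])

# Performer data is entered once, on the recording, and shared by both scenes.
print(rerelease.recording.performers)
```

Here the compilation only groups scenes, while the link between the original and the rerelease lives on the shared recording.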

Ultimately I think we’ll end up in this place anyway, because the concepts of a movie and a group of rereleases are fundamentally different and trying to jam them together won’t be successful. Still, it would be good to come to an agreement on this before I start, to avoid any confusion and to set expectations.

Maista · December 14, 2024, 5:22pm

Great writeup, I agree with you that recordings are a better fit for our use case here. I’d suggest calling them productions or recordings, but we’ll probably have to discuss that further: not every video stems from a single day of filming.

A recording represents the physical shoot of a scene, the performers in it, the production date, and any other credited entities like director or production studio. A scene can have its own studio, release date and title, but the recording data would be shared with other rereleases of that recording.

I’d suggest that more aspects of a recording could be overridden, such as tags and maybe even performers. There are already studios that release multiple versions of the same scene with different cuts to emphasize or remove certain acts like water sport, and this can in turn end up cutting performers’ appearances as well.
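One way the override idea could look, as a rough sketch. The `*_override` fields and the rule that `None` means "inherit" are assumptions made for the example, not something decided in this thread:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recording:
    performers: frozenset
    tags: frozenset


@dataclass
class Scene:
    title: str
    recording: Recording
    # None means "use the recording's value"; a set replaces it for this scene.
    performers_override: Optional[frozenset] = None
    tags_override: Optional[frozenset] = None

    @property
    def performers(self) -> frozenset:
        if self.performers_override is None:
            return self.recording.performers
        return self.performers_override

    @property
    def tags(self) -> frozenset:
        if self.tags_override is None:
            return self.recording.tags
        return self.tags_override


shoot = Recording(
    performers=frozenset({"Performer A", "Performer B", "Performer C"}),
    tags=frozenset({"Tag 1", "Tag 2"}),
)
full = Scene("Full version", shoot)
trimmed = Scene(
    "Trimmed version",
    shoot,
    performers_override=frozenset({"Performer A", "Performer B"}),
    tags_override=frozenset({"Tag 1"}),
)
```

The full version keeps following the recording, while the trimmed version carries its own performer and tag sets.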

echo6ix · December 15, 2024, 5:02am

Admittedly, this is starting to get confusing to me.

Discogs seems to have a simpler paradigm than MB, using a hierarchy of a Master Release [1] and subsequent Versions. It is only applicable to album types, and I do not know how analogous it is to our use-case. The advantage is it is intuitive, including the terminology used.

IMDb maps a lot better to our use-case, since both catalog movies and scenes (though web scenes are just shoehorned into IMDb’s TV series/episode paradigm). I don’t think IMDb has an equivalent to Groups; the nearest thing is their underwhelming Alternate Versions section for movies, and there’s no inspiration to be drawn from it. However, they do an amazing job in their help docs distinguishing between alternate versions and an original work. Much of it is relevant to our use-case. This is probably more a guidelines thing, but perhaps it would also inform how Groups are implemented. A more informative post was published here.

[1] https://support.discogs.com/hc/en-us/articles/360005055493-Database-Guidelines-16-Master-Release

AdultSun · January 7, 2025, 11:47pm

Forgive me for another long, rambling brain dump. Writing everything out like this helps me organize my own thoughts.

Concept Outline

I remember it taking me a while to wrap my head around the concept of recordings vs. tracks and release groups vs. releases on MusicBrainz, so depending on the implementation I think Echo is right that there’s a risk of over-complicating the average editing workflow. The more a system like that can sit in the background (easily ignored when it isn’t relevant to the current edit) the better, I think. You wouldn’t want it to either overwhelm and scare away new users or editors who don’t immediately grasp the concept, or to add extra layers of unnecessary tedium to otherwise straightforward edits.

But with that said, I think it’s still worth gaming out how specific we could get with our hierarchies here just to get a better view of the big picture. Think of this more as a concept map than a design outline. Then, we can figure out how to translate those concepts into features within Stash-Box. From smallest to largest, I think the full hierarchy would look something like this:

1. Release

The atomic unit, same as a “Scene” in Stash and Stash-Box
Digital vs. DVD, original vs. remaster
Different releases could feature different edits, even to the point of changing its content
- UNIQUE DATA:
  - Release Date
  - Title
  - Scene Aliases
  - Description
  - Scene Cover
  - Distributor
    - Studio behind release, not necessarily the same as studio who originally shot it
  - Studio code
  - Links
  - Duration
    - Edits that share content may have different duration due to title sequences / credits

2. Cut / Edit / Version

Bundles together individual releases that share identical content, splitting up extended digital releases vs. truncated DVD releases, wet vs. dry, censored vs. uncensored, maybe even discrete scenes vs. compilations
Could label these as “Edits”, but that could be confusing since modifying data in stash-box is also an “edit”
- UNIQUE DATA:
  - Performers
    - Possible that different cuts add, remove, or even replace performers
  - Scene Tags
    - Needs to be unique to each cut to reflect changes in content, like wet vs. dry, censored vs. uncensored, abridged scene vs. extended scene

3. Production / Shoot / Release Group

Bundles together all various releases and edits created from the same content
- UNIQUE DATA:
  - Production Date
  - Director
  - Production Studio
    - Studio that originally made the scene, different from distributor, may not be possible to find in all cases

4. Group

Bundles together releases/edits that are somehow related to each other, such as a movie, series, etc.
- UNIQUE DATA:
  - Group Tags
    - Movie, Series, Mini-Series, etc.

5. Collection

Bundles together multiple groups, such as Movie Series #1-10, or even multiple releases of the same movie (DVD vs. BluRay vs. VOD, etc.)
- UNIQUE DATA:
  - Collection Tags
    - Movie Collection, Mini-Series Collection, etc.

Again, this is just a conceptual hierarchy to define the different tiers and outline the relationships between them, not a design recommendation for Stash-Box.

I didn’t go into as much detail with the last two concepts, Groups and Collections. Mostly I wanted to point out that there is precedent for having a group that contains other groups, but there are also several unanswered questions around how those would be handled. Would a Movie collect specific releases of its scenes? Or would it collect the Cuts / Productions / Release Groups? How would they handle Releases that include the entire movie in a single video? Do we have one, somewhat generic object that combines every version of a single Movie, or do we have separate objects for each variation? Depending on these answers, we could end up with even more tiers to the hierarchy, splitting out Movies from Movie Releases, etc.

Object Design

The biggest questions, of course, are how does all of this apply to Stash-Box and how will it link up with Stash? I’ll expand on each point later, but for now my recommendation would boil down to this:

Use the same flexible Group concept from Stash to handle Movies, Series, etc. That would mean two-way hierarchies, Group Tags, and labels for parent-sub relationships. I don’t believe inheritance would be useful here.
Create a stand-alone Production object, separate from Groups. These would be lightweight and only have fields for production date, director, and studio. Scenes attached to a Production would inherit its prod. date and director, but not the studio. We could start with one-way inheritance and expand on it later if necessary.
Cuts would be handled as a separate object, if at all. They would also be lightweight, containing only tags and performers for attached scenes to inherit. The ability for scenes to share the same tags and performers automatically is really the only advantage this concept gives us. Less demand and higher stakes, so definitely lower priority compared to Groups and Productions.
Inheritance should sit in the background as much as possible. Unlike the MusicBrainz model, scenes should be able to exist without an attached Production or Cut. Requiring editors to create or attach one to every new scene — or having Stash-Box create them automatically when missing — would create more confusion than necessary, especially since not every scene would benefit from an attached Cut or Production.
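To make the four points above a bit more concrete, here is a rough sketch of the objects involved. All names are hypothetical; the only things taken from the recommendation are which fields each object carries, which ones scenes would inherit, and that both links are optional:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Production:
    production_date: date
    director: Optional[str] = None
    studio: Optional[str] = None  # kept on the Production, never inherited


@dataclass
class Cut:
    tags: set = field(default_factory=set)
    performers: set = field(default_factory=set)


@dataclass
class Scene:
    title: str
    studio: str
    production: Optional[Production] = None  # both links are optional
    cut: Optional[Cut] = None


def inherited_fields(scene: Scene) -> dict:
    """Map each inherited field name to the kind of object that supplies it."""
    sources = {}
    if scene.production is not None:
        sources["production_date"] = "production"
        sources["director"] = "production"
    if scene.cut is not None:
        sources["tags"] = "cut"
        sources["performers"] = "cut"
    return sources


standalone = Scene("Original release", "Studio X")
linked = Scene("Rerelease", "Studio Y",
               production=Production(date(2020, 1, 5), studio="Studio X"))
```

A standalone scene inherits nothing, and a scene linked to a Production still keeps its own studio.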

#1. Groups

I haven’t been able to spend much time with it yet, but Stash’s recent move from a limited “Movies” category to a more flexible “Groups” concept is the most obvious approach for Stash-Box to use for bundling scenes together. You can attach scenes, parent Groups, sub-Groups, or all three simultaneously. That flexibility plus the inclusion of Group Tags allows for a wide variety of uses for the same category of objects, while still clearly labeling and defining each particular usage. It shifts many decisions from questions of database design to questions of content moderation, while making sure the two platforms are still as closely aligned as possible.
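As a rough illustration of that flexibility, a group could carry tags, scenes, and links in both directions. This is a sketch with made-up names, not Stash's actual implementation, and it does no cycle detection:

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Group:
    name: str
    tags: set = field(default_factory=set)
    scenes: list = field(default_factory=list)
    parents: list = field(default_factory=list, repr=False)
    sub_groups: list = field(default_factory=list)


def link(parent: Group, child: Group) -> None:
    """Record the relationship on both sides so it can be walked either way."""
    parent.sub_groups.append(child)
    child.parents.append(parent)


def all_scenes(group: Group) -> list:
    """Scenes attached to the group itself, then those of its sub-groups."""
    found = list(group.scenes)
    for sub in group.sub_groups:
        found.extend(all_scenes(sub))
    return found


series = Group("Example Series", tags={"Series"})
part_one = Group("Example Movie 1", tags={"Movie"}, scenes=["Scene 1", "Scene 2"])
part_two = Group("Example Movie 2", tags={"Movie"}, scenes=["Scene 3"])
link(series, part_one)
link(series, part_two)

print(all_scenes(series))  # ['Scene 1', 'Scene 2', 'Scene 3']
```

The same object type serves as both the series and the movies; only the Group Tags say which is which.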

The part where the flexible group concept breaks down for me is inheritance. The whole situation would get incredibly complicated from a moderation standpoint if there was no data inheritance baked-in between levels of the hierarchy. But on the other hand, I’m sure it would also get incredibly complicated to implement an inheritance system on top of these generic, flexible Groups. So assuming the juice isn’t worth the squeeze (which is basically what Infinite said up top), then we need a different solution for the tiers of the hierarchy that need data inheritance.

Using my outline from earlier, the flexible Groups from Stash are still the best fit for Movies, Series, Mini-Series, Movie Collections, Mini-Series Collections, and basically any other kind of custom Playlist. From a moderation perspective, I expect StashDB will be fine with using one generic “Movie” object to bundle together the various releases of that movie, so I wouldn’t worry about needing a Production / Recording / Release Group concept to bundle Groups together as well.

Since “Release” is just another name for “Scene”, the only tiers left to address are “Productions” (serving the same function as a Release Group) and “Cuts” (representing different sets of content pulled from a single Production).

#2. Productions

Productions wouldn’t need to carry much metadata of their own. The only pieces of data that would be identical for all scenes from the same Production would be Production Date and Director. And since that relationship is guaranteed by definition, scenes could inherit both of them from their production.

The only other relevant field would be Studio. Unlike Production Date and Director, this field shouldn’t be inherited because not all scenes sharing the same Production would share the same studio. I referred to separate fields named Production Studio and Distribution Studio in the outline, but attaching different studios to a production vs. a scene is functionally the same thing.

Every other piece of data (title, duration, description, tags, performers, aliases) would depend on the particular release. We could add fields for some of these too, but they would likely be borrowed from the original release and wouldn’t be strictly necessary for the concept to work.

#3. Cuts

Cuts would carry a different set of data. Even though all scenes under the same Cut would share the same Production, not all scenes under a Production will share the same Cut.

Cuts are defined and differentiated by content, meaning tags and performers would be identical for every scene under the same Cut. The primary advantage would be the ability to keep those tags and performers in sync automatically. Without that inheritance in place, the feature wouldn’t be worth it.

For me, the two concepts would need to exist as separate object categories. My concept outline further up puts Cut on a tier below Production, but I don’t think that strict hierarchy needs to be reflected in the design. Instead, each scene could be added to a Production, a Cut, both, or neither. Inheritance of prod. dates and directors could be hard-coded into the Production concept, and inheritance of tags and performers could be hard-coded into Cuts.

Trying to combine both concepts into a single object — let’s call it a Release Group — sounds like a bigger headache to me. Sure, it would be simpler for situations where every scene released from the same Production shares the same content. A single Release Group could keep the prod. date, director, tags, and performers in sync. But if there is a difference in content, what do you do? You could still add every scene to the Release Group, but you’d have to be able to ensure that only the prod. date and director are inherited. And if you want the tags and performers to stay in sync too, you’d still need to create additional Release Groups for each Cut. At that point you’re doing the same work as if you had separate objects for Cuts and Productions, except now there are more inheritable fields and moving parts, making it easier to mess something up.

#4. Inheritance

So we have our flexible groups, we have our inflexible Productions and Cuts, and now we have to untangle the biggest knot in this design. Inheritance.

Again, ideally these meta-objects should sit in the background as much as possible. Scenes should be able to exist without requiring an attached Cut or Production, simplifying the creation process. It should be as intuitive as possible for editors to open the scene, make their edits, and move on without needing to understand how these concepts relate to each other. The system should be set up in a way that these considerations are handled for them, doing the work they may not be aware needs to be done, and preventing mistakes they haven’t learned to avoid.

If we don’t require scenes to have an underlying Production (a la MusicBrainz), I believe that also means we’re talking about a different type of inheritance. The MB model would have us automatically creating an underlying Production object for every existing scene. At the same time, any existing production dates would migrate out of the scene and into the Production. All production dates would now be attached to the production and not the scene. This could be described more as a relational inheritance process. The production date is only found in one place, the Production, and any time you need a scene’s production date you’re really asking for the date of the attached Production.

So, any time a scrape or a filter calls for the production date of a scene, the call would be redirected to the attached Production instead. Anytime someone wants to add or correct a scene’s production date, they would have to edit the date of the attached Production. And for any re-releases, we would have to link them together by essentially merging the two Productions together, which might not be the most intuitive process.
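A small sketch of what that relational model implies, with hypothetical names. The scene holds no production date of its own, every read is redirected, and linking rereleases means repointing scenes at one surviving Production:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(eq=False)
class Production:
    production_date: Optional[date] = None


@dataclass(eq=False)
class Scene:
    title: str
    production: Production  # mandatory in this model: every scene has one

    @property
    def production_date(self) -> Optional[date]:
        # The scene stores no date of its own; reads are redirected.
        return self.production.production_date


def merge_productions(keep: Production, absorb: Production, scenes: list) -> None:
    """Link rereleases by repointing every scene of `absorb` to `keep`."""
    for scene in scenes:
        if scene.production is absorb:
            scene.production = keep


original = Scene("Original", Production(date(2020, 1, 5)))
rerelease = Scene("Rerelease", Production())  # auto-created, still empty
merge_productions(original.production, rerelease.production, [original, rerelease])

# A correction is a single write on the Production, seen by both scenes.
original.production.production_date = date(2020, 1, 4)
```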

Instead, I’m thinking about an inheritance system that actively submits edits to linked objects. If the data in Object B is inherited from Object A, then any time the data in Object A changes, an edit is submitted to make the same change to Object B. To say it a different way, we’re triggering two separate write operations, one for Object A and one for Object B.

That way, we would be able to continue saving production dates directly to the scene. Scenes could continue to exist without an attached Production. And we would only need to create a Production when we want to link the production dates of two scenes together. This process sounds more intuitive to me, at least from an editor’s perspective. You wouldn’t have to constantly deal with this new concept bolted onto the side of every scene. Instead, you’d be linking scenes together by adding them to a shared group, just like adding scenes to a Movie.

In the Stash-Box interface, this could look like a Merge edit. Somebody submits an edit to Production A. That edit appears in the queue, showing links to Release X, Release Y, and Release Z, all of which will inherit that data. Notifications would trigger for anyone following one of the affected objects. The edit would also appear in the edit history of every affected object. Once the edit passes, Stash-Box applies the new set of data to Production A as well as Releases X, Y, and Z.
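Here is a rough sketch of that fan-out, assuming a made-up `Edit` record and simple in-memory objects rather than anything from the real stash-box edit system. One submitted change to the Production becomes one write per affected object, and only the inherited fields are passed down:

```python
from dataclasses import dataclass, field


@dataclass
class Edit:
    target: str    # name of the object the edit applies to
    changes: dict  # field name -> new value


@dataclass
class Release:
    name: str
    data: dict = field(default_factory=dict)


@dataclass
class Production:
    name: str
    data: dict = field(default_factory=dict)
    releases: list = field(default_factory=list)  # linked Release objects


# Assumption: only these fields flow from a Production to its Releases.
INHERITED = ("production_date", "director")


def fan_out(production: Production, changes: dict) -> list:
    """Build the edit for the Production plus one edit per linked Release."""
    inherited = {k: v for k, v in changes.items() if k in INHERITED}
    edits = [Edit(production.name, dict(changes))]
    if inherited:
        edits += [Edit(r.name, dict(inherited)) for r in production.releases]
    return edits


def apply_all(edits: list, objects: dict) -> None:
    """Once the edit passes, perform every write."""
    for edit in edits:
        objects[edit.target].data.update(edit.changes)


x, y = Release("Release X"), Release("Release Y")
prod = Production("Production A", releases=[x, y])
objects = {o.name: o for o in (prod, x, y)}

edits = fan_out(prod, {"production_date": "2020-01-04", "studio": "Studio X"})
apply_all(edits, objects)
```

The list returned by `fan_out` is also what the queue entry, notifications, and per-object edit history could be built from.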

Editing Workflow

Now that we have an understanding of how inheritance could work generally, how could this look in practice for our two new objects, Productions and Cuts?

Productions

Say you create a new scene. It appears to be an original release, so there’s no need to create or attach any Productions or Cuts to it. For now, it stands alone. This situation should be identical to the current state of Stash-Box. The production date, director, tag, and performer fields can be freely edited. Nothing is inherited from anywhere else.

A few months later, say someone adds a redistribution to the stash-box. The video has the same content as before, just a different production logo at the beginning. An editor notices this, creates a Production, and links both Releases to it. Now we have a few questions to answer.

What does the creation process for a Production look like?

I imagine it looks something like the Merge edit forms in Stash-Box. At the top, you type into a box to find and select the scenes that should be linked together. Below that, you have a couple fields for production date, studio, and director. I think those three fields should automatically fill with suggestions based on the scenes added to the new Production. In the event of a conflict, it can grab the studio and director from the oldest release date, then ignore the order of release to grab the oldest available production date. The editor can use those suggestions as-is or write over them if necessary.
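The suggestion rule could be sketched roughly like this. The `Scene` shape and function name are assumptions; the rule itself follows the paragraph above, with studio and director taken from the oldest release and production date taken as the oldest known value regardless of release order:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Scene:
    title: str
    release_date: date
    studio: Optional[str] = None
    director: Optional[str] = None
    production_date: Optional[date] = None


def suggest_production_fields(scenes: list) -> dict:
    """Pre-fill values for a new Production; the editor may overwrite them."""
    oldest_release = min(scenes, key=lambda s: s.release_date)
    known_dates = [s.production_date for s in scenes
                   if s.production_date is not None]
    return {
        "studio": oldest_release.studio,
        "director": oldest_release.director,
        "production_date": min(known_dates) if known_dates else None,
    }


original = Scene("Original", date(2020, 3, 1), studio="Studio X",
                 director="Director D", production_date=date(2020, 1, 10))
redistribution = Scene("Redistribution", date(2023, 6, 1), studio="Studio Y",
                       production_date=date(2020, 1, 5))

suggestion = suggest_production_fields([redistribution, original])
```

In this example the studio and director come from the 2020 release, while the production date comes from the redistribution because it lists the earlier date. The sketch assumes at least one scene is passed in.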

I also think the only two hard-coded requirements for creating a Production are that it contains at least two scenes, and that it contains a production date. Directors are only credited by a handful of studios, and the original production studio itself isn’t always known. For example, most Euro studios seem to license content from a few unnamed production houses, creating a ton of redistributions without a clear “original” release. To reflect that ambiguity, those Productions should probably leave the studio blank.
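Those two requirements are simple enough to sketch as a validation step. The function name and messages are made up for the example:

```python
from datetime import date
from typing import Optional


def validate_new_production(scene_ids: list,
                            production_date: Optional[date],
                            studio: Optional[str] = None,
                            director: Optional[str] = None) -> list:
    """Return a list of problems; an empty list means the Production is valid."""
    problems = []
    if len(set(scene_ids)) < 2:
        problems.append("a Production must link at least two scenes")
    if production_date is None:
        problems.append("a Production must have a production date")
    # Studio and director are deliberately optional and never checked.
    return problems


ok = validate_new_production(["scene-1", "scene-2"], date(2020, 1, 5))
bad = validate_new_production(["scene-1"], None)
```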

How do Releases inherit data from the Production?

My first thought was that as soon as a scene is attached to a Production, the production date and director should be locked. Those fields don’t belong to the scene anymore, they belong to the Production. This makes the inheritance system cleaner, in my mind. Data only flows in one direction, downstream. Editors who are unaware of a scene’s connection to a Production, who might not even understand what a Production is, would be prevented from changing those values. Only editors who understand what a Production is and how it functions would know to edit the Production instead whenever those fields need updating.

But, that method wouldn’t be the most intuitive either. You could have users who recognize that a scene’s production date is inaccurate, try to correct it, and find that the stash-box won’t let them. They’ve been able to either learn or intuit how production dates work, but in order to apply that knowledge they must now figure out how Productions work as well. Does that overcomplicate the process?

The other option would be to leave those inherited fields unlocked. But in order to keep that data in sync with the Production and other Releases, that means the inheritance would have to flow both ways. Editing the production date of one scene would update that field in the connected Production, then the updated Production would pass it on to the other Releases.

Now I wouldn’t know from experience, but two-way inheritance sounds a lot harder to develop to me. It also raises the question, do we want it to be easier to edit one scene and affect multiple? That concern could be mitigated with a few simple safety measures though. An edit that modifies an inherited field could be considered destructive, lengthening the minimum amount of time spent in the queue. The same locking mechanism from before could also be made manual. For example, a mod could lock the production date within the Production and the production dates of all the Releases underneath it could be locked as a result.
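As a very rough sketch of the unlocked, two-way option with those safety measures (all names hypothetical, and the "destructive" label here is just a string standing in for a longer queue time):

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(eq=False)
class Production:
    production_date: date
    locked: bool = False  # set manually by a moderator
    scenes: list = field(default_factory=list)


@dataclass(eq=False)
class Scene:
    title: str
    production_date: Optional[date] = None
    production: Optional[Production] = None


def attach(scene: Scene, production: Production) -> None:
    """Link a scene and bring its date in line with the Production."""
    scene.production = production
    scene.production_date = production.production_date
    production.scenes.append(scene)


def edit_production_date(scene: Scene, new_date: date) -> str:
    """Apply the edit and return the queue class it would fall under."""
    production = scene.production
    if production is None:
        scene.production_date = new_date
        return "normal"
    if production.locked:
        raise PermissionError("production date is locked on the Production")
    # Upstream to the Production, then back down to every linked scene.
    production.production_date = new_date
    for linked in production.scenes:
        linked.production_date = new_date
    return "destructive"


prod = Production(date(2020, 1, 5))
first, second = Scene("Release X"), Scene("Release Y")
attach(first, prod)
attach(second, prod)
queue_class = edit_production_date(first, date(2020, 1, 4))
```

Editing `first` changes `second` as well, which is exactly why the edit is treated as destructive here.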

Cuts

The same considerations from before would apply here as well. Locking vs. unlocking, one-way vs. two-way, etc. The only difference here is that the inheritable fields would be tags and performers. In comparison, production dates and directors are much more niche than tags and performers.

So even though the unanswered questions and possible approaches are the same, the stakes feel higher. More users will be frustrated or confused if they can’t edit a scene’s tags or performers directly anymore. Conversely, it would be a bigger deal if an unwanted change to a scene’s tags or performers automatically changed those fields for several other scenes as well.

All of that plus lower demand (I believe this thread is the first time the idea’s come up) is why Cuts would be lower priority than Productions and Groups. This design leaves room for us to add them after Productions if we want — benefitting from any lessons learned from implementing the other feature first — but we could just as easily skip this concept entirely.

Well, that’s the write-up. If you’ve made it this far, thanks for sticking with me. I think this all makes sense, but without a background in software design I realize the whole idea could be built on top of false assumptions or unrealistic expectations. On the other hand I’ve been going back and forth on this for 2 or 3 days now — re-writing and re-arranging to try to make it easier to follow — so at least I can say it’s thorough if nothing else.

echo6ix · January 16, 2025, 10:21am

That was a lot to take in and digest so perhaps I am not understanding, but:

I am not sure that is accurate, but I suspect you may know that as you used scare quotes.

Since “Groups” is in the conceptual stages, we have the opportunity to recalibrate terms so the software can conform to the metadata/content, rather than having the metadata conform to limitations of the software.

Discussion using the word scene can get convoluted because it has multiple meanings:

The web scene meaning, basically how stash-box accommodates content. It is a set of one or more distinct elements, typically a download of a single file.
A distinct structural unit (element) within the plot that is part of the whole plot (set). It can be defined as (a) occurring in a specific time or place, (b) truncated by a clear shift in action away from (a), and/or (c) often characterized by a visual transition or hard cut (though not always). Examples include content from PervMom.com, PervNana.com, Taboo Heat, Bare Back Studios, Taboo Fantasy, etc. These types of web scenes (sets) often contain 2-4 distinct scenes (elements), each punctuated with a completion of the encounter between the performers and a hard transition to a different place and time, wardrobe, and sometimes performer configuration. In fact there is even a StashDB tag for this content, though it is highly underutilized. The other example is a plain old compilation of abridged scenes.

The distinction using set and element is analogous to the distinction between a movie and the scenes that make up the movie.

Element. The basic distinct unit within a movie or web scene, that is part of the whole (the set).
Set. The set is like a movie or scene, and it can contain one or more elements.
- Most if not all movies contain multiple elements
- Most scenes contain a singular element
- Some scenes contain multiple elements
Subset. Contains some elements from another set. I’m not sure where this fits into the paradigm, but the term is intuitive nonetheless.
Class. A collection of sets, like a series, whether a movie series or episodic web scenes.
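For what it's worth, these terms line up with ordinary set operations. A toy sketch with placeholder element names:

```python
# Elements are the distinct structural units; a set is a movie or web scene.
web_scene = frozenset({"element 1", "element 2", "element 3"})
single_scene = frozenset({"element 4"})

# An abridged release contains only some elements of one set.
abridged = frozenset({"element 1", "element 2"})

# A compilation draws elements from several different sets.
compilation = frozenset({"element 1", "element 4"})

# A class is a collection of sets, such as a series.
series = [web_scene, single_scene]

print(abridged < web_scene)      # True: a proper subset of one set
print(compilation <= web_scene)  # False: not a subset of any single set
print(sorted(compilation & web_scene))  # ['element 1']
```

The compilation case shows why "Subset" is the awkward one: it overlaps with several sets without being contained in any of them.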

Anyway, my last paragraph here got carried away because I thought it was interesting that terms from basic set theory we all learn in elementary school math are applicable here to clarify these concepts unambiguously.

echo6ix · January 22, 2025, 9:47pm

Revisiting this topic and what I wrote. That has to be one of the most pedantic things I’ve ever read.