CommunityScrapers master branch breaking change

As dicussed previously in Discord 1 2

We now have a breaking candidate and decision to make with v0.29. Because of stash#5793 (Group scrapers are entirely broken), “Group” type scrapers do not work with any release version below the upcoming v0.29. CommunityScrapers#2229 was made to transition it under the impression that v0.27 deprecated movieByURL and groupByURL was functional. CommunityScrapers#2368 will be a breaking change as any group scrapers will not work below v0.29

What is the best course of action here?

  • maintain two seperate branches to now actually match develop and stable
  • continue with the breaking migration
    • remove pseudo stable branch in favor of develop to avoid building twice and halve the build times
  • maintain a laststable branch and exclude breaking commits
  • abandon/ delay the migration for the next release
  • backport the scraper fix into a 0.28 bugfix to maintain compatibility

I think removing stable branch is the worst option as https://stashapp.github.io/CommunityScripts/stable/index.yml is pre-configured source index for everyone. It would require config file migration to change.

@Maista, @WithoutPants your thoughts?

well we can remove the double build and just copy the files over, that shouldn’t be a big deal. The issue is the references to it both in the builder and as a branch that has gone unused

This is my preference and original intention. stable was always intended as compatible with the last release, with develop being aligned with the develop branch on the stash repo.

What’s not clear is what process to follow for maintaining the two branches.

In general, non-breaking changes which are compatible with the current stable release should be committed to stable, which should be regularly merged into the develop master branch. Changes that are compatible only with the develop master branch should be committed to the develop master branch of the CommunityScrapers/CommunityScripts repos, with the develop master branches merged into stable after release.

If the current state of the stable branch is broken for 0.28, then we either need to reverse the migration on the stable branch and leave it on the develop master branch instead, or I’ll need to do another bugfix release, which is probably not a terrible idea in any case.

As far as I can remember, there were only two breaking changes

  • MovieByURL → GroupByURL (upcoming)
  • The addition of Image/Gallery scraping
    • this would trigger a validation error, similar to groupByURL index out of range

As for other updates,

  • Selector/ compatibility updates should be pushed to both
  • Schema/ stashapp-tools updates are usually limited to CommunityScrapers as we use py_common

I’m not sure there is enough difference to warrant keeping (and maintaining) both branches since this would still break instances behind the bugfix version. In practice, we would end up with a desync between users on develop and stable, where issues minor reported won’t be available until the next stable release, or until a dev cherrypicks fixes over just to avoid a specific commit

Instead, I would like to propose switching the error of groupByURL index out of range into a WARN similar to Unknown scrape type that would allow for future forwards-compatibility with any and all new scraper types, while backwards compatibility will follow the same path. This, combined with the bugfix should cover almost all cases for forwards compatibility and avoid the hard fork at v0.29 and below (but still inevitably break any versions before bugfix)

2 Likes

I think two branches are necessary. There’s always going to be the potential for breaking changes being introduced. I can introduce code to handle unknown fields, but there’s always going to be an issue of scrapers not behaving correctly when running an older version vs the current dev build. Having a stable branch at least gives users a second place to try for a scraper if the dev version doesn’t run with the stable release. If we don’t want to actively maintain the stable branch, then it should probably just be merged from the develop branch on each release.

It’s gotten a bit late in the development cycle to do a bugfix release. I think the two feasible options are:

  • delay the scraper migration PRs until 0.29 is released, or
  • assuming everything in the develop branch is compatible with 0.28, merge develop into stable, then merge the outstanding breaking PRs

I’ve spent some time trying to figure out what we can do on the stash end to mitigate against this. One option as stated is to turn off the strict yaml unmarshalling so that unknown fields won’t prevent a scraper from being loaded. My main concern is that it may end up causing scrapers to work incorrectly in ways that may not be clear to the user and anyone trying to debug. It’s not hard to warn the user if there are extra top-level fields in the config, but it’s a bit more non-trivial for fields deeper in the config tree. Making the config parsing process more fault tolerant in general is probably ideal though.

We could add a minimum config schema version field to the scraper config, indicating (warning or erroring) when the schema version is too high for the users running system. We’d bump the number when new fields are introduced. I think this doesn’t really mitigate against the issues inherent in our develop/stable scraper repo, but at least it gives a bit more information and warns the user of potentially broken functionality.

The plan was always to delay until the release of 0.29, releasing it just for the people on develop doesn’t seem to be worth the tradeoff of deprecating fields

I still believe that maintaining two branches is quite troublesome (for reference, the PR changes 125 files).
Practically, it would be better to maintain a backported branch that just has automated movieByURL → groupByURL sed replacements via Github Actions

I think schema version would work but would be better to break the update checker so users get an error when they attempt to download it instead of at runtime

There was actually another breaking change that is currently PR’d #2493

replacing url with urls which will cause breakage in <v0.28.0 despite being implemented in 0.27 due to #5294

Looking back I’m not sure how cases like this should be handled, maybe if we write a validator? We also can’t bugfix previous releases to not accept the new scraper formats. Adding explicit compatibility versions doesn’t cover these cases since they were bugs lurking in the code that were addressed long after the release/ bugfix window

As we close upon v0.29, what’s the plan for this? Since stable is the default, it wil require manual migration to master on v0.29 to use groupByURL.

After a grace period (if any) we would need to hard fork and require users on 0.28 or 0.29 to manually change their scrapers branch reference (as well as splitting the CI)

So I just had a look over the deploy script, and I only now see that we’re not actually using the stable branch at all currently. It’s just checking out the master branch for both data sources.

The inherent expectation then is that (at least currently) both stable and develop data sources require the most recent stash release. The way forward would then be to introduce the breaking changes post release and continue with the status quo. This will obviously break the scrapers for users running pre-0.29.

If we want to maintain compatibility for users pre-0.29, then introducing a 0.28 or laststable branch/data source would be an option, with the expectation that this would not be actively maintained and would be a snapshot. Users that run into compatibility issues that do not want to update could be directed to use the alternative data source. The general preference though would be to steer users to use the latest stable version of stash.

1 Like