InTheCrack Issues after Site Update

lostLegion15 · August 2, 2025, 6:59pm

InTheCrack has recently updated their website to a new layout, which has broken the existing scraper. As a quick fix, I tried editing the scraper locally, using an XPath extension to help me update the values, just to see if I could get it working again before creating a PR to merge the update into the community plugins scraper. Below is what I’ve come up with, but after I reload the scrapers locally and click the “scrape” button in a scene, the screen just blinks and nothing happens.

When I check the logs, even at Trace level, there’s nothing informative other than that it’s scraping the performer, etc.

2025-08-02 14:34:54 Debug [0][Name] = InTheCrack
2025-08-02 14:34:54 Debug Processing scene studio:
2025-08-02 14:34:54 Debug Processing performers:
2025-08-02 14:34:54 Debug Processing scene:

I must be missing something somewhere, so if someone is better at debugging these things, some help would be appreciated. Below are the updated XPaths (the Details one needs some work, but I was trying to build workable XPaths first before making additional changes):

name: InTheCrack
sceneByURL:
  - action: scrapeXPath
    url:
      - inthecrack.com/Collection/
    scraper: sceneScraper
xPathScrapers:
  sceneScraper:
    scene:
      Title: //div[contains(@class, "details")]//div[contains(@class, "title")]/text()
      Code:
        selector: //div[@collection-id]/@collection-id
      Details:
        selector: //h6[contains(@class, "clipTitle")]/span[position() mod 2 = 1] | //div[contains(@class, "clipDescription")]/text()
        concat: __SEPERATOR__
        postProcess:
          - javascript: |
              details = "";
              parts = value.split("__SEPERATOR__");
              for (i = 0; i < parts.length/2; i++) {
                if (parts[i] == "") {
                  continue;
                }
                details = details + parts[i] + "\n" + parts[i+parts.length/2];
                if ((i+1)*2 < parts.length) {
                  details = details + "\n\n";
                }
              }
              return(details);
      Performers:
        Name:
          selector: //span[contains(@class, "models")]/a/text()
      Studio:
        Name:
          fixed: InTheCrack
      Image:
        selector: (//img[@class="full-width-image"])[1]/@src

Maista · August 3, 2025, 12:34am

The issue you’re running into is that their website is now a full SPA, rendered with JavaScript on the client. Stash only scrapes the HTML the server returns and doesn’t run JavaScript like a browser would, so there’s nothing there to scrape
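A rough way to picture this: the snippet below checks whether raw HTML already contains an element with a given class before any script runs. The shell markup is invented to resemble a typical SPA and is not copied from the site, and the substring-style check is only an illustration, not what Stash does internally.

```javascript
// Invented example of the kind of HTML an SPA serves before any JavaScript runs.
const spaShell = `<!doctype html>
<html>
  <head><title>Site</title></head>
  <body>
    <div id="root"></div>
    <script src="/assets/app.js"></script>
  </body>
</html>`;

// Returns true if the raw HTML already contains an element carrying the given class.
// Assumes className is a plain word with no regex special characters.
function hasClass(html, className) {
  const pattern = new RegExp(`class="[^"]*\\b${className}\\b[^"]*"`);
  return pattern.test(html);
}

// The class an XPath selector would look for is simply not in the shell.
console.log(hasClass(spaShell, "clipTitle")); // false
```

If the class only shows up in the browser's element inspector and not in the raw page source, an XPath scraper without a browser behind it has nothing to match.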

The good news is that they’ve got an API we can scrape instead which is much easier, so I’ve rewritten the scraper and pushed it to the Community feed

lostLegion15 · August 3, 2025, 1:40am

Good to know! I always thought the XPath scrapers rendered the page via a sandboxed headless browser (puppeteer or playwright) and then scraped it. Complete assumption on my part (also apparently very wrong). Didn’t know it won’t run JS.

Thank you for updating the scraper so it just queries the API now!

Maista · August 3, 2025, 2:01am

Scrapers (both XPath and JSON) can be configured to use the Chrome DevTools Protocol (CDP) to do exactly that, but we try to avoid it where we can because it can be slow and finicky

I am sorry to just swoop in and solve the issue, by the way. I’m going to write up my own guide to scraper development one of these days, because I’d love it if more people felt comfortable fixing such issues and submitting new scrapers: huge props to everyone who contributes to the CommunityScrapers repo

lostLegion15 · August 3, 2025, 2:19am

No problem. I generally try to solve a problem myself instead of just asking someone else to do it for me, but I appreciate the help either way (also gives me an example to work off of for future changes, etc.).

Also, there are some small changes to your changes that I would like to propose. I can either submit a PR or you can make the edits yourself. Let me know either way.

Changes I made to your changes:

I changed the “Date” because, on the Collections page, the date shown for each entry matches the releaseDate of the first clip in the clips array.

I also added the shootLocation as the first part of the details, since some StashDB descriptions currently start with it and others don’t. I figured the more info the better.

Lastly, I updated the image pull to include the query parameter w=1400 because I assume that brings in the highest quality version of the image (in my testing within the browser, it never seemed to pull in a higher-res version than that).
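To make those three changes concrete, here is a rough JavaScript sketch of what they compute. The sample object is made up: only the field names (id, title, shootLocation, clips, releaseDate, description) come from the scraper selectors quoted in this thread, and the real API response may carry more fields or use different value formats.

```javascript
// Hypothetical response; only the field names are taken from the scraper's selectors.
const sample = {
  id: 1234,
  title: "Performer A & Performer B",
  shootLocation: "Somewhere",
  clips: [
    { title: "Clip 1", description: "First description.", releaseDate: "2025-07-01" },
    { title: "Clip 2", description: "Second description.", releaseDate: "2025-07-08" },
  ],
};

// Date: the same idea as the selector `clips.0.releaseDate`.
function sceneDate(collection) {
  return collection.clips[0].releaseDate;
}

// Details: shoot location first, then each clip title followed by its description.
function sceneDetails(collection) {
  const titles = collection.clips.map((clip) => clip.title);
  const descriptions = collection.clips.map((clip) => clip.description);
  return (
    "Shoot Location: " + collection.shootLocation + "\n\n" +
    titles.map((sub, i) => `${sub}\n${descriptions[i]}`).join("\n\n")
  );
}

// Image: poster URL with the proposed w=1400 query parameter appended.
function posterUrl(collection) {
  return `https://api.inthecrack.com/FileStore/images/posters/collections/${collection.id}.jpg?w=1400`;
}

console.log(sceneDate(sample));
console.log(sceneDetails(sample));
console.log(posterUrl(sample));
```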

Thank you again for the quick turnaround. I was just looking for some help and you solved it for me.

Below are your changes with my changes added in as well:

name: InTheCrack
sceneByURL:
  - action: scrapeJson
    url:
      - inthecrack.com/Collection/
    queryURLReplace:
      url:
        - regex: .+?(\d+)
          with: https://api.inthecrack.com/Collection/$1
    queryURL: "{url}"
    scraper: inTheCrack
jsonScrapers:
  inTheCrack:
    scene:
      Title:
        selector: "[id,title]"
        concat: " "
      Code: id
      Date: clips.0.releaseDate
      Details:
        selector: "{shootLocation,clips.#.title,clips.#.description}"
        postProcess:
          - javascript: |
              const {shootLocation, title, description} = JSON.parse(value)
              return ('Shoot Location: ' + shootLocation + '\n\n' + title.map((sub, i) => `${sub}\n${description[i]}`).join('\n\n'));
      Performers:
        Name:
          selector: title
          split: " & "
      Studio:
        Name:
          fixed: InTheCrack
      Image:
        selector: id
        postProcess:
          - replace:
              - regex: (.+)
                with: https://api.inthecrack.com/FileStore/images/posters/collections/$1.jpg?w=1400
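As a rough illustration of the queryURLReplace step in the scraper above: Stash applies the regex itself, so this JavaScript version only mirrors the pattern and replacement, and the collection ID in the example URL is invented.

```javascript
// Approximates the scraper's queryURLReplace rule in plain JavaScript.
// Stash evaluates the rule itself; this only mirrors the regex and replacement.
function toApiUrl(pageUrl) {
  // `.+?` lazily consumes everything up to the first run of digits,
  // and `(\d+)` captures that run as the collection ID.
  return pageUrl.replace(/.+?(\d+)/, "https://api.inthecrack.com/Collection/$1");
}

// Hypothetical collection ID, for illustration only.
console.log(toApiUrl("https://inthecrack.com/Collection/1234"));
// prints "https://api.inthecrack.com/Collection/1234"
```

Because the match starts at the beginning of the URL and runs through the ID, the whole page URL is swapped for the API URL rather than having the ID spliced in place.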

Maista · August 3, 2025, 2:42am

I noticed this as well shortly after pushing my original update and have fixed it in a subsequent version

I think that’s a good addition! At some point I think we can also use the shootDate, at least on stash-boxes since they support the new “Production date” field which is not in stashapp yet

I actually chose to use the original image intentionally: you’ll notice it does not appear in the API response nor on any of the pages on their sites as far as I can tell. I had to figure out the path with a little bit of trial and error

The difference between the original image and the resized image is a simple resizing operation that does not add any information but increases the file size so I figured we might as well use the original

Happy to take PRs in the Community repository! Let me know if you would like me to merge the Description change for you (oftentimes people do not want their primary GitHub account associated with Stash)

lostLegion15 · August 3, 2025, 2:56am

Ah, good catch on the image. Yeah, so ignore that change.

I still need to create a new GitHub account specifically for stash changes, so until then, could you make the change to the Description field for me? I guessed that asking you to keep making updates might have a limit, so I was going to be OK biting the bullet and creating a new GitHub account to make the contribution myself, but if you’re OK making the description update for now, that would be appreciated.

Maista · August 3, 2025, 2:58am

Done in version 79bdb538. You can always roll up an anonymous GitHub account for future submissions

lostLegion15 · August 3, 2025, 3:04am

You cut the map off in the JS in the latest update:

return ('Shoot Location: ' + shootLocation + '\n\n' + title.map((sub, i)

Maista · August 3, 2025, 3:18am

Just, uh, sprinkling in some very human mistakes to prove I’m not an LLM