I used Tdarr to convert and compress pretty much everything, but now even though the original videos are gone, I have duplicate entries. Most converted videos were auto identified, so I can probably run cleanup to clear out the old entries, but the problem is the metadata I’ve added manually.
Let’s say I have an MP4 for which I’ve manually written tags, a title, and performers. I can see the new duplicate MKV, but can I automatically merge the MP4’s metadata into the MKV, or do I have to do it one by one?
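If it does come down to merging scene by scene, the copying can at least be scripted against Stash’s GraphQL endpoint. A minimal sketch, assuming the default port and the `sceneUpdate` mutation (verify both against your Stash version; the IDs and title here are placeholders):

```shell
#!/bin/sh
# Hedged sketch: push metadata onto a duplicate scene via Stash's GraphQL API.
# Endpoint path/port and mutation shape are assumptions; check your instance.
STASH_URL="http://localhost:9999/graphql"   # assumption: default Stash port
SRC_ID=1   # scene id you'd read the metadata from first (e.g. via a findScene query)
DST_ID=2   # scene id of the new MKV duplicate

# Build a sceneUpdate mutation that sets the title on the destination scene.
# Performers/tags would be additional input fields on the same mutation.
build_payload() {
  title="$1"; dst="$2"
  printf '{"query":"mutation { sceneUpdate(input: {id: \\"%s\\", title: \\"%s\\"}) { id } }"}' "$dst" "$title"
}

payload=$(build_payload "My Title" "$DST_ID")
printf '%s\n' "$payload"
# To actually send it (requires a running Stash instance):
# curl -s -X POST -H "Content-Type: application/json" -d "$payload" "$STASH_URL"
```

Looping that over a list of (source, destination) ID pairs would avoid doing each merge by hand in the UI.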
TL;DR: encode in place, then rescan.
I have dozens of posts on this topic in the discord.
Make sure you do not change the path/filename.ext and just rescan.
Do not encode to mkv … just re-encode in place to the same location.
IIRC, this is the tdarr default.
I have re-encoded over 100,000 files and saved over 100 TB.
I never had to redo any metadata. All metadata was preserved.
You are doing too much work.
Sadly, I don’t see a way around it. You see, HandBrake isn’t the only thing in the process chain that does things for me.
In short: there’s a folder where I put all videos and images with a specific naming scheme. A script checks if the file name is unique and renames it to make it unique if it has to. Then it sorts the media files to the ImageMagick watch folder (if it is an image file) or the HandBrake watch folder (if it is a video file). There the media gets converted. The original file gets deleted; the converted file goes to my Stash library.
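The uniquify-and-sort step could be sketched roughly like this (the folder paths, extension lists, and suffix scheme are placeholders, not my actual naming scheme):

```shell
#!/bin/sh
# Hedged sketch of the sort-and-uniquify step: rename on collision, then
# route images and videos to their respective watch folders.
INBOX="./inbox"
IMG_WATCH="./imagemagick-watch"
VID_WATCH="./handbrake-watch"

# Return a destination path that does not collide with an existing file,
# appending a numeric suffix before the extension if needed.
uniquify() {
  dir="$1"; base="$2"; ext="$3"
  candidate="$dir/$base.$ext"; n=1
  while [ -e "$candidate" ]; do
    candidate="$dir/${base}_$n.$ext"
    n=$((n + 1))
  done
  printf '%s\n' "$candidate"
}

# Route every file in the inbox to the matching watch folder.
sort_media() {
  for f in "$INBOX"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    base="${name%.*}"; ext="${name##*.}"
    case "$ext" in
      jpg|jpeg|png|webp|avif) dest=$(uniquify "$IMG_WATCH" "$base" "$ext") ;;
      mp4|mkv|avi|mov|webm)   dest=$(uniquify "$VID_WATCH" "$base" "$ext") ;;
      *) continue ;;   # leave anything unrecognized in the inbox
    esac
    mv "$f" "$dest"
  done
}
```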
Now, you might say that while Tdarr, afaik, only converts video files, I could use FileFlows instead because it can do both. Well, yes, but it doesn’t support the AVIF image format, even with the ImageMagick plugin. So, I don’t see a better solution than mine right now, especially with the file naming I use in combination with “uniquifying” all file names, so I don’t accidentally overwrite stuff in my Stash library.
If you’re transforming the file that much, Stash can’t keep track of it either, since it isn’t integrated into your file workflow. Either write your own Stash integration into the workflow, or you’ll have to accept the manual intervention in the workflow.
You are making your life way too hard. I do not have complex flows like yours. I do have scripted flows. The difference is I account for the tools when I build the flow. You need to understand the tools you are using to build a scalable flow, or it’s going to be a lot of work.
For example, using avif with stash is a bad idea. The total savings will never be worth the headache. You need to make tradeoffs. If you are stubborn then it will involve a LOT more work … which you should embrace.
I actually like putting some work into it. With Bash scripting, I even managed to get FileFlows to convert to AVIF, but I couldn’t get the workflow to work exactly as I wanted. Also, FileFlows takes about 10 minutes after a start to be ready to process files, which is frustrating, and for some reason I couldn’t get it to use multi-threading to convert video files. I’m very satisfied with my current self-made workflow. It works perfectly for me.
I do a lot of work to avoid some flows. The entire idea of renaming files is one of them. I believe files should be properly named on download. I use the metadata to download, name and even inventory files so I don’t have dupes.
This involves scraping networks/sites and maintaining my own database of metadata and assets. My flow involves me building something like TPDB or StashDB as my backend to handle tasks. So yes … I embrace that work.
One place it makes sense to avoid work is on encoding video. I was using scripts and HandBrake, and then HBBatchBeast, but tdarr is much better at all of this with distributed and multithreaded video encoding. I only have three Intel Alchemist GPUs to handle encoding. The GPUs are eventually the bottleneck in my flow … but I have fiber and automated mass downloading.
I have heard good things about FileFlows, but I handle all of that up front in my queue, so it’s all taken care of by the time encoding happens. I have all of the metadata before I download files, so inventory, filing, and naming are all automated ahead in the queue.
This is a good start. Many people use a staging directory. I use a staging schema in my database for imports, but that is a database-load concern in SQL.
I found that all of that work, specifically the detailed naming and renaming, is not needed. I just put the files in their final location, with the final name, on download. I have 2 ways to check inventory: check whether the file already exists at the final location (and skip the download), or check my database to see whether the scene_id already exists. All of my files contain an id from the studio which links back to the metadata.
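Those two inventory checks can be sketched like this (the id-in-filename convention and the flat id list standing in for a real database are illustrative assumptions):

```shell
#!/bin/sh
# Hedged sketch of the two dedupe checks described above: file already at
# final location, or scene id already known in inventory.
LIBRARY="./library"
KNOWN_IDS="./known_scene_ids.txt"   # stand-in for an actual metadata database

# Return 0 (true) if we already have this scene, so the download is skipped.
# $1 = final path, $2 = scene id from the studio metadata
already_have() {
  [ -e "$1" ] && return 0
  grep -qx "$2" "$KNOWN_IDS" 2>/dev/null && return 0
  return 1
}

# Usage: already_have "$LIBRARY/studio-1234-title.mkv" 1234 || download ...
```

With a real database the second check would be a keyed lookup on scene_id instead of a grep, but the decision logic is the same.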
This can all be scripted more efficiently … but it’s work to set up.
The end result is that you don’t download files you already have in your collection. You don’t need to download them to a staging area. Naming is simple since it just uses the studio metadata to create any name or filepath based on the metadata.
Renaming and refiling later, if needed, is simple as well, since you know all of the metadata for the file. You should be able to check your files against all of the available files in a studio and generate a list of missing scenes. Inventory is part of the process … but I have a metadata database to compare against.
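Generating a name or filepath purely from the studio metadata might look like this (the fields and the pattern are made up for illustration, not a real schema):

```shell
#!/bin/sh
# Hedged sketch: derive the final path for a scene entirely from metadata,
# so naming is deterministic and re-filing is just re-running this function.
make_name() {
  studio="$1"; scene_id="$2"; date="$3"; title="$4"
  # Replace characters that are awkward in filenames.
  safe_title=$(printf '%s' "$title" | tr ' /' '_-')
  printf '%s/%s/%s - %s - %s.mkv\n' "$studio" "$date" "$studio" "$scene_id" "$safe_title"
}

# Example: make_name "Studio" 1234 2023-01-01 "Some Title"
# yields "Studio/2023-01-01/Studio - 1234 - Some_Title.mkv"
```

Because the scene_id is embedded in the name, the path doubles as the link back to the metadata record.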
The 2 hardest parts of tagging are performers and original dates. Dates may be harder than performers! For performers I had to build my own performer database. A good flow is hard, and performers involved a lot of scraping, database work, and $$ for compute on AWS, where I keep the performer face vectors.