A StashApp plugin for Vision-Language Model (VLM) based content tagging and analysis. This plugin is designed with a local-first philosophy, empowering users to run analysis on their own hardware (using CPU or GPU) and their local network. It also supports cloud-based VLM endpoints for additional flexibility. The Haven VLM Engine provides advanced automatic content detection and tagging, delivering superior accuracy compared to traditional image classification methods.
Features
Local Network Empowerment: Distribute processing across home/office computers without cloud dependencies
Context-Aware Detection: Leverages Vision-Language Models' understanding of visual relationships
Advanced Dependency Management: Uses PythonDepManager for automatic dependency installation
Search for or download a vision-capable model; tested with the following (in order of highest to lowest accuracy): zai-org/glm-4.6v-flash, huihui-mistral-small-3.2-24b-instruct-2506-abliterated-v2, qwen/qwen3-vl-8b, lfm2.5-vl
Load your desired model
On the Developer tab, start the local server using the Start toggle
Optionally click the Settings gear, then toggle Serve on local network
Optionally configure haven_vlm_config.py:
By default localhost is included in the config; remove the cloud endpoint if you don't want automatic failover
{
    "base_url": "http://localhost:1234/v1",  # LM Studio default
    "api_key": "",  # API key not required
    "name": "lm-studio-local",
    "weight": 5,
    "is_fallback": False
}
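As a sketch of how multiple endpoints might be combined, here is an illustrative list with a localhost primary and a cloud fallback. The cloud entry, its URL, and the weight-based selection helper are all assumptions for illustration, not the plugin's actual internals:

```python
import random

# Illustrative endpoint list. The cloud entry is a placeholder assumption;
# remove it if you don't want automatic failover.
ENDPOINTS = [
    {
        "base_url": "http://localhost:1234/v1",  # LM Studio default
        "api_key": "",                           # API key not required
        "name": "lm-studio-local",
        "weight": 5,
        "is_fallback": False,
    },
    {
        "base_url": "https://example-cloud-vlm.invalid/v1",  # placeholder
        "api_key": "YOUR_KEY_HERE",
        "name": "cloud-fallback",
        "weight": 1,
        "is_fallback": True,
    },
]

def pick_endpoint(endpoints):
    """Pick a non-fallback endpoint at random, proportional to its weight."""
    primaries = [e for e in endpoints if not e["is_fallback"]]
    weights = [e["weight"] for e in primaries]
    return random.choices(primaries, weights=weights, k=1)[0]

print(pick_endpoint(ENDPOINTS)["name"])  # -> lm-studio-local
```

With only one non-fallback entry, selection always lands on the local server; fallback entries would only be tried when primaries fail.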
@HavenCTO you mentioned some higher end phones being able to act as a local source for the LLM. I assume it would be significantly slower though. Do you have a guide for this as I've failed trying to set my Android up.
Here is a step-by-step guide, which I've put together based on my own testing, on how to run LLM models on an Android device using Termux and Ollama. This method allows your phone to act as a local AI server accessible by other devices on your network.
TL;DR: download Termux, then copy and paste the single-line bash command into the terminal
Device: A high-end Android phone (at least 8GB RAM recommended for better performance).
Performance: Accurate benchmarking still needs to be done, but this project is designed to spread work across multiple devices; the more devices you add, the faster the processing.
App: Download Termux from the Google Play Store.
Note: While Termux is on the Play Store, for the absolute latest packages, users often install it from F-Droid.
Internet Connection: Required to download the model files.
Storage: Ensure you have enough space (approx. 4GB+ for the model weights).
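You can check the available space from inside Termux before downloading anything; this is just standard df, nothing Termux-specific:

```shell
# Show free space in Termux's home directory (you want roughly 4 GB or more free)
df -h "$HOME"
```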
Step 1: Install Termux and Update Packages
Open the Termux app you just downloaded from the Play Store.
Type the following command to update your package list and upgrade existing packages:
pkg update && pkg upgrade
Press Enter to confirm the updates.
Step 2: Install Required Tools
You need to install wget (for downloading files), curl (for testing the server), proot, and ollama (the AI runtime).
Run the following command to install them:
pkg install wget curl proot ollama
Wait for the installation to complete.
Step 3: Create Directories and Download the Model
We will create the necessary folders and download the specific GLM-4.6V-Flash model files.
Run this command to create the models and tmp directories and navigate to models:
mkdir -p ~/models ~/tmp && cd ~/models
Download the main model file (GLM-4.6V-Flash-UD-IQ2_M.gguf):
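The download looks like the following; the repository path below is a placeholder assumption (the UD-IQ2_M suffix suggests a dynamic-quant GGUF build), so substitute the real link from the model's Hugging Face page:

```shell
# Placeholder URL -- replace <repo> with the real Hugging Face repository path
MODEL_URL="https://huggingface.co/<repo>/resolve/main/GLM-4.6V-Flash-UD-IQ2_M.gguf"

# Refuse to download until the placeholder has been replaced;
# -c resumes a partial download if the connection drops
case "$MODEL_URL" in
  *"<repo>"*) echo "Edit MODEL_URL with the real repository path first" ;;
  *) wget -c "$MODEL_URL" ;;
esac
```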
You may not need the mmproj file for the GLM models, but due to time constraints I opted to test with the mmproj file.
I would recommend the latest Qwen3; however, it reasons too much and as a result is very slow. Support for disabling reasoning in ollama and llama.cpp is pending.
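Once the phone-side server is running, other machines on the network can reach it over HTTP. Here is a minimal sketch of building an OpenAI-style chat request; the LAN address, port, model name, and the assumption of an OpenAI-compatible /v1/chat/completions endpoint are all placeholders (llama.cpp's server and LM Studio expose /v1, while plain Ollama uses /api/chat):

```python
import json

# Placeholder address for the phone on your LAN -- adjust host and port
PHONE_URL = "http://192.168.1.50:8080/v1/chat/completions"

payload = {
    "model": "glm-4.6v-flash",  # whatever name the server reports
    "messages": [
        {"role": "user", "content": "Describe this image in one tag."}
    ],
    "max_tokens": 64,
}

# Equivalent curl call for testing from another device:
#   curl -s <PHONE_URL> -H "Content-Type: application/json" -d '<payload JSON>'
body = json.dumps(payload)
print(body)
```

If the server responds with an HTTP error, check that Serve on local network (or the runtime's equivalent host binding) is enabled on the phone.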