Dev Tools for Computer Vision
NEW AND HOT 🔥 VLMTrainingKit
ML Tools
Use this kit to fine-tune open-source multimodal LLMs for computer vision tasks. It includes presets for popular models like PaliGemma and LLaVA-NeXT. The kit supports both video and image inputs and can train models for object detection, segmentation, and QA tasks.
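As an illustration of the workflow the kit streamlines, here is a minimal fine-tuning sketch using the Hugging Face transformers API for PaliGemma. The model id and training step are assumptions for illustration, not the kit's actual interface.

```python
# Minimal sketch of the kind of fine-tuning loop the kit streamlines.
# Model id and training step are assumptions, not VLMTrainingKit's API.
import torch
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-pt-224"  # assumed preset target
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
)

def training_step(image, prompt, target, optimizer):
    # The processor packs image + text; `suffix` marks the tokens to learn.
    inputs = processor(text=prompt, images=image, suffix=target,
                       return_tensors="pt")
    loss = model(**inputs).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```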
DataBuilder
Data Processing Tools
This tool parses and automatically annotates data for training and evaluating LLMs on computer vision tasks.
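To give a flavor of model-assisted annotation, here is a hypothetical sketch that drafts captions with an off-the-shelf captioning model for later human review. This is not DataBuilder's real API; only the transformers pipeline call is real.

```python
# Hypothetical sketch of model-assisted annotation, NOT DataBuilder's API.
# A captioning model drafts labels that a human can then review.
from transformers import pipeline

captioner = pipeline("image-to-text",
                     model="Salesforce/blip-image-captioning-base")

def draft_annotations(image_paths):
    records = []
    for path in image_paths:
        caption = captioner(path)[0]["generated_text"]
        records.append({"image": path, "caption": caption, "reviewed": False})
    return records
```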
Gemamba 2B (Base)
Video Large Language Models
SOTA for Video QA in the under-7B class.
The first model in the world to combine a Mamba-based video encoder with an LLM. It's also the smallest model in our lineup.
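For readers curious what that combination looks like structurally, here is a schematic sketch of the LLaVA-style composition: video encoder, projector, LLM. Module names and dimensions are illustrative assumptions, not Gemamba's actual code.

```python
# Schematic sketch of the encoder -> projector -> LLM composition.
# Names and sizes are illustrative assumptions, not Gemamba's code.
import torch
import torch.nn as nn

class VideoLLM(nn.Module):
    def __init__(self, video_encoder, llm, enc_dim=768, llm_dim=2048):
        super().__init__()
        self.video_encoder = video_encoder            # Mamba-based, per the card
        self.projector = nn.Linear(enc_dim, llm_dim)  # video tokens -> LLM space
        self.llm = llm

    def forward(self, video_frames, text_embeds):
        video_tokens = self.video_encoder(video_frames)  # (B, T, enc_dim)
        video_embeds = self.projector(video_tokens)      # (B, T, llm_dim)
        # Prepend visual tokens to the text sequence before the LLM pass.
        inputs = torch.cat([video_embeds, text_embeds], dim=1)
        return self.llm(inputs_embeds=inputs)
```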
BenchmarkingInference
ML Tools
This repository contains an inference engine designed to run benchmarks for video-capable LLMs quickly and efficiently. The engine leverages parallelism to maximize hardware utilization and minimize compute time.
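Here is an illustrative sketch of the data-parallel pattern such an engine relies on; `run_model_on` is a hypothetical stand-in for a single model call, not the engine's API.

```python
# Illustrative data-parallel evaluation sketch; not the engine's real API.
from concurrent.futures import ProcessPoolExecutor

def run_model_on(sample):
    # Hypothetical stand-in for one model call; replace with real inference.
    return {"id": sample["id"], "score": 1.0}

def evaluate_shard(shard):
    # Each worker scores one slice of the benchmark independently.
    return [run_model_on(s) for s in shard]

def evaluate(dataset, n_workers=4):
    # Split the dataset into interleaved shards and score them in parallel.
    shards = [dataset[i::n_workers] for i in range(n_workers)]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(evaluate_shard, shards)
    return [r for shard in results for r in shard]
```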
VideoEmbeddings-General
Embedding Model
Soon on GitHub!
This embedding model turns videos into semantic vectors. Use it to add video retrieval to your RAG pipeline.
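As a minimal sketch of how such vectors plug into retrieval, here is cosine-similarity top-k search over precomputed video embeddings; `embed_text` is a hypothetical stand-in for the model's text tower.

```python
# Minimal retrieval sketch over precomputed video embeddings.
# `embed_text` is a hypothetical placeholder, not the model's API.
import numpy as np

def embed_text(query, dim=512):
    # Stand-in for the text tower: a deterministic pseudo-random unit vector.
    rng = np.random.default_rng(abs(hash(query)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def top_k_videos(query, video_ids, video_vecs, k=3):
    q = embed_text(query)                  # (d,)
    mat = np.stack(video_vecs)             # (n, d)
    sims = mat @ q / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q) + 1e-8)
    best = np.argsort(-sims)[:k]
    return [(video_ids[i], float(sims[i])) for i in best]
```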
VideoBenchmark-CCTV
Benchmark
Soon on GitHub!
This benchmark evaluates whether a model can solve tasks using footage from CCTV cameras as input. It scores performance on logging events involving people, objects, and the environment.
VLMDeploymentKit
ML Tools
Soon on GitHub!
Deploy your fine-tuned multimodal LLM on edge devices or in private clouds. Perfect for orchestrating multiple adapters and RAG pipelines.
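To show the adapter-orchestration pattern concretely, here is a sketch using the peft library to route requests between task-specific LoRA adapters; the adapter names, paths, and base model id are assumptions, not the kit's interface.

```python
# Adapter-routing sketch with peft; names/paths are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model-id")  # assumed id
model = PeftModel.from_pretrained(base, "adapters/detection",
                                  adapter_name="detection")
model.load_adapter("adapters/qa", adapter_name="qa")

def answer(task, inputs):
    # Switch to the task-specific adapter before generating.
    model.set_adapter(task)
    return model.generate(**inputs)
```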
Computer Vision à la Apple Intelligence: Building Multimodal Adapters for On-Device LLMs
Andrey Buzin
Aug 12, 2024
Gemamba: Can Mamba Beat a Transformer In a Multimodal LLM? [Updated]
Andrey Buzin
May 3, 2024
“The Tortoise Lays on Its Back, but You are Not Helping” - Navigating the Complexities of Emotional Data in AI
Nick Sheero
Feb 7, 2024
"What We Do in the Pixels" - TensorSense Research Card
Mark Ayzenshtadt
Feb 7, 2024