chonkie

TLDR: Chonkie is a lightweight, fast chunking library for RAG pipelines that ships several chunkers. Its default install is minimal and it supports multiple tokenizers. It reports a smaller install size and higher speed than alternative libraries.

2024-11-01 Github

browser-use

TLDR: The browser-use repository provides an easy way to connect AI agents with the browser. It offers vision and HTML extraction, multi-tab management, custom actions, and parallel agents. It also collects anonymous usage data to guide improvements.

2024-10-31 Github

open-computer-use

TLDR: A secure cloud Linux computer powered by E2B Desktop Sandbox and controlled by open-source LLMs. It supports various LLMs, such as Meta Llama and OS-Atlas, and operates via keyboard, mouse, and shell commands. New LLMs can be added easily if they adhere to the OpenAI API specification.

2024-10-31 Github

BetterWhisperX

TLDR: A fork of WhisperX with improvements. Provides fast automatic speech recognition with word-level timestamps and speaker diarization. Includes features like batched inference, accurate timestamps using wav2vec2 alignment, and VAD preprocessing.

Python
2024-10-23 Github

computer_use_ootb

TLDR: Computer Use OOTB is an out-of-the-box solution for desktop GUI agents, offering both API-based and locally running models. It supports Windows and macOS, does not require Docker, and provides a user-friendly Gradio interface. Major updates include local run functionality, added examples, and support for multiple displays. To start the interface for remote control, users install the prerequisites, clone the repository, install dependencies, and set API keys. It also has advanced settings for the ShowUI model and a roadmap for further improvement.

Python
2024-10-23 Github

text-extract-api

TLDR: A tool for converting images, PDFs, and Office documents to Markdown or JSON with high accuracy. It is built with FastAPI and uses Celery for asynchronous tasks and Redis for caching. It supports various OCR strategies, can remove PII, and offers several storage strategies. A CLI tool, an online demo, and dedicated API clients are also available.

2024-10-23 Github

Sana

3,200
@NVlabs

TLDR: Sana is a text-to-image framework that efficiently generates images at resolutions up to 4096×4096. Its design includes DC-AE, Linear DiT, a decoder-only text encoder, and efficient training and sampling. Sana is competitive with giant diffusion models while being smaller, faster, and deployable on a laptop GPU.

Python
2024-10-11 Github

F5-TTS

9,200
@SWivid

TLDR: F5-TTS is a text-to-speech repository that features a Diffusion Transformer with ConvNeXt V2 for faster training and inference. It covers several installation methods, inference options such as a Gradio app and a CLI, and training through a Gradio web interface. It also has an evaluation section and acknowledges multiple related works. The code is released under the MIT License, while the pre-trained models are under a CC-BY-NC license.

Python
2024-10-08 Github

cookiecutter-uv

TLDR: A modern cookiecutter template for Python projects that use uv for dependency management.

Python
2024-09-02 Github

Qwen2-VL

4,300
@QwenLM

TLDR: Qwen2-VL is a vision-language model that understands images and videos of various resolutions and aspect ratios, including multilingual text within images. Its open-source models are released under different licenses, and the repository provides usage examples and benchmarks. It also supports quantization methods, documents known limitations as areas for further improvement, and provides deployment options and a web UI demo.

Python
2024-08-29 Github