Latest Open Source Projects
F5-TTS
TLDR: F5-TTS is a text-to-speech repository built around a Diffusion Transformer with ConvNeXt V2 for faster training and inference. It documents several installation methods, inference through a Gradio app or the CLI, and training through a Gradio web interface. It also has an evaluation section and credits several prior works. The code is released under the MIT License, while the pre-trained models are under a CC-BY-NC license.
newsnow
TLDR: An elegant news reading application that provides a pleasant reading experience, with features such as GitHub login and data synchronization. It supports deployment on Cloudflare Pages, Vercel, and Docker.
shortest
TLDR: An AI-powered, natural-language end-to-end testing framework built on Playwright, with features such as Anthropic Claude API integration, GitHub 2FA support, and email validation. It also includes guides for web app and CLI development.
cookiecutter-uv
TLDR: A modern cookiecutter template for Python projects that use uv for dependency management.
llm_engineering
Qwen2-VL
TLDR: Qwen2-VL is a vision-language model that understands images and videos of various resolutions and aspect ratios, including multilingual text in images. It offers open-source models under different licenses, along with usage examples and benchmarks. It also supports quantization methods and documents its known limitations as areas for further improvement. The repository provides deployment options and a web UI demo.
potpie
VITA
TLDR: VITA-1.5 is an open-source interactive multimodal LLM. It features reduced interaction latency, enhanced multimodal performance, improved speech processing, and a progressive training strategy. The repository reports strong results on various benchmarks and covers both training and inference.
story-adapter
TLDR: This repository contains the official implementation of 'Story-Adapter', a training-free and computationally efficient framework for long story visualization. It uses an iterative paradigm and a global reference cross-attention module to improve generation quality for long stories.
multi-agent-orchestrator
TLDR: The multi-agent-orchestrator is an open-source framework for orchestrating multiple AI agents to handle complex conversations. It features intelligent intent classification, dual language support, flexible agent responses, context management, an extensible architecture, and universal deployment. It ships with pre-built agents and classifiers, plus a range of examples and quick start guides.