F5-TTS

9,200
@SWivid

TLDR: F5-TTS is a text-to-speech repository featuring a Diffusion Transformer with ConvNeXt V2 for faster training and inference. It offers several installation methods, inference through a Gradio app or the CLI, and training through a Gradio web interface. It also includes an evaluation section and acknowledges several prior works. The code is released under the MIT License, while the pre-trained models are released under the CC-BY-NC license.

Python
2024-10-08 Github

newsnow

TLDR: An elegant news reading application that offers a pleasant reading experience, with features such as GitHub login and data synchronization. It supports deployment on Cloudflare Pages, Vercel, and Docker.

elegant news TypeScript
2024-09-23 Github

shortest

TLDR: An AI-powered, natural-language end-to-end testing framework built on Playwright, with features such as Anthropic Claude API integration, GitHub 2FA support, and email validation. It also includes guides for web app and CLI development.

TypeScript
2024-09-18 Github

cookiecutter-uv

TLDR: A modern cookiecutter template for Python projects that use uv for dependency management.

Python
2024-09-02 Github

llm_engineering

TLDR: Repo to accompany my mastering LLM engineering course

2024-08-31 Github

Qwen2-VL

4,300
@QwenLM

TLDR: Qwen2-VL is a vision-language model with enhancements such as understanding images and videos of various resolutions and aspect ratios, including multilingual text within images. It offers open-source models under different licenses, along with usage examples and benchmarks. It also supports quantization and documents its known limitations as areas for future improvement. The repository additionally provides deployment options and a web UI demo.

Python
2024-08-29 Github

potpie

TLDR: Prompt-To-Agent: create custom engineering agents for your codebase.

Python
2024-08-12 Github

VITA

TLDR: VITA-1.5 is an open-source interactive multimodal LLM. It features reduced interaction latency, enhanced multimodal performance, improved speech processing, and a progressive training strategy. It reports strong results on various benchmarks, and the repository supports both training and inference.

2024-08-10 Github

story-adapter

TLDR: This repository contains the official implementation of Story-Adapter, a training-free and computationally efficient framework for long story visualization. It uses an iterative paradigm and a global reference cross-attention module to improve generation quality for long stories.

2024-08-10 Github

multi-agent-orchestrator

TLDR: The multi-agent-orchestrator is an open-source framework for orchestrating multiple AI agents to handle complex conversations. It features intelligent intent classification, dual-language support, flexible agent responses, context management, an extensible architecture, and universal deployment. It comes with pre-built agents and classifiers and offers a variety of examples and quick-start guides.

2024-07-23 Github