Latest Open Source Projects
logocreator
TLDR: An open source logo generator that creates professional logos in seconds with customizable styles. It uses Flux Pro 1.1 on Together AI for logo generation, Next.js with TypeScript for the app framework, Shadcn and Tailwind for UI components and styling, Upstash Redis for rate limiting, Clerk for authentication, and Plausible and Helicone for analytics and observability. Planned work includes a dashboard with logo history, SVG export, more styles, an image size dropdown, an approximate price display when a custom Together AI key is used, reference logo upload, and a showcase of redesigned popular brand logos.
chonkie
TLDR: Chonkie is a lightweight, fast chunking library for RAG pipelines that ships several chunkers. The default install is minimal, and multiple tokenizers are supported. The project reports a smaller install size and faster chunking than alternative libraries.
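For readers unfamiliar with RAG chunking, here is a minimal sketch of the general idea: splitting a token sequence into fixed-size, overlapping windows. This is not Chonkie's API; the function name `chunk_tokens`, its parameters, and the whitespace "tokenizer" are assumptions made purely for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_tokens(tokens: Sequence[T], chunk_size: int, overlap: int = 0) -> List[List[T]]:
    """Split tokens into windows of at most chunk_size, sharing `overlap` tokens.

    Hypothetical helper for illustration only; it is not part of Chonkie.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks: List[List[T]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        # Stop once a window has reached the end of the sequence,
        # so no trailing chunk made up only of overlap is emitted.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    # Whitespace splitting stands in for a real tokenizer (an assumption).
    words = "retrieval augmented generation works best on small focused passages".split()
    for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
        print(" ".join(chunk))
```

The overlap keeps some shared context between neighboring chunks, which is a common choice when chunks are later embedded and retrieved independently.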
Roo-Cline
TLDR: Roo-Cline is a fork of Cline, an autonomous coding agent. It comes with additional experimental features such as drag and drop images, sound effects, language selection, and support for various models. It provides capabilities like creating and editing files, running commands in the terminal, using the browser, and adding custom tools through the Model Context Protocol.
browser-use
TLDR: The browser-use repository provides an easy way to connect AI agents to the browser. It offers vision and HTML extraction, multi-tab management, custom actions, and parallel agents. It also collects anonymous usage data to guide improvements.
open-computer-use
TLDR: A secure cloud Linux computer powered by E2B Desktop Sandbox and controlled by open-source LLMs. It supports LLMs such as Meta Llama and OS-Atlas and operates the machine through keyboard, mouse, and shell commands. New LLMs that adhere to the OpenAI API specification can be added easily.
BetterWhisperX
TLDR: A fork of WhisperX with improvements. Provides fast automatic speech recognition with word-level timestamps and speaker diarization. Includes features like batched inference, accurate timestamps using wav2vec2 alignment, and VAD preprocessing.
computer_use_ootb
TLDR: Computer Use OOTB is an out-of-the-box desktop GUI agent that works with both API-based and locally running models. It supports Windows and macOS, does not require Docker, and offers a user-friendly Gradio interface. Major updates include local run functionality, added examples, and support for multiple displays. To start the interface for remote control, users install the prerequisites, clone the repository, install dependencies, and set API keys. It also has advanced settings for the ShowUI model and a roadmap for further improvement.
text-extract-api
TLDR: A tool for converting images, PDFs, and Office documents to Markdown or JSON with high accuracy. It is built with FastAPI and uses Celery for asynchronous tasks and Redis for caching. It supports various OCR strategies, can remove PII, and offers different storage strategies. It comes with a CLI tool, an online demo, and dedicated API clients.
ai-engineering-hub
Sana
TLDR: Sana is a text-to-image framework that efficiently generates high-resolution images of up to 4096×4096 pixels. Its design includes DC-AE, Linear DiT, a decoder-only text encoder, and efficient training and sampling. Sana is competitive with giant diffusion models while being smaller and faster, and it can be deployed on a laptop GPU.