Latest Open Source Projects
web-ui
TLDR: This project builds on browser-use and offers a user-friendly WebUI with support for various LLMs, custom browser usage, and persistent browser sessions. It can be installed locally or with Docker and provides several themes and settings. The changelog lists updates such as DeepSeek-r1 support, a Docker setup, and keeping the browser open between tasks.
youtube
TLDR: This repository contains scripts for various YouTube video processing tasks: audio-to-text conversion, audio-to-subtitle conversion, video resolution conversion for YouTube Shorts, subtitle text processing, and splitting videos into short clips.
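The clip-splitting task can be sketched as follows. This is a hypothetical illustration, not code from the repository: the helper names `clip_boundaries` and `ffmpeg_commands`, the `clip_000.mp4` naming scheme, and the use of ffmpeg are all assumptions. The sketch only computes fixed-length clip windows and builds matching ffmpeg command lines without running them.

```python
from typing import List, Tuple


def clip_boundaries(duration: float, clip_len: float) -> List[Tuple[float, float]]:
    """Split [0, duration) into consecutive windows of at most clip_len seconds."""
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    duration, clip_len = float(duration), float(clip_len)
    bounds: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration:
        end = min(start + clip_len, duration)  # the last clip may be shorter
        bounds.append((start, end))
        start = end
    return bounds


def ffmpeg_commands(src: str, duration: float, clip_len: float) -> List[List[str]]:
    """Build (but do not run) one ffmpeg command per clip, copying streams without re-encoding."""
    commands = []
    for i, (start, end) in enumerate(clip_boundaries(duration, clip_len)):
        commands.append([
            "ffmpeg", "-ss", f"{start:.2f}", "-i", src,
            "-t", f"{end - start:.2f}", "-c", "copy", f"clip_{i:03d}.mp4",
        ])
    return commands


if __name__ == "__main__":
    # A 130-second video cut into 60-second clips yields three commands.
    for cmd in ffmpeg_commands("input.mp4", 130, 60):
        print(" ".join(cmd))
```

Because `-c copy` avoids re-encoding, ffmpeg can only cut at keyframes, so the actual clip boundaries may shift slightly; re-encoding trades speed for exact cuts.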
AI-reads-books-page-by-page
TLDR: This repository contains a script that analyzes PDF books page by page, extracting knowledge points and generating summaries. Features include automated analysis, AI-powered content understanding, summaries at regular page intervals, and customizable options. To set it up, clone the repository, install the requirements, and configure the constants. It works by setting up directories, loading an existing knowledge base, processing pages, generating summaries, and saving the results.
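The workflow described above (set up directories, load an existing knowledge base, process pages, summarize at intervals, save) can be sketched as a single loop. This is a hypothetical sketch, not the repository's script: `analyze_book`, the `knowledge_base.json` file name, and its JSON layout are assumptions, and the two callables stand in for whatever LLM calls the real script makes.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def analyze_book(
    pages: List[str],
    extract_points: Callable[[str], List[str]],
    summarize: Callable[[List[str]], str],
    out_dir: Path,
    interval: int = 3,
) -> Dict[str, list]:
    """Hypothetical pipeline: extract points per page, summarize every `interval` pages."""
    out_dir.mkdir(parents=True, exist_ok=True)  # set up the output directory
    kb_path = out_dir / "knowledge_base.json"
    # Load an existing knowledge base so new points are appended to earlier results.
    if kb_path.exists():
        kb = json.loads(kb_path.read_text())
    else:
        kb = {"points": [], "summaries": []}
    pending: List[str] = []  # points gathered since the last summary
    for page_no, text in enumerate(pages, start=1):
        points = extract_points(text)
        kb["points"].extend(points)
        pending.extend(points)
        # Summarize at each interval boundary and once more at the final page.
        if (page_no % interval == 0 or page_no == len(pages)) and pending:
            kb["summaries"].append(summarize(pending))
            pending = []
        kb_path.write_text(json.dumps(kb, indent=2))  # save after every page
    return kb


if __name__ == "__main__":
    # Toy stand-ins: "knowledge points" are words, a "summary" joins them.
    with tempfile.TemporaryDirectory() as tmp:
        result = analyze_book(["a b", "c", "d e f", "g"], str.split, " ".join, Path(tmp), interval=2)
        print(result["summaries"])
```

Saving after every page means an interrupted run still leaves its progress on disk, at the cost of rewriting the file once per page.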
deepseek-engineer
TLDR: A coding assistant application that integrates with the DeepSeek API. It can process user conversations, generate JSON responses, read local files, create new files, and apply diff edits. Its components include DeepSeek client configuration, data models, helper functions, and an interactive session.
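One common way a tool like this can apply diff edits is exact snippet replacement, sketched below. This is an assumption about the approach, not the project's code: the `FileEdit` model, the `{"edits": [...]}` JSON shape, and both function names are hypothetical.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileEdit:
    """Hypothetical edit model: replace one exact snippet in one file."""
    path: str
    original_snippet: str
    new_snippet: str


def apply_diff_edit(edit: FileEdit) -> None:
    """Replace original_snippet with new_snippet, requiring exactly one match."""
    file_path = Path(edit.path)
    content = file_path.read_text()
    matches = content.count(edit.original_snippet)
    if matches == 0:
        raise ValueError(f"snippet not found in {file_path}")
    if matches > 1:
        raise ValueError(f"snippet matches {matches} times in {file_path}; refusing ambiguous edit")
    file_path.write_text(content.replace(edit.original_snippet, edit.new_snippet, 1))


def apply_edits_from_json(payload: str) -> int:
    """Parse a JSON payload of the form {"edits": [...]}, apply each edit in order, return the count."""
    edits = [FileEdit(**item) for item in json.loads(payload)["edits"]]
    for edit in edits:
        apply_diff_edit(edit)
    return len(edits)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "example.py"
        target.write_text("x = 1\ny = 2\n")
        payload = json.dumps({"edits": [{
            "path": str(target), "original_snippet": "y = 2", "new_snippet": "y = 3",
        }]})
        apply_edits_from_json(payload)
        print(target.read_text())
```

Rejecting snippets that match zero or several times keeps an edit from silently landing in the wrong place. Edits are not atomic here: if a later edit fails, earlier ones have already been written.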
Aria-UI
TLDR: Aria-UI is a lightweight, fast, and context-aware model that handles diverse grounding instructions for GUIs and achieves superior performance on agent benchmarks. It can be installed and used with vLLM or Transformers.
pasa
TLDR: This repo introduces PaSa, an LLM-powered paper search agent. It can make autonomous decisions for complex scholarly queries. Optimized with reinforcement learning and synthetic data, PaSa outperforms baselines. It has two agents, Crawler and Selector, and uses two datasets. Instructions for quick start, running locally, and training are provided.
geminiCoder
TLDR: A project, powered by the Gemini API, that generates small apps from a single prompt. It is built with the Gemini API, Sandpack, and the Next.js App Router with Tailwind, and can be cloned and run locally.
GraphAgent
TLDR: GraphAgent is an automated agent pipeline for predictive and generative tasks. It consists of three key components: a Graph Generator Agent, a Task Planning Agent, and a Task Execution Agent. It can handle real-world data in both structured and unstructured formats, and its effectiveness has been demonstrated through extensive experiments. The repository also provides installation and inference instructions, along with benchmarks and citation information.
openai-structured-outputs-samples
TLDR: A repository of sample apps, built with Next.js, demonstrating OpenAI's Structured Outputs feature.
ai-gradio
TLDR: A Python package that enables developers to create machine learning apps powered by a range of AI providers and frameworks, including OpenAI, Gemini, Anthropic's Claude, LumaAI, CrewAI, and xAI's Grok. Supported features include text chat, voice chat (OpenAI only), video chat (Gemini only), text generation with different models, AI video and image generation with LumaAI, and AI agent teams with CrewAI.