web-ui

TLDR: This project builds on browser-use and offers a user-friendly WebUI that supports various LLMs, custom browser usage, and persistent browser sessions. It can be installed locally or with Docker, and provides several themes and settings. The changelog lists updates such as DeepSeek-r1 support, a Docker setup, and keeping the browser open between tasks.

Python
2025-01-02 Github

youtube

TLDR: This repository contains scripts for various YouTube video processing tasks such as audio to text conversion, audio to subtitle conversion, video resolution conversion for YouTube Shorts, subtitle text processing, and video splitting into short clips.

Python
2024-12-29 Github

AI-reads-books-page-by-page

TLDR: This repository contains a script that performs page-by-page analysis of PDF books, extracting knowledge points and generating summaries. It offers features like automated analysis, AI-powered content understanding, interval summaries, and customizable options. The script can be set up by cloning the repository, installing requirements, and configuring constants. It works by setting up directories, loading an existing knowledge base, processing pages, generating summaries, and saving the results.

Python
2024-12-28 Github
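
The workflow described for AI-reads-books-page-by-page (set up directories, load an existing knowledge base, process pages, generate interval summaries, save results) can be sketched roughly as follows. This is only an illustration, not the repository's code: `SUMMARY_INTERVAL`, `load_knowledge`, `extract_points`, `summarize`, and `analyze_book` are hypothetical names, pages are passed in as plain strings instead of being read from a PDF, and the two AI steps are replaced by trivial placeholders.

```python
import json
import tempfile
from pathlib import Path

SUMMARY_INTERVAL = 2  # hypothetical constant: write a summary every N pages


def load_knowledge(path: Path) -> list[str]:
    """Load previously saved knowledge points, or start with an empty list."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return []


def extract_points(page_text: str) -> list[str]:
    """Placeholder for the AI call: treat each non-empty line as one point."""
    return [line.strip() for line in page_text.splitlines() if line.strip()]


def summarize(points: list[str]) -> str:
    """Placeholder for the AI summary: join the points into one string."""
    return "; ".join(points)


def analyze_book(pages: list[str], out_dir: Path) -> tuple[list[str], list[str]]:
    # Set up the output directory and load any existing knowledge base.
    out_dir.mkdir(parents=True, exist_ok=True)
    kb_path = out_dir / "knowledge.json"
    knowledge = load_knowledge(kb_path)

    summaries: list[str] = []
    pending: list[str] = []  # points collected since the last summary
    for page_number, text in enumerate(pages, start=1):
        new_points = extract_points(text)
        knowledge.extend(new_points)
        pending.extend(new_points)
        # Save after every page so an interrupted run can resume later.
        kb_path.write_text(json.dumps(knowledge), encoding="utf-8")
        if page_number % SUMMARY_INTERVAL == 0:
            summaries.append(summarize(pending))
            pending = []

    # Summarize whatever is left after the last full interval.
    if pending:
        summaries.append(summarize(pending))
    (out_dir / "summaries.json").write_text(json.dumps(summaries), encoding="utf-8")
    return knowledge, summaries


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        points, notes = analyze_book(["first idea\nsecond idea", "third idea"], Path(tmp))
        print(points)
        print(notes)
```

Saving the knowledge base after each page is a design choice made for this sketch; it lets a later run pick up the stored points instead of starting from scratch.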

deepseek-engineer

TLDR: A coding assistant application that integrates with the DeepSeek API. It can process user conversations, generate JSON responses, read local files, create new files, and apply diff edits. Its main parts are the DeepSeek client configuration, data models, helper functions, and an interactive session.

Python
2024-12-26 Github
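
To make the "apply diff edits" idea in the deepseek-engineer entry concrete, here is a minimal sketch of one way such an edit could work: replace an exact snippet of a file's text with a new snippet. The `DiffEdit` model and `apply_diff_edit` helper are hypothetical names chosen for illustration and are not taken from the repository.

```python
from dataclasses import dataclass


@dataclass
class DiffEdit:
    """Hypothetical data model for one edit requested by the assistant."""
    path: str              # file the edit targets
    original_snippet: str  # exact text expected to be in the file
    new_snippet: str       # text that should replace it


def apply_diff_edit(content: str, edit: DiffEdit) -> str:
    """Return content with the first occurrence of the snippet replaced."""
    if edit.original_snippet not in content:
        # Refuse to guess: a missing snippet usually means the file changed.
        raise ValueError(f"snippet not found in {edit.path}")
    return content.replace(edit.original_snippet, edit.new_snippet, 1)


if __name__ == "__main__":
    source = "x = 1\ny = 2\n"
    edit = DiffEdit(path="app.py", original_snippet="y = 2", new_snippet="y = 3")
    print(apply_diff_edit(source, edit))
```

Operating on the file's text rather than on the file itself keeps the helper easy to test; reading and writing the file would be a thin wrapper around it.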

Aria-UI

TLDR: Aria-UI is a model that handles diverse grounding instructions for GUIs. It is context-aware, lightweight, and fast, and achieves superior performance on agent benchmarks. It can be installed and used with vLLM or Transformers.

Python
2024-12-25 Github

pasa

TLDR: This repo introduces PaSa, an LLM-powered paper search agent. It can make autonomous decisions for complex scholarly queries. Optimized with reinforcement learning and synthetic data, PaSa outperforms baselines. It has two agents, Crawler and Selector, and uses two datasets. Instructions for quick start, running locally, and training are provided.

research Python
2024-12-23 Github

geminiCoder

TLDR: A project that generates small apps from a single prompt, powered by the Gemini API. It is built with the Gemini API, Sandpack, and the Next.js app router with Tailwind, and can be cloned and run locally.

TypeScript
2024-12-20 Github

GraphAgent

TLDR: GraphAgent is an automated agent pipeline for predictive and generative tasks. It consists of three key components: a Graph Generator Agent, a Task Planning Agent, and a Task Execution Agent. It can handle real-world data in both structured and unstructured formats and has been shown to be effective in extensive experiments. The repository also provides installation and inference instructions, along with benchmarks and citation information.

2024-12-18 Github

openai-structured-outputs-samples

TLDR: A repository of sample apps demonstrating the use of OpenAI's Structured Outputs feature with Next.js.

TypeScript
2024-12-16 Github

ai-gradio

1,000
@AK391

TLDR: A Python package that enables developers to create machine learning apps powered by various AI models, including OpenAI, Gemini, Anthropic's Claude, LumaAI, CrewAI, and xAI's Grok. It supports features such as text chat, voice chat (OpenAI only), video chat (Gemini only), text generation with different models, AI video and image generation with LumaAI, and AI agent teams with CrewAI.

Python
2024-12-14 Github