billy-enrizky/openbrowser-ai

Python · 185 stars · 141 issues · 6 contributors · MIT

Summary

OpenBrowser is a Python framework for AI-driven browser automation that uses Chrome DevTools Protocol (CDP) directly instead of Playwright/Selenium, combined with a LangGraph-based CodeAgent where the LLM writes and executes Python code in a persistent namespace to navigate and interact with web pages. It supports 12+ LLM providers, an MCP server for Claude Desktop integration, and a CLI daemon mode. The architecture centers on an event bus (bubus) coordinating watchdogs that handle specific browser behaviors.
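
To make the "event bus coordinating watchdogs" idea concrete, here is a minimal sketch in plain asyncio. It does not use bubus or any of OpenBrowser's real classes: `EventBus`, `NavigateEvent`, and `NavigationWatchdog` are hypothetical stand-ins, and the URL-prefix policy is an assumption chosen only to give the watchdog something to do.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class NavigateEvent:
    """Hypothetical event: a request to open a URL."""
    url: str


class EventBus:
    """Toy async event bus: handlers are registered per event type."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_type, handler):
        self._handlers[event_type].append(handler)

    async def dispatch(self, event):
        # Await every handler registered for this event's type, in
        # registration order, and collect their results.
        return [await handler(event) for handler in self._handlers[type(event)]]


class NavigationWatchdog:
    """Hypothetical watchdog: owns one browser concern (URL policy here)."""

    def __init__(self, bus, allowed_prefixes):
        self.allowed_prefixes = tuple(allowed_prefixes)
        bus.on(NavigateEvent, self.on_NavigateEvent)

    async def on_NavigateEvent(self, event):
        if not event.url.startswith(self.allowed_prefixes):
            raise PermissionError(f"URL not allowed: {event.url}")
        return f"navigated to {event.url}"


async def demo():
    bus = EventBus()
    NavigationWatchdog(bus, allowed_prefixes=["https://example.com"])
    return await bus.dispatch(NavigateEvent("https://example.com/search"))


results = asyncio.run(demo())
print(results)
```

The point of the pattern is that the code dispatching `NavigateEvent` never needs to know which watchdogs exist; each watchdog subscribes itself and handles one narrow behavior.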

Great for

Building LLM-powered web automation pipelines that need fine-grained browser control without Playwright overhead: scraping, form filling, flight-booking bots, or any task requiring an agent that can reason about and interact with live web pages.

Easy wins

+ Add CONTRIBUTING.md with setup instructions and an architecture overview; the watchdog pattern and event bus hierarchy are not documented anywhere in the repo
+ The README model table lists models like 'gpt-5.2', 'claude-sonnet-4-6', 'gemini-3-flash' that don't exist yet; update to real current model names (gpt-4o, claude-3-5-sonnet, gemini-1.5-flash)
+ Add 'good first issue' labels. The test files mock everything and there are no integration smoke tests, so a simple real-browser 'navigate and screenshot' test would be a valuable first task
+ The docker-compose.yml references a frontend/ directory and Dockerfile that don't appear in the file tree; either add the frontend or remove those references to avoid confusing new contributors

Red flags

! litellm is pinned to the exact version `==1.80.0` in pyproject.toml, which will cause dependency resolution conflicts for any project that needs a different litellm version
! The README model table lists non-existent models (gpt-5.2, claude-sonnet-4-6, gemini-3-flash, claude-opus-4-6); this is either aspirational documentation or fabricated, which erodes trust
! docker-compose.yml references a frontend service with its own Dockerfile, but no frontend/ directory exists in the file tree, so the compose file will fail to build
! Only 1 commit is recorded despite 185 stars and apparent maturity, which suggests the history was squashed or the repo was migrated and makes it impossible to trace the project's evolution
! posthog telemetry is a hard dependency (not optional) in the base install, meaning every user sends analytics by default; no opt-out documentation is visible in the README
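
The litellm pin is the easiest of these flags to reason about concretely. The sketch below is a toy compatibility check, not pip's actual resolver: `parse`, `satisfies`, the competing requirement `1.81.2`, and the suggested range are all illustrative assumptions. It shows why an exact pin leaves no room for another package's requirement, whereas a bounded range might.

```python
def parse(version):
    """Turn '1.80.0' into (1, 80, 0) so versions compare as tuples."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, specifier):
    """Check one version against a comma-separated specifier.

    Supports only ==, >= and <, which is enough for this illustration.
    """
    v = parse(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        if clause.startswith("=="):
            ok = v == parse(clause[2:])
        elif clause.startswith(">="):
            ok = v >= parse(clause[2:])
        elif clause.startswith("<"):
            ok = v < parse(clause[1:])
        else:
            raise ValueError(f"unsupported clause: {clause}")
        if not ok:
            return False
    return True


# Hypothetical: another package in the same environment needs this version.
other_project_needs = "1.81.2"

exact_pin = "==1.80.0"              # the pin reported in pyproject.toml
bounded_range = ">=1.80.0,<2.0.0"   # one possible looser alternative (assumption)

print(satisfies(other_project_needs, exact_pin))      # False: the two cannot coexist
print(satisfies(other_project_needs, bounded_range))  # True: both can be installed
```

Whether a looser range is actually safe depends on how stable litellm's API is between releases, which only the maintainers can judge.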

Code quality

decent

The core session.py and watchdog code is architecturally thoughtful: the CDPSession/BrowserSession split is clean, the event bus pattern is applied consistently, and error handling uses specific exception types (BrowserError, URLNotAllowedError). However, there are rough spots. default_action_watchdog.py repeats a redundant `except Exception as e: raise` pattern (at the end of on_ClickElementEvent, on_TypeTextEvent, and on_ScrollEvent), which catches the exception only to re-raise it unchanged and so adds clutter without value. The test suite is extensive but consists almost entirely of unit tests with heavy mocking: test_agent_service_coverage.py and test_browser_session_coverage.py mock out every external dependency, so real integration behavior is untested. The litellm dependency is hard-pinned to `==1.80.0` in pyproject.toml, which will cause dependency conflicts for users.
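
To illustrate why the catch-and-re-raise pattern adds nothing, the sketch below compares two hypothetical handlers. These are not the project's actual handler bodies, and `BrowserError` here is a local stand-in rather than the project's class. Both versions propagate the same exception type and message, so the wrapper can be dropped unless it also logs, wraps, or cleans up.

```python
class BrowserError(Exception):
    """Local stand-in for the project's BrowserError."""


def click(node_id):
    # Hypothetical low-level action that fails when no node is given.
    if node_id is None:
        raise BrowserError("element not found")
    return f"clicked {node_id}"


def handler_with_redundant_except(node_id):
    try:
        return click(node_id)
    except Exception as e:  # catches, does nothing, re-raises unchanged
        raise


def handler_without_except(node_id):
    return click(node_id)


def outcome(handler, node_id):
    """Return the result, or the exception type and message if it raised."""
    try:
        return handler(node_id)
    except Exception as exc:
        return (type(exc).__name__, str(exc))


print(outcome(handler_with_redundant_except, None))
print(outcome(handler_without_except, None))
print(outcome(handler_with_redundant_except, 7) == outcome(handler_without_except, 7))
```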

What makes it unique

The direct CDP approach (via the cdp-use library) instead of Playwright is the genuine differentiator; most comparable browser automation frameworks (browser-use, playwright-mcp, etc.) sit on top of Playwright. The CodeAgent architecture, where the LLM writes executable Python rather than calling predefined tool functions, is also distinct from action-space approaches. However, the project occupies the same space as browser-use (which it explicitly benchmarks against) and Skyvern, and the benchmark comparisons in the repo appear to favor OpenBrowser without independent verification.
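
The idea of an LLM writing Python that runs in a persistent namespace can be sketched with the built-in `exec` and a shared dictionary. This is a toy under stated assumptions, not OpenBrowser's CodeAgent: `run_step` is a hypothetical helper, and the hard-coded strings stand in for code an LLM would generate. A real agent would also need sandboxing, because `exec` runs arbitrary code.

```python
import contextlib
import io


def run_step(code, namespace):
    """Execute one code snippet in the shared namespace and capture stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue()


# One dict survives across steps, so later snippets can use earlier variables.
namespace = {}

# These strings stand in for code an LLM would generate turn by turn.
step_1 = "prices = [129, 99, 145]"
step_2 = "cheapest = min(prices)\nprint(f'cheapest fare: {cheapest}')"

run_step(step_1, namespace)
output = run_step(step_2, namespace)
print(output, end="")
```

In contrast with an action-space design, where each step must be one of a fixed set of tool calls, the second snippet here can freely reuse `prices` from the first because both ran against the same namespace.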

Scores

Barrier to entry

high

The architecture is non-trivial: a multi-layer event bus and watchdog system with CDP sessions, Pydantic models with private attributes and complex init merging, and LangGraph agent loops. There is no contributing guide, only 1 commit is recorded (suggesting a migrated or squashed history), and there are 0 good-first-issue labels even though the codebase runs to several thousand lines.

Skills needed

Python async/await (asyncio throughout)
Chrome DevTools Protocol (CDP) internals
Pydantic v2 models (used heavily for all data structures)
LangGraph / LangChain patterns
Event-driven architecture (bubus EventBus, watchdog pattern)
Browser automation concepts (DOM, XPath, selectors)
LLM API integration (OpenAI, Anthropic, Google, etc.)