Let LLMs interact with websites through a simple interface.
```bash
pip install browser-use
```

```python
from langchain_openai import ChatOpenAI

from browser_use import Agent

agent = Agent(
    task="Go to hackernews on show hn and give me top 10 post titles, their points and hours. Calculate for each the ratio of points per hour.",
    llm=ChatOpenAI(model="gpt-4o"),
)

# ... inside an async function
await agent.run()
```

Prompt: Go to hackernews on show hn and give me top 10 post titles, their points and hours. Calculate for each the ratio of points per hour. (1x speed)
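The points-per-hour arithmetic the prompt asks for is plain Python once the agent has scraped the numbers; a minimal sketch over made-up sample data (the titles and values below are illustrative, not real Hacker News results):

```python
# Hypothetical scraped results: (title, points, hours since posting)
posts = [
    ("Show HN: A", 120, 4),
    ("Show HN: B", 90, 2),
    ("Show HN: C", 250, 10),
]

# Rank by points-per-hour ratio, highest first
ranked = sorted(
    ((title, points / hours) for title, points, hours in posts),
    key=lambda item: item[1],
    reverse=True,
)

for title, ratio in ranked:
    print(f"{title}: {ratio:.1f} points/hour")
```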
Prompt: Search the top 3 AI companies 2024 and find out what concrete hardware each is using for their model. (1x speed)
- Create a virtual environment and install dependencies:

```bash
# I recommend using uv
pip install .
```

- Add your API keys to the `.env` file:

```bash
cp .env.example .env
```

E.g. for OpenAI:

```bash
OPENAI_API_KEY=
```

You can use any LLM model supported by LangChain by adding the appropriate environment variables. See langchain models for available options.
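Which chat model to construct can then be keyed off whichever key is present in the environment. A minimal, library-free sketch of that selection logic (the `pick_model` helper and the fallback order are illustrative, not part of browser-use):

```python
import os

def pick_model():
    """Illustrative: choose a LangChain model name based on available API keys.

    Real code would construct ChatOpenAI / ChatAnthropic here instead
    of returning a bare model-name string.
    """
    if os.environ.get("OPENAI_API_KEY"):
        return "gpt-4o"  # -> ChatOpenAI(model="gpt-4o")
    if os.environ.get("ANTHROPIC_API_KEY"):
        return "claude-3-5-sonnet-20240620"  # -> ChatAnthropic(model=...)
    raise RuntimeError("No supported API key found in the environment")

os.environ["OPENAI_API_KEY"] = "sk-demo"  # placeholder for the demo only
print(pick_model())
```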
- Universal LLM Support - Works with any Language Model
- Interactive Element Detection - Automatically finds interactive elements
- Multi-Tab Management - Seamless handling of browser tabs
- XPath Extraction for scraping functions - No more manual DevTools inspection
- Vision Model Support - Process visual page information
- Customizable Actions - Add your own browser interactions (e.g. add data to database which the LLM can use)
- Handles dynamic content - don't worry about cookies or changing content
- Chain-of-thought prompting with memory - Solve long-term tasks
- Self-correcting - If the LLM makes a mistake, the agent will self-correct its actions
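A customizable action is essentially a named Python callable the LLM can invoke by description. The registration pattern can be sketched library-free (the `action` decorator, `registry` dict, and `save_row` function below are illustrative, not browser-use's actual API):

```python
# Minimal action-registry sketch (illustrative, not browser-use's real API)
registry = {}

def action(description):
    """Register a callable under a human-readable description."""
    def wrap(fn):
        registry[description] = fn
        return fn
    return wrap

@action("Save a scraped row to the database")
def save_row(row):
    # In a real agent this would write to your database
    return f"saved: {row}"

# The agent would pick an action by its description and call it
result = registry["Save a scraped row to the database"]({"title": "Show HN"})
```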
You can persist the browser across multiple agents and chain them together.
```python
from asyncio import run

from dotenv import load_dotenv
from langchain_anthropic import ChatAnthropic

from browser_use import Agent, Controller

load_dotenv()

# Persist browser state across agents
controller = Controller()

# Initialize browser agents
agent1 = Agent(
    task="Open 3 VCs websites in the New York area.",
    llm=ChatAnthropic(model="claude-3-5-sonnet-20240620", timeout=25, stop=None),
    controller=controller,
)

agent2 = Agent(
    task="Give me the names of the founders of the companies in all tabs.",
    llm=ChatAnthropic(model="claude-3-5-sonnet-20240620", timeout=25, stop=None),
    controller=controller,
)

run(agent1.run())
founders, history = run(agent2.run())
print(founders)
```

You can use the history to run the agents again deterministically.
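Deterministic re-runs boil down to replaying a recorded action list in order. A library-free sketch of the idea (the `history` shape, action names, and `replay` helper are made up for illustration; they are not browser-use's recorded format):

```python
# Hypothetical recorded history: (action_name, argument) pairs
history = [
    ("open_tab", "https://vc-one.example"),
    ("open_tab", "https://vc-two.example"),
    ("extract_text", "h1"),
]

# A tiny dispatcher standing in for real browser actions
actions = {
    "open_tab": lambda url: f"opened {url}",
    "extract_text": lambda selector: f"extracted <{selector}>",
}

def replay(history, actions):
    """Re-execute every recorded action in its original order."""
    return [actions[name](arg) for name, arg in history]

log = replay(history, actions)
```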
Run examples directly from the command line (clone the repo first):
```bash
python examples/try.py "Your query here" --provider [openai|anthropic]
```

For Anthropic, you need to add ANTHROPIC_API_KEY to your environment variables. Example usage:
```bash
python examples/try.py "Search the top 3 AI companies 2024 and find out in 3 new tabs what hardware each is using for their models" --provider anthropic
```

For OpenAI, you need to add OPENAI_API_KEY to your environment variables. Example usage:
```bash
python examples/try.py "Go to hackernews on show hn and give me top 10 post titles, their points and hours. Calculate for each the ratio of points per hour." --provider openai
```

All LangChain chat models are supported. Tested with:
- GPT-4o
- GPT-4o Mini
- Claude 3.5 Sonnet
- Llama 3.1 405B
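The command-line interface shown above implies a small argparse setup: one positional query plus a constrained `--provider` flag. This is a sketch of what that interface suggests, not the actual source of `examples/try.py`:

```python
import argparse

# Illustrative reconstruction of the CLI shape, not try.py's real code
parser = argparse.ArgumentParser(description="Run a browser-use example task")
parser.add_argument("query", help="natural-language task for the agent")
parser.add_argument(
    "--provider",
    choices=["openai", "anthropic"],
    default="openai",
    help="which LLM provider to use",
)

# Parse a sample invocation instead of sys.argv for this demo
args = parser.parse_args(["Your query here", "--provider", "anthropic"])
print(args.provider)
```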
- When extracting page content, the message length grows and LLM responses slow down.
- Currently a single agent run costs about $0.01.
- The agent sometimes repeats the same task over and over.
- Some elements you want to interact with might not be extracted.
- What should we focus on the most?
- Robustness
- Speed
- Cost reduction
- Save agent actions and execute them deterministically
- Pydantic forced output
- Third party SERP API for faster Google Search results
- Multi-step action execution to increase speed
- Test on mind2web dataset
- Add more browser actions
Contributions are welcome! Feel free to open issues for bugs or feature requests.
Feel free to join the Discord for discussions and support.
Made with ❤️ by the Browser-Use team

