Midscene Python is an AI-based automation framework that supports UI automation operations on Web and Android platforms.
Midscene Python provides comprehensive UI automation capabilities with the following core features:
- Natural Language Driven: Describe automation tasks using natural language
- Multi-platform Support: Supports Web (Selenium/Playwright) and Android (ADB)
- AI Model Integration: Supports multiple vision-language models such as GPT-4V, Qwen2.5-VL, and Gemini
- Visual Debugging: Provides detailed execution reports and debugging information
- Caching Mechanism: Intelligent caching to improve execution efficiency
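
To make the caching idea concrete, here is a minimal sketch of how the result of an expensive model call could be memoized. It is not Midscene Python's actual cache: `PlanCache`, `fake_model`, and the key scheme (the prompt plus a hash of the screenshot bytes) are all assumptions chosen for illustration.

```python
import hashlib
from typing import Callable, Dict


class PlanCache:
    """Hypothetical in-memory cache keyed by prompt and screenshot content."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str, screenshot: bytes) -> str:
        # Hash the prompt and the screenshot together so that a changed
        # page produces a different key even for the same instruction.
        digest = hashlib.sha256()
        digest.update(prompt.encode("utf-8"))
        digest.update(b"\x00")
        digest.update(screenshot)
        return digest.hexdigest()

    def get_or_compute(
        self,
        prompt: str,
        screenshot: bytes,
        compute: Callable[[str, bytes], str],
    ) -> str:
        key = self._key(prompt, screenshot)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = compute(prompt, screenshot)
        self._store[key] = result
        return result


def fake_model(prompt: str, screenshot: bytes) -> str:
    """Stand-in for a slow vision-language model call (assumption)."""
    return f"plan for: {prompt}"


cache = PlanCache()
page_v1 = b"screenshot-bytes-v1"
first = cache.get_or_compute("Click the login button", page_v1, fake_model)
second = cache.get_or_compute("Click the login button", page_v1, fake_model)
third = cache.get_or_compute("Click the login button", b"screenshot-bytes-v2", fake_model)
print(cache.hits, cache.misses)
```

Keying on the screenshot as well as the prompt means a repeated instruction is only served from the cache while the page looks identical. A real implementation would likely need a more tolerant notion of "same page" than a byte-exact hash.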

The repository is organized as follows:

    midscene-python/
    ├── midscene/                 # Core framework
    │   ├── core/                 # Core framework
    │   │   ├── agent/            # Agent system
    │   │   ├── insight/          # AI inference engine
    │   │   ├── ai_model/         # AI model integration
    │   │   ├── yaml/             # YAML script executor
    │   │   └── types.py          # Core type definitions
    │   ├── web/                  # Web integration
    │   │   ├── selenium/         # Selenium integration
    │   │   ├── playwright/       # Playwright integration
    │   │   └── bridge/           # Bridge mode
    │   ├── android/              # Android integration
    │   │   ├── device.py         # Device management
    │   │   └── agent.py          # Android Agent
    │   ├── cli/                  # Command line tools
    │   ├── mcp/                  # MCP protocol support
    │   ├── shared/               # Shared utilities
    │   └── visualizer/           # Visual reports
    ├── examples/                 # Example code
    ├── tests/                    # Test cases
    └── docs/                     # Documentation

The framework builds on the following stack:

- Python 3.9+: Core runtime environment
- Pydantic: Data validation and serialization
- Selenium/Playwright: Web automation
- OpenCV/Pillow: Image processing
- HTTPX/AIOHTTP: HTTP client
- Typer: CLI framework
- Loguru: Logging
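
Install the package with pip:

    pip install midscene-python

A basic script creates an agent for a web page and drives it with natural-language instructions. The agent methods are awaited, so the calls are wrapped in a coroutine and run with `asyncio.run`:

    import asyncio

    from midscene import Agent
    from midscene.web import SeleniumWebPage


    async def main():
        # Create a Web Agent
        with SeleniumWebPage.create() as page:
            agent = Agent(page)

            # Perform automation operations using natural language
            await agent.ai_action("Click the login button")
            await agent.ai_action("Enter username '[email protected]'")
            await agent.ai_action("Enter password 'password123'")
            await agent.ai_action("Click the submit button")

            # Data extraction
            user_info = await agent.ai_extract("Extract user personal information")

            # Assertion verification
            await agent.ai_assert("Page displays welcome message")


    # `await` is only valid inside a coroutine, so run main() in an event loop
    asyncio.run(main())

Describe operations in natural language, and the AI interprets and executes them: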

    await agent.ai_action("Enter 'Python tutorial' in the search box and search")

Element location supports multiple strategies and automatically selects the most suitable one:

    element = await agent.ai_locate("Login button")

Extract structured data from the page:

    products = await agent.ai_extract({
        "products": [
            {"name": "Product Name", "price": "Price", "rating": "Rating"}
        ]
    })

The AI understands page state and performs intelligent assertions:

    await agent.ai_assert("User has successfully logged in")

Thanks to the Midscene project (https://github.com/web-infra-dev/midscene) for inspiration and technical references.

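
Returning to the structured extraction call shown earlier: the dictionary passed to `ai_extract` describes the desired shape of the result. The sketch below checks a result against such a description before it is used. It assumes the returned data mirrors the nesting of the query, which this document does not confirm. `matches_shape` is a hypothetical helper, not part of the library, and the sample values are made up.

```python
from typing import Any


def matches_shape(schema: Any, data: Any) -> bool:
    """Return True if `data` has the same nesting and keys as `schema`.

    Leaf values in the schema are free-text descriptions, so any leaf
    value in the data is accepted.
    """
    if isinstance(schema, dict):
        return (
            isinstance(data, dict)
            and set(schema) <= set(data)
            and all(matches_shape(sub, data[key]) for key, sub in schema.items())
        )
    if isinstance(schema, list):
        if not isinstance(data, list):
            return False
        if not schema:
            return True  # an empty list in the schema places no constraint on items
        # The first schema element describes every item of the result list.
        return all(matches_shape(schema[0], item) for item in data)
    return True


# Same query shape as the extraction example in this document.
schema = {"products": [{"name": "Product Name", "price": "Price", "rating": "Rating"}]}

# Illustrative values only; real results depend on the page and the model.
sample = {"products": [{"name": "Keyboard", "price": "$49", "rating": "4.5"}]}
broken = {"products": [{"name": "Keyboard"}]}

print(matches_shape(schema, sample))
print(matches_shape(schema, broken))
```

A check like this can fail fast when a model response omits a field, instead of surfacing as a `KeyError` later in the script.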
MIT License