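Midscene Python is an AI-based UI automation testing framework that supports Web and Android platforms. Its core innovation is driving automation tasks with natural language: it integrates vision-language models (such as GPT-4V, Qwen2.5-VL, and Gemini) to understand user intent and perform operations.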

Python51888/Midscene-Python

Midscene Python

Midscene Python is an AI-based automation framework that supports UI automation operations on Web and Android platforms.

Overview

Midscene Python provides comprehensive UI automation capabilities with the following core features:

  • Natural Language Driven: Describe automation tasks using natural language
  • Multi-platform Support: Supports Web (Selenium/Playwright) and Android (ADB)
  • AI Model Integration: Supports multiple vision-language models such as GPT-4V, Qwen2.5-VL, and Gemini
  • Visual Debugging: Provides detailed execution reports and debugging information
  • Caching Mechanism: Intelligent caching to improve execution efficiency
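The caching feature is described only briefly above. As a rough illustration of the general idea, and not the project's actual implementation, the sketch below caches a model's answer under a key built from the prompt text and the screenshot bytes, so an identical prompt on an identical screen skips the model call. The names `PromptCache`, `make_key`, and `get_or_compute` are hypothetical and do not come from Midscene Python.

```python
import hashlib
from typing import Callable, Dict


class PromptCache:
    """Hypothetical in-memory cache keyed by prompt text plus screenshot bytes."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def make_key(prompt: str, screenshot: bytes) -> str:
        # Hash both inputs so the same prompt on a different screen gets a new key.
        digest = hashlib.sha256()
        digest.update(prompt.encode("utf-8"))
        digest.update(b"\x00")  # separator between prompt bytes and image bytes
        digest.update(screenshot)
        return digest.hexdigest()

    def get_or_compute(
        self, prompt: str, screenshot: bytes, compute: Callable[[], str]
    ) -> str:
        key = self.make_key(prompt, screenshot)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: run the (expensive) model call once and remember the answer.
        self.misses += 1
        result = compute()
        self._store[key] = result
        return result
```

In this sketch `compute` stands in for the model call; a real cache would likely also need persistence and invalidation, which are left out here.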

Project Architecture

midscene-python/
├── midscene/               # Core framework
│   ├── core/               # Core framework
│   │   ├── agent/          # Agent system
│   │   ├── insight/        # AI inference engine
│   │   ├── ai_model/       # AI model integration
│   │   ├── yaml/           # YAML script executor
│   │   └── types.py        # Core type definitions
│   ├── web/                # Web integration
│   │   ├── selenium/       # Selenium integration
│   │   ├── playwright/     # Playwright integration
│   │   └── bridge/         # Bridge mode
│   ├── android/            # Android integration
│   │   ├── device.py       # Device management
│   │   └── agent.py        # Android Agent
│   ├── cli/                # Command line tools
│   ├── mcp/                # MCP protocol support
│   ├── shared/             # Shared utilities
│   └── visualizer/         # Visual reports
├── examples/               # Example code
├── tests/                  # Test cases
└── docs/                   # Documentation

Tech Stack

  • Python 3.9+: Core runtime environment
  • Pydantic: Data validation and serialization
  • Selenium/Playwright: Web automation
  • OpenCV/Pillow: Image processing
  • HTTPX/AIOHTTP: HTTP client
  • Typer: CLI framework
  • Loguru: Logging

Quick Start

Installation

pip install midscene-python

Basic Usage

import asyncio

from midscene import Agent
from midscene.web import SeleniumWebPage


async def main():
    # Create a Web Agent
    with SeleniumWebPage.create() as page:
        agent = Agent(page)

        # Perform automation operations using natural language
        await agent.ai_action("Click the login button")
        await agent.ai_action("Enter username '[email protected]'")
        await agent.ai_action("Enter password 'password123'")
        await agent.ai_action("Click the submit button")

        # Data extraction
        user_info = await agent.ai_extract("Extract user personal information")

        # Assertion verification
        await agent.ai_assert("Page displays welcome message")


# The await calls must run inside a coroutine, so drive main() with asyncio
asyncio.run(main())

Key Features

🤖 AI-Driven Automation

Describe an operation in natural language, and the AI interprets and executes it:

await agent.ai_action("Enter 'Python tutorial' in the search box and search")

🔍 Intelligent Element Location

Supports multiple location strategies and automatically selects the most suitable one:

element = await agent.ai_locate("Login button")

📊 Data Extraction

Extract structured data from the page:

products = await agent.ai_extract({
    "products": [
        {"name": "Product Name", "price": "Price", "rating": "Rating"}
    ]
})
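Extracted values often arrive as display text, so it can help to normalize them into typed records before use. The sketch below assumes, without confirmation from the project, that the result mirrors the query shape (a dict with a `products` list of string fields). `Product`, `parse_products`, and the sample data are hypothetical illustrations, not library output.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Product:
    name: str
    price: float
    rating: float


def parse_products(raw: Dict[str, Any]) -> List[Product]:
    """Convert an assumed {"products": [...]} result into typed records."""
    products = []
    for item in raw.get("products", []):
        # Strip a currency symbol and thousands separators before converting.
        price_text = str(item["price"]).replace("$", "").replace(",", "").strip()
        products.append(
            Product(
                name=str(item["name"]).strip(),
                price=float(price_text),
                rating=float(item["rating"]),
            )
        )
    return products


# Made-up sample in the assumed result shape
sample = {
    "products": [
        {"name": " Python Tutorial ", "price": "$1,019.50", "rating": "4.5"}
    ]
}
parsed = parse_products(sample)
```

Since Pydantic is part of the project's tech stack, a Pydantic model could play the same role as the dataclass; the standard library is used here only to keep the sketch dependency-free.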

✅ Intelligent Assertions

AI understands page state and performs intelligent assertions:

await agent.ai_assert("User has successfully logged in")
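Pages often need a moment to settle after an action, so an assertion may fail on the first try and pass shortly afterwards. The sketch below shows one way to retry an async assertion a few times. It assumes that a failed `ai_assert` raises `AssertionError`, which this README does not confirm. `assert_eventually` and `FakeAgent` are hypothetical helpers, with `FakeAgent` standing in for a real agent so the example runs without a browser or model.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def assert_eventually(
    check: Callable[[], Awaitable[None]],
    attempts: int = 3,
    delay: float = 0.5,
) -> int:
    """Retry an async assertion; return the attempt number that passed."""
    last_error: Optional[AssertionError] = None
    for attempt in range(1, attempts + 1):
        try:
            await check()
            return attempt
        except AssertionError as exc:
            last_error = exc
            if attempt < attempts:
                # Give the page time to settle before the next try.
                await asyncio.sleep(delay)
    raise AssertionError(f"assertion failed after {attempts} attempts") from last_error


class FakeAgent:
    """Stand-in for a real agent: the assertion fails until `ready_after` calls."""

    def __init__(self, ready_after: int) -> None:
        self.calls = 0
        self.ready_after = ready_after

    async def ai_assert(self, condition: str) -> None:
        self.calls += 1
        if self.calls < self.ready_after:
            raise AssertionError(f"not yet true: {condition}")
```

With a real agent the call might look like `await assert_eventually(lambda: agent.ai_assert("User has successfully logged in"))`. Each retry would trigger another model call, so the attempt count is worth keeping small.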

📝 Credits

Thanks to the Midscene project (https://github.com/web-infra-dev/midscene) for inspiration and technical references.

License

MIT License
