AI-Powered Repository Documentation Generation • Multi-Language Support • Architecture-Aware Analysis
Generate holistic, structured documentation for large-scale codebases • Cross-module interactions • Visual artifacts and diagrams
Quick Start • CLI Commands • Output Structure • Paper

    # Install from source
    pip install git+https://github.com/FSoft-AI4Code/CodeWiki.git

    # Verify installation
    codewiki --version

CodeWiki supports multiple models via an OpenAI-compatible SDK layer.

    codewiki config set \
      --api-key YOUR_API_KEY \
      --base-url https://api.anthropic.com \
      --main-model claude-sonnet-4 \
      --cluster-model claude-sonnet-4

    # Navigate to your project
    cd /path/to/your/project

    # Generate documentation
    codewiki generate

    # Generate with HTML viewer for GitHub Pages
    codewiki generate --github-pages --create-branch

That's it! Your documentation will be generated in `./docs/` with comprehensive repository-level analysis.

CodeWiki is an open-source framework for automated repository-level documentation across seven programming languages. It generates holistic, architecture-aware documentation that captures not only individual functions but also their cross-file, cross-module, and system-level interactions.
| Innovation | Description | Impact |
|---|---|---|
| Hierarchical Decomposition | Dynamic programming-inspired strategy that preserves architectural context | Handles codebases of arbitrary size (86K-1.4M LOC tested) |
| Recursive Agentic System | Adaptive multi-agent processing with dynamic delegation capabilities | Maintains quality while scaling to repository-level scope |
| Multi-Modal Synthesis | Generates textual documentation, architecture diagrams, data flows, and sequence diagrams | Comprehensive understanding from multiple perspectives |
Supported languages: Python • Java • JavaScript • TypeScript • C • C++ • C#

    # Set up your API configuration
    codewiki config set \
      --api-key <your-api-key> \
      --base-url <provider-url> \
      --main-model <model-name> \
      --cluster-model <model-name>

    # Show current configuration
    codewiki config show

    # Validate your configuration
    codewiki config validate

    # Basic generation
    codewiki generate

    # Custom output directory
    codewiki generate --output ./documentation

    # Create git branch for documentation
    codewiki generate --create-branch

    # Generate HTML viewer for GitHub Pages
    codewiki generate --github-pages

    # Enable verbose logging
    codewiki generate --verbose

    # Full-featured generation
    codewiki generate --create-branch --github-pages --verbose

- API keys: Securely stored in system keychain (macOS Keychain, Windows Credential Manager, Linux Secret Service)
- Settings: `~/.codewiki/config.json`
Generated documentation includes both textual descriptions and visual artifacts for comprehensive understanding.
- Repository overview with architecture guide
- Module-level documentation with API references
- Usage examples and implementation patterns
- Cross-module interaction analysis
- System architecture diagrams (Mermaid)
- Data flow visualizations
- Dependency graphs and module relationships
- Sequence diagrams for complex interactions

    ./docs/
    ├── overview.md               # Repository overview (start here!)
    ├── module1.md                # Module documentation
    ├── module2.md                # Additional modules...
    ├── module_tree.json          # Hierarchical module structure
    ├── first_module_tree.json    # Initial clustering result
    ├── metadata.json             # Generation metadata
    └── index.html                # Interactive viewer (with --github-pages)

CodeWiki has been evaluated on CodeWikiBench, the first benchmark specifically designed for repository-level documentation quality assessment.

| Language Category | CodeWiki (Sonnet-4) | DeepWiki | Improvement |
|---|---|---|---|
| High-Level (Python, JS, TS) | 79.14% | 68.67% | +10.47% |
| Managed (C#, Java) | 68.84% | 64.80% | +4.04% |
| Systems (C, C++) | 53.24% | 56.39% | -3.15% |
| Overall Average | 68.79% | 64.06% | +4.73% |
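
The Improvement column in the table above is an absolute difference in percentage points (CodeWiki score minus DeepWiki score), not a relative change. The short check below recomputes it from the table values; the variable and function names are illustrative only.

```python
# Scores copied from the category table above: (CodeWiki Sonnet-4, DeepWiki), in percent.
scores = {
    "High-Level (Python, JS, TS)": (79.14, 68.67),
    "Managed (C#, Java)": (68.84, 64.80),
    "Systems (C, C++)": (53.24, 56.39),
    "Overall Average": (68.79, 64.06),
}


def improvement_points(codewiki: float, deepwiki: float) -> float:
    """Absolute difference in percentage points, rounded to two decimals."""
    return round(codewiki - deepwiki, 2)


improvements = {
    name: improvement_points(cw, dw) for name, (cw, dw) in scores.items()
}

for name, delta in improvements.items():
    print(f"{name}: {delta:+.2f} pp")
```
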
| Repository | Language | LOC | CodeWiki-Sonnet-4 | DeepWiki | Improvement |
|---|---|---|---|---|---|
| All-Hands-AI--OpenHands | Python | 229K | 82.45% | 73.04% | +9.41% |
| puppeteer--puppeteer | TypeScript | 136K | 83.00% | 64.46% | +18.54% |
| sveltejs--svelte | JavaScript | 125K | 71.96% | 68.51% | +3.45% |
| Unity-Technologies--ml-agents | C# | 86K | 79.78% | 74.80% | +4.98% |
| elastic--logstash | Java | 117K | 57.90% | 54.80% | +3.10% |
Full results: see the paper for the complete evaluation on 21 repositories spanning all supported languages.
CodeWiki employs a three-stage process for comprehensive documentation generation:
Hierarchical Decomposition: Uses dynamic programming-inspired algorithms to partition repositories into coherent modules while preserving architectural context across multiple granularity levels.
Recursive Multi-Agent Processing: Implements adaptive multi-agent processing with dynamic task delegation, allowing the system to handle complex modules at scale while maintaining quality.
Multi-Modal Synthesis: Integrates textual descriptions with visual artifacts including architecture diagrams, data-flow representations, and sequence diagrams for comprehensive understanding.
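
To make the idea of hierarchical decomposition with recursive delegation more concrete, here is a toy sketch. It is not CodeWiki's actual algorithm: the `Module` class, the `plan` function, and the lines-of-code budget are all assumptions chosen for illustration. A module small enough to fit the budget is documented as one unit; a larger one delegates to its children first and is summarized afterwards, so parent-level documentation can build on the child results.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical module-tree node; `loc` counts lines owned directly by it."""
    name: str
    loc: int = 0
    children: list["Module"] = field(default_factory=list)

    def total_loc(self) -> int:
        return self.loc + sum(child.total_loc() for child in self.children)


def plan(module: Module, budget: int) -> list[tuple[str, str]]:
    """Return (action, module name) steps in the order they would run.

    "leaf" means the module is documented in a single pass;
    "summary" means it is written up after all of its children.
    """
    if module.total_loc() <= budget or not module.children:
        return [("leaf", module.name)]
    steps: list[tuple[str, str]] = []
    for child in module.children:
        steps += plan(child, budget)  # delegate oversized work downward
    steps.append(("summary", module.name))
    return steps


repo = Module("repo", children=[
    Module("core", loc=300, children=[
        Module("core.parser", loc=900),
        Module("core.graph", loc=700),
    ]),
    Module("cli", loc=400),
])

steps = plan(repo, budget=1000)
for action, name in steps:
    print(action, name)
```

With a budget of 1000 lines, `core` (1900 lines in total) is split into its two children, while `cli` is handled in one pass; the repository-level summary comes last.
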

    ┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
    │   Codebase      │───▶│  Hierarchical    │───▶│  Multi-Agent    │
    │   Analysis      │    │  Decomposition   │    │  Processing     │
    └─────────────────┘    └──────────────────┘    └─────────────────┘
                                    │                       │
                                    ▼                       ▼
    ┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
    │  Visual         │◀───│  Multi-Modal     │◀───│  Structured     │
    │  Artifacts      │    │  Synthesis       │    │  Content        │
    └─────────────────┘    └──────────────────┘    └─────────────────┘

- Python 3.12+
- Node.js (for Mermaid diagram validation)
- LLM API access (Anthropic Claude, OpenAI, etc.)
- Git (for branch creation features)
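
The local prerequisites in the list above can be checked before installing. The snippet below is a minimal sketch using only the Python standard library; `check_prerequisites` is a hypothetical helper, not something CodeWiki ships. It assumes the Node.js and Git executables are named `node` and `git` on the `PATH`, and it cannot verify LLM API access.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)  # minimum version from the requirements list


def check_prerequisites() -> dict[str, bool]:
    """Report which local prerequisites appear to be satisfied."""
    return {
        "python>=3.12": sys.version_info >= MIN_PYTHON,
        "node": shutil.which("node") is not None,  # Mermaid diagram validation
        "git": shutil.which("git") is not None,    # branch creation features
    }


results = check_prerequisites()
for name, ok in results.items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```
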
- Docker Deployment - Containerized deployment instructions
- Development Guide - Project structure, architecture, and contributing guidelines
- CodeWikiBench - Repository-level documentation benchmark
- Live Demo - Interactive demo and examples
- Paper - Full research paper with detailed methodology and results
- Citation - How to cite CodeWiki in your research
If you use CodeWiki in your research, please cite:

    @misc{hoang2025codewikievaluatingaisability,
      title={CodeWiki: Evaluating AI's Ability to Generate Holistic Documentation for Large-Scale Codebases},
      author={Anh Nguyen Hoang and Minh Le-Anh and Bach Le and Nghi D. Q. Bui},
      year={2025},
      eprint={2510.24428},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2510.24428},
    }

This project is licensed under the MIT License.

