Predict stock prices with AI - Simple, Research-Focused, Extensible
StocketAI helps you predict stock price movements using machine learning. It can forecast whether stocks will go up or down in 1, 3, or 6 months. The system is designed for researchers and analysts who want to experiment with different data sources and prediction models.
Key Benefits:
- Research-First: Built for experimentation and scientific validation
- Extensible: Easy to add new data sources or prediction models
- Multi-Source: Works with multiple Vietnamese financial data providers
- Reproducible: Consistent results across different runs
As a solution architect without finance expertise or Python development background, I want to build an AI model for each company from the VN30 list to predict stock prices in 1, 3, and 6-month horizons with low risk using all available data. This project leverages vnstock for comprehensive Vietnamese market data acquisition and qlib for quantitative finance modeling to create a research-focused prediction system that balances technical sophistication with practical usability.
- Forecast price changes for 1, 3, or 6 months ahead
- Get confidence scores for each prediction
- Generate buy/hold/sell signals
- Test predictions against historical data
- Measure accuracy with standard finance metrics
- Simulate portfolio performance with trading costs
- Try different machine learning models
- Compare prediction strategies
- Add new data sources or features
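As a rough illustration of how a forecast and its confidence score might be turned into a buy/hold/sell signal, here is a minimal sketch. The `Prediction` class, the `to_signal` and `net_return` helpers, and all threshold values are hypothetical and are not taken from the StocketAI code base; `FPT` is used only as an example ticker.

```python
from dataclasses import dataclass

# Hypothetical thresholds chosen for illustration only.
BUY_THRESHOLD = 0.05    # predicted return of +5% or more suggests "buy"
SELL_THRESHOLD = -0.05  # predicted return of -5% or less suggests "sell"
MIN_CONFIDENCE = 0.6    # below this confidence, always fall back to "hold"


@dataclass(frozen=True)
class Prediction:
    """One forecast for one symbol (hypothetical structure)."""

    symbol: str
    horizon_months: int      # 1, 3, or 6
    predicted_return: float  # 0.08 means +8% over the horizon
    confidence: float        # between 0.0 and 1.0


def to_signal(pred: Prediction) -> str:
    """Map a prediction to "buy", "hold", or "sell"."""
    if pred.horizon_months not in (1, 3, 6):
        raise ValueError("horizon_months must be 1, 3, or 6")
    if pred.confidence < MIN_CONFIDENCE:
        return "hold"
    if pred.predicted_return >= BUY_THRESHOLD:
        return "buy"
    if pred.predicted_return <= SELL_THRESHOLD:
        return "sell"
    return "hold"


def net_return(gross_return: float, cost_per_trade: float) -> float:
    """Return after paying a proportional cost on entry and again on exit."""
    return (1 - cost_per_trade) * (1 + gross_return) * (1 - cost_per_trade) - 1


if __name__ == "__main__":
    pred = Prediction("FPT", horizon_months=3, predicted_return=0.08, confidence=0.7)
    print(pred.symbol, to_signal(pred), round(net_return(0.08, 0.002), 4))
```

Keeping the thresholds as named constants makes it easy to compare strategies by changing one value at a time.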
Core Components:
- Python 3.12+ - Modern, reliable programming language
- vnstock - Vietnamese market data (prices, financials, news)
- qlib - Advanced financial modeling toolkit
Machine Learning:
- PyTorch/TensorFlow - Deep learning frameworks
- LightGBM/XGBoost - Fast, accurate tree-based models
- scikit-learn - Traditional ML algorithms
Data & Visualization:
- pandas/numpy - Data manipulation
- matplotlib/plotly - Charts and interactive dashboards
StocketAI/
├── data/
│   ├── symbols/              # Individual stock symbol organization
│   │   └── {symbol}/         # Each symbol as independent data unit
│   │       ├── raw/          # Raw data from vnstock APIs
│   │       ├── processed/    # Cleaned and validated data
│   │       ├── qlib_format/  # Qlib .bin format data
│   │       ├── progress/     # Processing progress and status
│   │       ├── reports/      # Analysis reports and metrics
│   │       └── errors/       # Error logs and debugging info
│   └── reports/              # Summary and results
├── src/
│   ├── data_acquisition/     # vnstock integration modules
│   ├── data_processing/      # Data cleaning and validation
│   ├── feature_engineering/  # Feature generation and qlib integration
│   ├── model_training/       # Model training and optimization
│   ├── prediction/           # Inference and signal generation
│   ├── evaluation/           # Backtesting and performance analysis
│   └── reporting/            # Report generation and visualization
├── notebooks/                # Jupyter notebooks for research
├── tests/                    # Unit and integration tests
├── config/                   # Configuration files and parameters
└── docs/                     # Documentation and guides

- Windows 11 with developer tools enabled
- Conda (mandatory - venv/virtualenv not permitted)
- Git for Windows with proper line ending configuration
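The per-symbol layout shown in the project tree (`data/symbols/{symbol}/` with `raw`, `processed`, `qlib_format`, `progress`, `reports`, and `errors`) could be created with a small helper like the one below. This is only a sketch: the function names `symbol_dir` and `ensure_symbol_dirs` are hypothetical and may not match the modules under `src/`.

```python
from pathlib import Path

# Subdirectories listed under data/symbols/{symbol}/ in the project tree.
SYMBOL_SUBDIRS = ("raw", "processed", "qlib_format", "progress", "reports", "errors")


def symbol_dir(project_root: Path, symbol: str) -> Path:
    """Return the data directory for one symbol."""
    return project_root / "data" / "symbols" / symbol


def ensure_symbol_dirs(project_root: Path, symbol: str) -> list[Path]:
    """Create every per-symbol subdirectory if missing and return the paths."""
    base = symbol_dir(project_root, symbol)
    paths = []
    for name in SYMBOL_SUBDIRS:
        path = base / name
        # parents=True also creates data/ and symbols/ on first use.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # "FPT" is only an example ticker.
    for created in ensure_symbol_dirs(Path.cwd(), "FPT"):
        print(created)
```

Because `mkdir` is called with `exist_ok=True`, running the helper repeatedly is harmless, which suits notebook workflows that are re-executed often.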
# Create conda environment
conda create -n StocketAI python=3.12 -y
conda activate StocketAI

# Install core packages
conda install pip pandas numpy scipy matplotlib seaborn plotly -y
conda install scikit-learn lightgbm xgboost -y
conda install pytorch torchvision torchaudio cpuonly -c pytorch -y
conda install tensorflow -c conda-forge -y

# Install development tools
conda install jupyter jupyterlab pytest flake8 black mypy -y
pip install pre-commit

# Install vnstock from source
git clone https://github.com/thinh-vu/vnstock.git
cd vnstock
pip install -e .
cd ..

# Install qlib from source
git clone https://github.com/microsoft/qlib.git
cd qlib
pip install -e .
cd ..

Clone the repository
git clone <repository-url>
cd StocketAI
Set up the environment
conda activate StocketAI
pip install -r requirements.txt
Configure environment variables
$env:PYTHONPATH="$PWD/src;$PWD"
$env:QLIB_DATA="$PWD/data/qlib_format"
Run initial data acquisition
jupyter notebook notebooks/vn30/01_load_vn30_constituents.ipynb
The project provides Jupyter notebooks for different use cases:
- notebooks/vn30/ - VN30-specific workflows
- notebooks/common/ - Provider-agnostic operations
- notebooks/[provider_name]/ - Other provider-specific notebooks
Each notebook contains complete, production-ready workflows for data acquisition, processing, model training, and evaluation.
- PEP 8 compliance with 88-character line limit
- Type hints for all functions and methods
- Google-style docstrings for public APIs
- Grouped imports with proper ordering
- Unit tests for individual functions with edge cases
- Integration tests for component interactions
- 90%+ code coverage requirement
- Focus on business logic, not external API testing
- Code quality: passes flake8, black, mypy
- Functionality: meets all specified requirements
- Testing: comprehensive test suite
- Documentation: complete and accurate
- Follow the established coding standards and architecture principles
- Create comprehensive unit tests for new functionality
- Update documentation for any API changes
- Ensure all quality gates pass before submitting
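To make the coding and testing standards above concrete, here is a small sketch of a function written with type hints, a Google-style docstring, and lines under 88 characters, followed by pytest-style unit tests that include an edge case. The `directional_accuracy` function is a hypothetical example and is not claimed to exist in the project.

```python
from collections.abc import Sequence


def directional_accuracy(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Compute the share of predictions whose direction matches the outcome.

    A return of exactly zero is treated as "not up" on both sides.

    Args:
        predicted: Predicted returns, one per observation.
        actual: Realized returns, aligned with ``predicted``.

    Returns:
        Fraction between 0.0 and 1.0 of observations where the predicted and
        realized returns point in the same direction.

    Raises:
        ValueError: If the inputs are empty or differ in length.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one observation is required")
    hits = sum(1 for p, a in zip(predicted, actual) if (p > 0) == (a > 0))
    return hits / len(predicted)


def test_directional_accuracy_counts_matching_signs() -> None:
    assert directional_accuracy([0.1, -0.2, 0.3], [0.05, 0.1, 0.2]) == 2 / 3


def test_directional_accuracy_rejects_empty_input() -> None:
    # With pytest imported, `pytest.raises(ValueError)` is the usual form.
    try:
        directional_accuracy([], [])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

The tests exercise only the function's own logic, in line with the guideline to focus on business logic rather than external APIs.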
GPLv3