malusamayo/underspec-analysis

What Prompts Don’t Say: Understanding and Managing Underspecification in LLM Prompts

This repository reproduces the analysis of how underspecification in prompts affects LLM behavior.

Refer to the paper for the full experiment setup.

Data available

We share all experiment configurations in data/configs, all prompts in data/prompts, all curated requirements in data/requirements, and the evaluation results here.

Steps to reproduce the analysis

Download the evaluation data from here. Create three directories, data/results/commitpack, data/results/trip, and data/results/product, and uncompress the evaluation results for each dataset into the matching directory.
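The directory setup above can be sketched in Python. This is a minimal sketch, not part of the repository: the helper name `prepare_results_dirs` and the `archives` argument are hypothetical, and the archive file names are left to the caller because the README does not name the downloaded files.

```python
from pathlib import Path
import shutil

# The three result directories named in the reproduction steps.
DATASETS = ("commitpack", "trip", "product")


def prepare_results_dirs(root=".", archives=None):
    """Create data/results/<dataset> and optionally unpack archives into them.

    `archives` is a hypothetical mapping from dataset name to a local archive
    path; it is an assumption, since the download's file names are not given.
    """
    created = []
    for name in DATASETS:
        target = Path(root) / "data" / "results" / name
        target.mkdir(parents=True, exist_ok=True)
        if archives and name in archives:
            # shutil.unpack_archive infers the format from the file extension.
            shutil.unpack_archive(str(archives[name]), str(target))
        created.append(target)
    return created
```

Calling `prepare_results_dirs()` from the repository root only creates the directories; pass `archives` once you know where the downloaded files live.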

Run steps in analysis-reproduction.ipynb.

Steps to reproduce the full experiments

First, add your OpenAI key for running the OpenAI models, and Bedrock key for running the Llama3 models.
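Before launching a long run, it can help to check that credentials are visible to the process. The sketch below is an assumption-heavy illustration: the README does not say how the keys are supplied, so the environment variable names (`OPENAI_API_KEY`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) are guesses based on common SDK conventions, and `missing_credentials` is a hypothetical helper. Adjust the mapping to whatever the repository actually reads.

```python
import os

# Assumed variable names; the repository may expect different ones.
REQUIRED_VARS = {
    "OpenAI models": ["OPENAI_API_KEY"],
    "Llama3 models via Bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
}


def missing_credentials(env=None, required=REQUIRED_VARS):
    """Return {label: [missing variable names]} for unset or empty variables."""
    env = os.environ if env is None else env
    report = {}
    for label, names in required.items():
        missing = [name for name in names if not env.get(name)]
        if missing:
            report[label] = missing
    return report
```

An empty result means every assumed variable is set; it does not verify that the keys themselves are valid.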

Experiment 3.2 / 3.3

uv run python3 run.py --config=data/configs/commitpack_main.yaml
uv run python3 run.py --config=data/configs/trip_main.yaml
uv run python3 run.py --config=data/configs/product_main.yaml
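The run.py invocations in this README follow one pattern: one command per dataset, with the config named `<dataset>_<suffix>.yaml`. A minimal sketch of a wrapper that builds those commands is shown here; `build_commands` and `run_all` are hypothetical helpers, not part of the repository, and the naming pattern is inferred only from the commands listed in this README.

```python
import subprocess

DATASETS = ("commitpack", "trip", "product")


def build_commands(suffix, datasets=DATASETS):
    """Build one `uv run python3 run.py` argv list per dataset.

    `suffix` is the config suffix seen in this README, e.g. "main", "fix",
    or "prioritize".
    """
    return [
        ["uv", "run", "python3", "run.py",
         f"--config=data/configs/{name}_{suffix}.yaml"]
        for name in datasets
    ]


def run_all(suffix, dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    commands = build_commands(suffix)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing experiment.
            subprocess.run(cmd, check=True)
    return commands
```

For example, `run_all("main")` prints the three commands for this experiment without executing them; pass `dry_run=False` to actually run them. The prompt optimization commands use `-m analysis.optimize` instead of run.py and are not covered by this sketch.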

Experiment 3.5

uv run python3 run.py --config=data/configs/commitpack_fix.yaml
uv run python3 run.py --config=data/configs/trip_fix.yaml
uv run python3 run.py --config=data/configs/product_fix.yaml

Experiment 4.1 / 4.2

To rerun prompt optimization, use

uv run python3 -m analysis.optimize --config=data/configs/commitpack_optimizer_gen.yaml
uv run python3 -m analysis.optimize --config=data/configs/trip_optimizer_gen.yaml
uv run python3 -m analysis.optimize --config=data/configs/product_optimizer_gen.yaml

To reuse the optimized prompts, use

uv run python3 run.py --config=data/configs/commitpack_prioritize.yaml
uv run python3 run.py --config=data/configs/trip_prioritize.yaml
uv run python3 run.py --config=data/configs/product_prioritize.yaml

Other utilities provided in this repository

To generate new requirements, use

uv run python3 -m analysis.elicitation 

To generate new evaluators, use

uv run python3 -m analysis.judge 

To generate new prompts, use

uv run python3 -m analysis.prompt_gen 
