📢: For convenience, we have built BiLLM, a bi-directional LLM toolkit for language understanding. You are welcome to use it.
Our implementation currently supports the following sequence classification benchmarks:
- SST2 (2 classes) / SST5 (5 classes)
- AGNews (4 classes)
- Twitter Financial News Sentiment (twitterfin, 3 classes)
It also supports the token classification benchmarks CoNLL2003 and OntonotesV5 for named entity recognition (NER).
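These benchmarks can be inspected with the Hugging Face `datasets` library. Below is a minimal loading sketch; the Hub dataset IDs (`ag_news`, `conll2003`) are assumptions for illustration, and the repo's training scripts handle loading and preprocessing themselves:

```python
# Minimal sketch for inspecting two of the benchmarks with Hugging Face `datasets`.
# Assumption: Hub dataset IDs "ag_news" and "conll2003"; the training scripts in
# this repo perform their own loading and preprocessing.
from datasets import load_dataset

agnews = load_dataset("ag_news")        # 4-class topic classification
conll03 = load_dataset("conll2003")     # NER token classification

print(agnews["train"][0])               # {'text': ..., 'label': ...}
print(conll03["train"][0]["ner_tags"])  # per-token label ids
```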
Commands for training LS-LLaMA and LS-unLLaMA on different tasks follow the template below:
```console
foo@bar:~$ CUDA_VISIBLE_DEVICES=0 python file_name.py dataset_name model_size
```

- `file_name.py` can be one of `unllama_seq_clf.py`, `unllama_token_clf.py`, `llama_seq_clf.py`, and `llama_token_clf.py`, for training LS-LLaMA and LS-unLLaMA on sequence- and token-level classification.
- `dataset_name` can be one of `sst2`, `sst5`, `agnews`, `twitterfin`, `conll03`, and `ontonotesv5`.
- `model_size` can be `7b` or `13b`, corresponding to LLaMA-2-7B and LLaMA-2-13B.
For example, the following command will train LS-unLLaMA based on LLaMA-2-7B on AGNews for sequence classification:
```console
foo@bar:~$ CUDA_VISIBLE_DEVICES=0 python unllama_seq_clf.py agnews 7b
```

## Load Pretrained Models
```python
from transformers import AutoTokenizer
from modeling_llama import (
    LlamaForSequenceClassification,
    LlamaForTokenClassification,
    UnmaskingLlamaForSequenceClassification,
    UnmaskingLlamaForTokenClassification,
)

model_id = 'meta-llama/Llama-2-7b'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LlamaForSequenceClassification.from_pretrained(model_id).bfloat16()
model = LlamaForTokenClassification.from_pretrained(model_id).bfloat16()
model = UnmaskingLlamaForSequenceClassification.from_pretrained(model_id).bfloat16()
model = UnmaskingLlamaForTokenClassification.from_pretrained(model_id).bfloat16()
```

For more usage, please refer to `unllama_seq_clf.py`, `unllama_token_clf.py`, `llama_seq_clf.py`, and `llama_token_clf.py`.
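As a rough illustration of how these classes plug into a standard fine-tuning loop, the sketch below fine-tunes `UnmaskingLlamaForSequenceClassification` on AGNews with the Hugging Face `Trainer`. The `num_labels` argument, padding setup, and hyperparameters are illustrative assumptions; the repo's own scripts (e.g. `unllama_seq_clf.py`) define the actual training configuration:

```python
# Hedged sketch: assumes UnmaskingLlamaForSequenceClassification accepts the
# standard `num_labels` argument and that AGNews is loaded as "ag_news" from
# the Hub. See unllama_seq_clf.py for the repo's actual setup.
from datasets import load_dataset
from transformers import AutoTokenizer, Trainer, TrainingArguments
from modeling_llama import UnmaskingLlamaForSequenceClassification

model_id = 'meta-llama/Llama-2-7b'
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers define no pad token

model = UnmaskingLlamaForSequenceClassification.from_pretrained(
    model_id, num_labels=4  # AGNews has 4 classes
).bfloat16()
model.config.pad_token_id = tokenizer.pad_token_id

dataset = load_dataset("ag_news")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=8,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```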
## Citation

```bibtex
@article{li2023label,
  title={Label supervised llama finetuning},
  author={Li, Zongxi and Li, Xianming and Liu, Yuzhang and Xie, Haoran and Li, Jing and Wang, Fu-lee and Li, Qing and Zhong, Xiaoqin},
  journal={arXiv preprint arXiv:2310.01208},
  year={2023}
}
```