This is a pure-python Tabix index parser. Useful as an alternative to PySAM and PyTabix for rapid read access by position to Tabix indexed block gzipped files such as VCFs and other common bioinfomatics formats.
See https://samtools.github.io/hts-specs/tabix.pdf and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3042176 for information about Tabix and the detailed file format specification.
frompuretabiximportTabixIndexedFiletabix_indexed_file=TabixIndexedFile.from_files(open('somefile.vcf.gz', 'rb'), open('somefile.vcf.gz.tbi', 'rb')) tabix_indexed_file.fetch("1", 1000, 5000)Documentation is supported via Python built-in module PyDoc: python3 -m pydoc -b puretabix
Included in this package is tooling for reading and writing VCF lines.
To read a file:
frompuretabix.vcfimportread_vcf_lineswithopen("source.vcf") asinput: forvcflineinread_vcf_lines(input): ifvcfline.is_comment: # its a comment or meta-informationpasselse: # access the parsed informationif"PASS"notinvcfline._filter: print(f"{vcfline.chrom}{vcfline.pos}{vcfline.get_genotype()}")To write some lines:
frompuretabix.vcfimportVCFLinewithopen("output.vcf") asoutput: output.write(str(VCFLine.as_comment_key_dict("fileformat", "VCFv4.2"))) output.write("\n") output.write( str( VCFLine.as_comment_raw( "\t".join( ( "CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "SAMPLE", ) ) ) ) ) output.write("\n") output.write( str( VCFLine.as_data( "chr1", 123, ("rs123",), "A", ("C",), ".", ("PASS",),{}, ({"GT": "1/0"},), ) ) ) output.write("\n")If there is a tabix index for a block gzipped VCF file, that index can be used for fast random access
importpuretabixwithopen("input.vcf.gz", "rb") asvcf: withopen("input.vcf.gz.tbi", "rb") asvcf_tbi: indexed=puretabix.TabixIndexedVCFFile.from_files(vcf, vcf_tbi) vcfline=tuple(indexed.fetch_vcf_lines("chr1", 1108138)) assertvcfline.chrom=="chr1"assertvcfline.pos==1108138print(f"gt = {vcfline.get_genotype()}")TL;DR: pip install -e '.[dev]' && pre-commit install
pip install -e '.[dev]'# Install using pip including development extras pre-commit install # Enable pre-commit hooks pre-commit run --all-files # Run pre-commit hooks without committing# Note pre-commit is configured to use:# - seed-isort-config to better categorise third party imports# - isort to sort imports# - black to format code pip-compile # Freeze dependencies pytest # Run tests coverage run --source=puretabix -m pytest && coverage report -m # Run tests, print coverage mypy .# Type checking pipdeptree # Print dependencies scalene --outfile tests/perf_test.txt --profile-all --cpu-sampling-rate 0.0001 tests/perf_test.py # performance measurementsGlobal git ignores per https://help.github.com/en/github/using-git/ignoring-files#configuring-ignored-files-for-all-repositories-on-your-computer
For release to PyPI see https://packaging.python.org/tutorials/packaging-projects/
For information about packaging wheels see https://realpython.com/python-wheels/
git checkout master git pull git add setup.py CHANGES.txt git commit -m"prepare for x.x.x" git push git tag x.x.x git push origin x.x.x python3 setup.py sdist bdist_wheel && python3 -m twine upload dist/*Inspired by @yangmqglobe code in cggh/scikit-allel#297