Pure Tabix

This is a pure-Python Tabix index parser. It is useful as an alternative to PySAM and PyTabix for rapid read access by position to Tabix-indexed, block-gzipped files such as VCFs and other common bioinformatics formats.

See https://samtools.github.io/hts-specs/tabix.pdf and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3042176 for information about Tabix and the detailed file format specification.

    from puretabix import TabixIndexedFile

    tabix_indexed_file = TabixIndexedFile.from_files(
        open('somefile.vcf.gz', 'rb'), open('somefile.vcf.gz.tbi', 'rb')
    )
    tabix_indexed_file.fetch("1", 1000, 5000)
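As background on what a positional lookup involves, the Tabix specification linked above assigns each region to a bin with a small function called `reg2bin`. The following is a minimal Python sketch of that function as given in the specification, using zero-based, half-open coordinates. It is an illustration only; the function name comes from the specification, not from this package, and how puretabix implements this internally may differ.

```python
def reg2bin(beg: int, end: int) -> int:
    """Return the bin number for the zero-based, half-open interval [beg, end).

    Follows the binning scheme described in the Tabix specification:
    six levels of bins, the smallest covering 16 kbp (2**14 bases).
    """
    end -= 1  # make the end coordinate inclusive
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)
    return 0  # the single top-level bin covers everything


# A region that fits inside one 16 kbp window lands in a leaf bin.
small_region_bin = reg2bin(1000, 5000)
# A region that crosses a 16 kbp boundary moves up one level.
crossing_region_bin = reg2bin(0, (1 << 14) + 1)
```

Records are stored under the smallest bin that fully contains them, so a query only needs to inspect the bins that can overlap the requested range.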

Documentation is available through Python's built-in pydoc module:

    python3 -m pydoc -b puretabix

VCF

Included in this package is tooling for reading and writing VCF lines.

To read a file:

    from puretabix.vcf import read_vcf_lines

    with open("source.vcf") as source:
        for vcfline in read_vcf_lines(source):
            if vcfline.is_comment:
                # it's a comment or meta-information line
                pass
            else:
                # access the parsed information
                if "PASS" not in vcfline._filter:
                    print(f"{vcfline.chrom} {vcfline.pos} {vcfline.get_genotype()}")
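To make it clearer what reading a VCF line involves, here is a minimal standard-library sketch that splits one data line into its eight fixed columns. The names `SimpleVCFRecord` and `parse_vcf_data_line` are hypothetical and exist only for this illustration; this is not how puretabix parses lines, and it ignores the INFO, FORMAT and sample details that the package handles.

```python
from typing import NamedTuple, Tuple


class SimpleVCFRecord(NamedTuple):
    """Hypothetical container for the eight fixed VCF columns."""

    chrom: str
    pos: int
    ids: Tuple[str, ...]
    ref: str
    alts: Tuple[str, ...]
    qual: str
    filters: Tuple[str, ...]
    info: str


def parse_vcf_data_line(line: str) -> SimpleVCFRecord:
    """Split a tab-separated VCF data line into its fixed columns."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(fields)}")
    chrom, pos, ids, ref, alts, qual, filters, info = fields[:8]
    return SimpleVCFRecord(
        chrom=chrom,
        pos=int(pos),
        # "." marks a missing value in VCF
        ids=() if ids == "." else tuple(ids.split(";")),
        ref=ref,
        alts=() if alts == "." else tuple(alts.split(",")),
        qual=qual,
        filters=() if filters == "." else tuple(filters.split(";")),
        info=info,
    )


record = parse_vcf_data_line("chr1\t123\trs123\tA\tC\t.\tPASS\t.\tGT\t1/0\n")
print(record.chrom, record.pos, record.alts)
```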

To write some lines:

    from puretabix.vcf import VCFLine

    # open in write mode ("w"); the default read mode cannot be written to
    with open("output.vcf", "w") as output:
        output.write(str(VCFLine.as_comment_key_dict("fileformat", "VCFv4.2")))
        output.write("\n")
        output.write(
            str(
                VCFLine.as_comment_raw(
                    "\t".join(
                        (
                            "CHROM",
                            "POS",
                            "ID",
                            "REF",
                            "ALT",
                            "QUAL",
                            "FILTER",
                            "INFO",
                            "FORMAT",
                            "SAMPLE",
                        )
                    )
                )
            )
        )
        output.write("\n")
        output.write(
            str(
                VCFLine.as_data(
                    "chr1",
                    123,
                    ("rs123",),
                    "A",
                    ("C",),
                    ".",
                    ("PASS",),
                    {},
                    ({"GT": "1/0"},),
                )
            )
        )
        output.write("\n")

VCF with index

If there is a Tabix index for a block-gzipped VCF file, that index can be used for fast random access.

    import puretabix

    with open("input.vcf.gz", "rb") as vcf:
        with open("input.vcf.gz.tbi", "rb") as vcf_tbi:
            indexed = puretabix.TabixIndexedVCFFile.from_files(vcf, vcf_tbi)
            vcflines = tuple(indexed.fetch_vcf_lines("chr1", 1108138))
            # take the first matching line; this assumes the file has
            # at least one record at this position
            vcfline = vcflines[0]
            assert vcfline.chrom == "chr1"
            assert vcfline.pos == 1108138
            print(f"gt = {vcfline.get_genotype()}")
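Random access works because the index stores BGZF virtual file offsets. As described in the SAM/BGZF specification, a virtual offset packs the byte offset of a compressed block into the upper 48 bits and the offset within the uncompressed block into the lower 16 bits. The sketch below illustrates that arithmetic; the function names are hypothetical and are not part of the puretabix API.

```python
from typing import Tuple


def make_virtual_offset(compressed_offset: int, uncompressed_offset: int) -> int:
    """Pack a block start and an in-block position into one 64-bit value."""
    if not 0 <= uncompressed_offset < (1 << 16):
        raise ValueError("in-block offset must fit in 16 bits")
    if not 0 <= compressed_offset < (1 << 48):
        raise ValueError("block offset must fit in 48 bits")
    return (compressed_offset << 16) | uncompressed_offset


def split_virtual_offset(virtual_offset: int) -> Tuple[int, int]:
    """Return (offset of the block in the file, offset inside the decompressed block)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


virtual = make_virtual_offset(1000, 42)
block_start, within_block = split_virtual_offset(virtual)
```

A reader seeks to `block_start` in the compressed file, decompresses that one block, and then skips `within_block` bytes to reach the first relevant record.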

development

TL;DR: pip install -e '.[dev]' && pre-commit install

    pip install -e '.[dev]'  # Install using pip including development extras
    pre-commit install  # Enable pre-commit hooks
    pre-commit run --all-files  # Run pre-commit hooks without committing
    # Note pre-commit is configured to use:
    # - seed-isort-config to better categorise third party imports
    # - isort to sort imports
    # - black to format code
    pip-compile  # Freeze dependencies
    pytest  # Run tests
    coverage run --source=puretabix -m pytest && coverage report -m  # Run tests, print coverage
    mypy .  # Type checking
    pipdeptree  # Print dependencies
    scalene --outfile tests/perf_test.txt --profile-all --cpu-sampling-rate 0.0001 tests/perf_test.py  # Performance measurements

Set up global git ignores as described at https://help.github.com/en/github/using-git/ignoring-files#configuring-ignored-files-for-all-repositories-on-your-computer

For release to PyPI see https://packaging.python.org/tutorials/packaging-projects/

For information about packaging wheels see https://realpython.com/python-wheels/

    git checkout master
    git pull
    git add setup.py CHANGES.txt
    git commit -m "prepare for x.x.x"
    git push
    git tag x.x.x
    git push origin x.x.x
    python3 setup.py sdist bdist_wheel && python3 -m twine upload dist/*

acknowledgements

Inspired by @yangmqglobe's code in cggh/scikit-allel#297
