Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files which can also be optionally compressed by gzip.
Convert FASTQ to FASTA:
seqtk seq -a in.fq.gz > out.faConvert ILLUMINA 1.3+ FASTQ to FASTA and mask bases with quality lower than 20 to lowercases (the 1st command line) or to
N(the 2nd):seqtk seq -aQ64 -q20 in.fq > out.fa seqtk seq -aQ64 -q20 -n N in.fq > out.faFold long FASTA/Q lines and remove FASTA/Q comments:
seqtk seq -Cl60 in.fa > out.faConvert multi-line FASTQ to 4-line FASTQ:
seqtk seq -l0 in.fq > out.fqReverse complement FASTA/Q:
seqtk seq -r in.fq > out.fqExtract sequences with names in file
name.lst, one sequence name per line:seqtk subseq in.fq name.lst > out.fqExtract sequences in regions contained in file
reg.bed:seqtk subseq in.fa reg.bed > out.faMask regions in
reg.bedto lowercases:seqtk seq -M reg.bed in.fa > out.faSubsample 10000 read pairs from two large paired FASTQ files (remember to use the same random seed to keep pairing):
seqtk sample -s100 read1.fq 10000 > sub1.fq seqtk sample -s100 read2.fq 10000 > sub2.fqTrim low-quality bases from both ends using the Phred algorithm:
seqtk trimfq in.fq > out.fqTrim 5bp from the left end of each read and 10bp from the right end:
seqtk trimfq -b 5 -e 10 in.fa > out.fa