-
-[](https://pypi.python.org/pypi/codext/)
-[](https://python-codext.readthedocs.io/en/latest/?badge=latest)
-[](https://github.com/dhondta/python-codext/actions/workflows/python-package.yml)
-[](#)
-[](https://pypi.python.org/pypi/codext/)
-[](https://snyk.io/test/github/dhondta/python-codext?targetFile=requirements.txt)
-[](https://zenodo.org/badge/latestdoi/236679865)
-[](https://pypi.python.org/pypi/codext/)
-
-[**CodExt**](https://github.com/dhondta/python-codext) is a (Python2-3 compatible) library that extends the native [`codecs`](https://docs.python.org/3/library/codecs.html) library (namely for adding new custom encodings and character mappings) and provides **120+ new codecs**, hence its name combining *CODecs EXTension*. It also features a **guess mode** for decoding multiple layers of encoding and **CLI tools** for convenience.
-
-```sh
-$ pip install codext
-```
-
-Want to contribute a new codec ? | Want to contribute a new macro ?
-:----------------------------------:|:------------------------------------:
-Check the [documentation](https://python-codext.readthedocs.io/en/latest/howto.html) first Then [PR](https://github.com/dhondta/python-codext/pulls) your new codec | [PR](https://github.com/dhondta/python-codext/pulls) your updated version of [`macros.json`](https://github.com/dhondta/python-codext/blob/main/codext/macros.json)
-
-## :mag: Demonstrations
-
-
-
-
-
-## :computer: Usage (main CLI tool)
-
-```session
-$ codext -i test.txt encode dna-1
-GTGAGCGGGTATGTGA
-
-$ echo -en "test" | codext encode morse
-- . ... -
-
-$ echo -en "test" | codext encode braille
-⠞⠑⠎⠞
-
-$ echo -en "test" | codext encode base100
-👫👜👪👫
-```
-
-### Chaining codecs
-
-```sh
-$ echo -en "Test string" | codext encode reverse
-gnirts tseT
-
-$ echo -en "Test string" | codext encode reverse morse
---. -. .. .-. - ... / - ... . -
-
-$ echo -en "Test string" | codext encode reverse morse dna-2
-AGTCAGTCAGTGAGAAAGTCAGTGAGAAAGTGAGTGAGAAAGTGAGTCAGTGAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTTAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTGAGAAAGTC
-
-$ echo -en "Test string" | codext encode reverse morse dna-2 octal
-101107124103101107124103101107124107101107101101101107124103101107124107101107101101101107124107101107124107101107101101101107124107101107124103101107124107101107101101101107124103101107101101101107124107101107124107101107124107101107101101101107124124101107101101101107124103101107101101101107124107101107124107101107124107101107101101101107124107101107101101101107124103
-
-$ echo -en "AGTCAGTCAGTGAGAAAGTCAGTGAGAAAGTGAGTGAGAAAGTGAGTCAGTGAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTTAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTGAGAAAGTC" | codext -d dna-2 morse reverse
-test string
-```
-
-### Using macros
-
-```sh
-$ codext add-macro my-encoding-chain gzip base63 lzma base64
-
-$ codext list macros
-example-macro, my-encoding-chain
-
-$ echo -en "Test string" | codext encode my-encoding-chain
-CQQFAF0AAIAAABuTgySPa7WaZC5Sunt6FS0ko71BdrYE8zHqg91qaqadZIR2LafUzpeYDBalvE///ug4AA==
-
-$ codext remove-macro my-encoding-chain
-
-$ codext list macros
-example-macro
-```
-
-## :computer: Usage (base CLI tool)
-
-```session
-$ echo "Test string !" | base122
-*.7!ft9�-f9Â
-
-$ echo "Test string !" | base91
-"ONK;WDZM%Z%xE7L
-
-$ echo "Test string !" | base91 | base85
-B2P|BJ6A+nO(j|-cttl%
-
-$ echo "Test string !" | base91 | base85 | base36 | base58-flickr
-QVx5tvgjvCAkXaMSuKoQmCnjeCV1YyyR3WErUUErFf
-
-$ echo "Test string !" | base91 | base85 | base36 | base58-flickr | base58-flickr -d | base36 -d | base85 -d | base91 -d
-Test string !
-```
-
-```session
-$ echo "Test string !" | base91 | base85 | base36 | base58-flickr | unbase -m 3
-Test string !
-
-$ echo "Test string !" | base91 | base85 | base36 | base58-flickr | unbase -f Test
-Test string !
-```
-
-## :computer: Usage (Python)
-
-Getting the list of available codecs:
-
-```python
->>> import codext
-
->>> codext.list()
-['ascii85', 'base85', 'base100', 'base122', ..., 'tomtom', 'dna', 'html', 'markdown', 'url', 'resistor', 'sms', 'whitespace', 'whitespace-after-before']
-
->>> codext.encode("this is a test", "base58-bitcoin")
-'jo91waLQA1NNeBmZKUF'
-
->>> codext.encode("this is a test", "base58-ripple")
-'jo9rA2LQwr44eBmZK7E'
-
->>> codext.encode("this is a test", "base58-url")
-'JN91Wzkpa1nnDbLyjtf'
-
->>> codecs.encode("this is a test", "base100")
-'👫👟👠👪🐗👠👪🐗👘🐗👫👜👪👫'
-
->>> codecs.decode("👫👟👠👪🐗👠👪🐗👘🐗👫👜👪👫", "base100")
-'this is a test'
-
->>> for i in range(8):
- print(codext.encode("this is a test", "dna-%d" % (i + 1)))
-GTGAGCCAGCCGGTATACAAGCCGGTATACAAGCAGACAAGTGAGCGGGTATGTGA
-CTCACGGACGGCCTATAGAACGGCCTATAGAACGACAGAACTCACGCCCTATCTCA
-ACAGATTGATTAACGCGTGGATTAACGCGTGGATGAGTGGACAGATAAACGCACAG
-AGACATTCATTAAGCGCTCCATTAAGCGCTCCATCACTCCAGACATAAAGCGAGAC
-TCTGTAAGTAATTCGCGAGGTAATTCGCGAGGTAGTGAGGTCTGTATTTCGCTCTG
-TGTCTAACTAATTGCGCACCTAATTGCGCACCTACTCACCTGTCTATTTGCGTGTC
-GAGTGCCTGCCGGATATCTTGCCGGATATCTTGCTGTCTTGAGTGCGGGATAGAGT
-CACTCGGTCGGCCATATGTTCGGCCATATGTTCGTCTGTTCACTCGCCCATACACT
->>> codext.decode("GTGAGCCAGCCGGTATACAAGCCGGTATACAAGCAGACAAGTGAGCGGGTATGTGA", "dna-1")
-'this is a test'
-
->>> codecs.encode("this is a test", "morse")
-'- .... .. ... / .. ... / .- / - . ... -'
-
->>> codecs.decode("- .... .. ... / .. ... / .- / - . ... -", "morse")
-'this is a test'
-
->>> with open("morse.txt", 'w', encoding="morse") as f:
- f.write("this is a test")
-14
-
->>> with open("morse.txt",encoding="morse") as f:
- f.read()
-'this is a test'
-
->>> codext.decode("""
- =
- X
- :
- x
- n
- r
- y
- Y
- y
- p
- a
- `
- n
- |
- a
-o
- h
- `
- g
- o
- z """, "whitespace-after+before")
-'CSC{not_so_invisible}'
-
->>> print(codext.encode("An example test string", "baudot-tape"))
-***.**
- . *
-***.*
-* .
- .*
-* .*
- . *
-** .*
-***.**
-** .**
- .*
-* .
-* *. *
- .*
-* *.
-* *. *
-* .
-* *.
-* *. *
-***.
- *.*
-***.*
- * .*
-```
-
-## :page_with_curl: List of codecs
-
-#### [BaseXX](https://python-codext.readthedocs.io/en/latest/enc/base.html)
-
-- [X] `base1`: useless, but for the sake of completeness
-- [X] `base2`: simple conversion to binary (with a variant with a reversed alphabet)
-- [X] `base3`: conversion to ternary (with a variant with a reversed alphabet)
-- [X] `base4`: conversion to quarternary (with a variant with a reversed alphabet)
-- [X] `base8`: simple conversion to octal (with a variant with a reversed alphabet)
-- [X] `base10`: simple conversion to decimal
-- [X] `base11`: conversion to digits with a "*a*"
-- [X] `base16`: simple conversion to hexadecimal (with a variant holding an alphabet with digits and letters inverted)
-- [X] `base26`: conversion to alphabet letters
-- [X] `base32`: classical conversion according to the RFC4648 with all its variants ([zbase32](https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt), extended hexadecimal, [geohash](https://en.wikipedia.org/wiki/Geohash), [Crockford](https://www.crockford.com/base32.html))
-- [X] `base36`: [Base36](https://en.wikipedia.org/wiki/Base36) conversion to letters and digits (with a variant inverting both groups)
-- [X] `base45`: [Base45](https://datatracker.ietf.org/doc/html/draft-faltstrom-base45-04.txt) DRAFT algorithm (with a variant inverting letters and digits)
-- [X] `base58`: multiple versions of [Base58](https://en.bitcoinwiki.org/wiki/Base58) (bitcoin, flickr, ripple)
-- [X] `base62`: [Base62](https://en.wikipedia.org/wiki/Base62) conversion to lower- and uppercase letters and digits (with a variant with letters and digits inverted)
-- [X] `base63`: similar to `base62` with the "`_`" added
-- [X] `base64`: classical conversion according to RFC4648 with its variant URL (or *file*) (it also holds a variant with letters and digits inverted)
-- [X] `base67`: custom conversion using some more special characters (also with a variant with letters and digits inverted)
-- [X] `base85`: all variants of Base85 ([Ascii85](https://fr.wikipedia.org/wiki/Ascii85), [z85](https://rfc.zeromq.org/spec/32), [Adobe](https://dencode.com/string/ascii85), [(x)btoa](https://dencode.com/string/ascii85), [RFC1924](https://datatracker.ietf.org/doc/html/rfc1924), [XML](https://datatracker.ietf.org/doc/html/draft-kwiatkowski-base85-for-xml-00))
-- [X] `base91`: [Base91](http://base91.sourceforge.net) custom conversion
-- [X] `base100` (or *emoji*): [Base100](https://github.com/AdamNiederer/base100) custom conversion
-- [X] `base122`: [Base100](http://blog.kevinalbs.com/base122) custom conversion
-- [X] `base-genericN`: see [base encodings](https://python-codext.readthedocs.io/en/latest/enc/base.html) ; supports any possible base
-
-This category also contains `ascii85`, `adobe`, `[x]btoa`, `zeromq` with the `base85` codec.
-
-#### [Binary](https://python-codext.readthedocs.io/en/latest/enc/binary.html)
-
-- [X] `baudot`: supports CCITT-1, CCITT-2, EU/FR, ITA1, ITA2, MTK-2 (Python3 only), UK, ...
-- [X] `baudot-spaced`: variant of `baudot` ; groups of 5 bits are whitespace-separated
-- [X] `baudot-tape`: variant of `baudot` ; outputs a string that looks like a perforated tape
-- [X] `bcd`: _Binary Coded Decimal_, encodes characters from their (zero-left-padded) ordinals
-- [X] `bcd-extended0`: variant of `bcd` ; encodes characters from their (zero-left-padded) ordinals using prefix bits `0000`
-- [X] `bcd-extended1`: variant of `bcd` ; encodes characters from their (zero-left-padded) ordinals using prefix bits `1111`
-- [X] `excess3`: uses Excess-3 (aka Stibitz code) binary encoding to convert characters from their ordinals
-- [X] `gray`: aka reflected binary code
-- [X] `manchester`: XORes each bit of the input with `01`
-- [X] `manchester-inverted`: variant of `manchester` ; XORes each bit of the input with `10`
-- [X] `rotateN`: rotates characters by the specified number of bits (*N* belongs to [1, 7] ; Python 3 only)
-
-#### [Common](https://python-codext.readthedocs.io/en/latest/enc/common.html)
-
-- [X] `a1z26`: keeps words whitespace-separated and uses a custom character separator
-- [X] `cases`: set of case-related encodings (including camel-, kebab-, lower-, pascal-, upper-, snake- and swap-case, slugify, capitalize, title)
-- [X] `dummy`: set of simple encodings (including integer, replace, reverse, word-reverse, substite and strip-spaces)
-- [X] `octal`: dummy octal conversion (converts to 3-digits groups)
-- [X] `octal-spaced`: variant of `octal` ; dummy octal conversion, handling whitespace separators
-- [X] `ordinal`: dummy character ordinals conversion (converts to 3-digits groups)
-- [X] `ordinal-spaced`: variant of `ordinal` ; dummy character ordinals conversion, handling whitespace separators
-
-#### [Compression](https://python-codext.readthedocs.io/en/latest/enc/compressions.html)
-
-- [X] `gzip`: standard Gzip compression/decompression
-- [X] `lz77`: compresses the given data with the algorithm of Lempel and Ziv of 1977
-- [X] `lz78`: compresses the given data with the algorithm of Lempel and Ziv of 1978
-- [X] `pkzip_deflate`: standard Zip-deflate compression/decompression
-- [X] `pkzip_bzip2`: standard BZip2 compression/decompression
-- [X] `pkzip_lzma`: standard LZMA compression/decompression
-
-> :warning: Compression functions are of course definitely **NOT** encoding functions ; they are implemented for leveraging the `.encode(...)` API from `codecs`.
-
-#### [Cryptography](https://python-codext.readthedocs.io/en/latest/enc/crypto.html)
-
-- [X] `affine`: aka Affine Cipher
-- [X] `atbash`: aka Atbash Cipher
-- [X] `bacon`: aka Baconian Cipher
-- [X] `barbie-N`: aka Barbie Typewriter (*N* belongs to [1, 4])
-- [X] `citrix`: aka Citrix CTX1 password encoding
-- [X] `railfence`: aka Rail Fence Cipher
-- [X] `rotN`: aka Caesar cipher (*N* belongs to [1,25])
-- [X] `scytaleN`: encrypts using the number of letters on the rod (*N* belongs to [1,[)
-- [X] `shiftN`: shift ordinals (*N* belongs to [1,255])
-- [X] `xorN`: XOR with a single byte (*N* belongs to [1,255])
-
-> :warning: Crypto functions are of course definitely **NOT** encoding functions ; they are implemented for leveraging the `.encode(...)` API from `codecs`.
-
-#### [Hashing](https://python-codext.readthedocs.io/en/latest/enc/hashing.html)
-
-- [X] `blake`: includes BLAKE2b and BLAKE2s (Python 3 only ; relies on `hashlib`)
-- [X] `checksums`: includes Adler32 and CRC32 (relies on `zlib`)
-- [X] `crypt`: Unix's crypt hash for passwords (Python 3 and Unix only ; relies on `crypt`)
-- [X] `md`: aka Message Digest ; includes MD4 and MD5 (relies on `hashlib`)
-- [X] `sha`: aka Secure Hash Algorithms ; includes SHA1, 224, 256, 384, 512 (Python2/3) but also SHA3-224, -256, -384 and -512 (Python 3 only ; relies on `hashlib`)
-- [X] `shake`: aka SHAKE hashing (Python 3 only ; relies on `hashlib`)
-
-> :warning: Hash functions are of course definitely **NOT** encoding functions ; they are implemented for convenience with the `.encode(...)` API from `codecs` and useful for chaning codecs.
-
-#### [Languages](https://python-codext.readthedocs.io/en/latest/enc/languages.html)
-
-- [X] `braille`: well-known braille language (Python 3 only)
-- [X] `ipsum`: aka lorem ipsum
-- [X] `galactic`: aka galactic alphabet or Minecraft enchantment language (Python 3 only)
-- [X] `leetspeak`: based on minimalistic elite speaking rules
-- [X] `morse`: uses whitespace as a separator
-- [X] `navajo`: only handles letters (not full words from the Navajo dictionary)
-- [X] `radio`: aka NATO or radio phonetic alphabet
-- [X] `southpark`: converts letters to Kenny's language from Southpark (whitespace is also handled)
-- [X] `southpark-icase`: case insensitive variant of `southpark`
-- [X] `tap`: converts text to tap/knock code, commonly used by prisoners
-- [X] `tomtom`: similar to `morse`, using slashes and backslashes
-
-#### [Others](https://python-codext.readthedocs.io/en/latest/enc/others.html)
-
-- [X] `dna`: implements the 8 rules of DNA sequences (N belongs to [1,8])
-- [X] `letter-indices`: encodes consonants and/or vowels with their corresponding indices
-- [X] `markdown`: unidirectional encoding from Markdown to HTML
-
-#### [Steganography](https://python-codext.readthedocs.io/en/latest/enc/stegano.html)
-
-- [X] `hexagram`: uses Base64 and encodes the result to a charset of [I Ching hexagrams](https://en.wikipedia.org/wiki/Hexagram_%28I_Ching%29) (as implemented [here](https://github.com/qntm/hexagram-encode))
-- [X] `klopf`: aka Klopf code ; Polybius square with trivial alphabetical distribution
-- [X] `resistor`: aka resistor color codes
-- [X] `rick`: aka Rick cipher (in reference to Rick Astley's song "*Never gonna give you up*")
-- [X] `sms`: also called _T9 code_ ; uses "`-`" as a separator for encoding, "`-`" or "`_`" or whitespace for decoding
-- [X] `whitespace`: replaces bits with whitespaces and tabs
-- [X] `whitespace_after_before`: variant of `whitespace` ; encodes characters as new characters with whitespaces before and after according to an equation described in the codec name (e.g. "`whitespace+2*after-3*before`")
-
-#### [Web](https://python-codext.readthedocs.io/en/latest/enc/web.html)
-
-- [X] `html`: implements entities according to [this reference](https://dev.w3.org/html5/html-author/charref)
-- [X] `url`: aka URL encoding
-
-
-## :clap: Supporters
-
-[](https://github.com/dhondta/python-codext/stargazers)
-
-[](https://github.com/dhondta/python-codext/network/members)
-
-
+
+
CodExt
+
Encode/decode anything.
+
+[](https://pypi.python.org/pypi/codext/)
+[](https://python-codext.readthedocs.io/en/latest/?badge=latest)
+[](https://github.com/dhondta/python-codext/actions/workflows/python-package.yml)
+[](#)
+[](https://pypi.python.org/pypi/codext/)
+[](https://snyk.io/test/github/dhondta/python-codext?targetFile=requirements.txt)
+[](https://zenodo.org/badge/latestdoi/236679865)
+[](https://pypi.python.org/pypi/codext/)
+
+[**CodExt**](https://github.com/dhondta/python-codext) is a (Python2-3 compatible) library that extends the native [`codecs`](https://docs.python.org/3/library/codecs.html) library (namely for adding new custom encodings and character mappings) and provides **120+ new codecs**, hence its name combining *CODecs EXTension*. It also features a **guess mode** for decoding multiple layers of encoding and **CLI tools** for convenience.
+
+```sh
+$ pip install codext
+```
+
+Want to contribute a new codec ? | Want to contribute a new macro ?
+:----------------------------------:|:------------------------------------:
+Check the [documentation](https://python-codext.readthedocs.io/en/latest/howto.html) first, then [PR](https://github.com/dhondta/python-codext/pulls) your new codec | [PR](https://github.com/dhondta/python-codext/pulls) your updated version of [`macros.json`](https://github.com/dhondta/python-codext/blob/main/codext/macros.json)
+
+## :mag: Demonstrations
+
+
+
+
+
+## :computer: Usage (main CLI tool)
+
+```session
+$ codext -i test.txt encode dna-1
+GTGAGCGGGTATGTGA
+
+$ echo -en "test" | codext encode morse
+- . ... -
+
+$ echo -en "test" | codext encode braille
+⠞⠑⠎⠞
+
+$ echo -en "test" | codext encode base100
+👫👜👪👫
+```
+
+### Chaining codecs
+
+```sh
+$ echo -en "Test string" | codext encode reverse
+gnirts tseT
+
+$ echo -en "Test string" | codext encode reverse morse
+--. -. .. .-. - ... / - ... . -
+
+$ echo -en "Test string" | codext encode reverse morse dna-2
+AGTCAGTCAGTGAGAAAGTCAGTGAGAAAGTGAGTGAGAAAGTGAGTCAGTGAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTTAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTGAGAAAGTC
+
+$ echo -en "Test string" | codext encode reverse morse dna-2 octal
+101107124103101107124103101107124107101107101101101107124103101107124107101107101101101107124107101107124107101107101101101107124107101107124103101107124107101107101101101107124103101107101101101107124107101107124107101107124107101107101101101107124124101107101101101107124103101107101101101107124107101107124107101107124107101107101101101107124107101107101101101107124103
+
+$ echo -en "AGTCAGTCAGTGAGAAAGTCAGTGAGAAAGTGAGTGAGAAAGTGAGTCAGTGAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTTAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTGAGAAAGTC" | codext -d dna-2 morse reverse
+test string
+```
+
+### Using macros
+
+```sh
+$ codext add-macro my-encoding-chain gzip base63 lzma base64
+
+$ codext list macros
+example-macro, my-encoding-chain
+
+$ echo -en "Test string" | codext encode my-encoding-chain
+CQQFAF0AAIAAABuTgySPa7WaZC5Sunt6FS0ko71BdrYE8zHqg91qaqadZIR2LafUzpeYDBalvE///ug4AA==
+
+$ codext remove-macro my-encoding-chain
+
+$ codext list macros
+example-macro
+```
+
+## :computer: Usage (base CLI tool)
+
+```session
+$ echo "Test string !" | base122
+*.7!ft9�-f9Â
+
+$ echo "Test string !" | base91
+"ONK;WDZM%Z%xE7L
+
+$ echo "Test string !" | base91 | base85
+B2P|BJ6A+nO(j|-cttl%
+
+$ echo "Test string !" | base91 | base85 | base36 | base58-flickr
+QVx5tvgjvCAkXaMSuKoQmCnjeCV1YyyR3WErUUErFf
+
+$ echo "Test string !" | base91 | base85 | base36 | base58-flickr | base58-flickr -d | base36 -d | base85 -d | base91 -d
+Test string !
+```
+
+```session
+$ echo "Test string !" | base91 | base85 | base36 | base58-flickr | unbase -m 3
+Test string !
+
+$ echo "Test string !" | base91 | base85 | base36 | base58-flickr | unbase -f Test
+Test string !
+```
+
+## :computer: Usage (Python)
+
+Getting the list of available codecs:
+
+```python
+>>> import codecs
+>>> import codext
+
+>>> codext.list()
+['ascii85', 'base85', 'base100', 'base122', ..., 'tomtom', 'dna', 'html', 'markdown', 'url', 'resistor', 'sms', 'whitespace', 'whitespace-after-before']
+
+>>> codext.encode("this is a test", "base58-bitcoin")
+'jo91waLQA1NNeBmZKUF'
+
+>>> codext.encode("this is a test", "base58-ripple")
+'jo9rA2LQwr44eBmZK7E'
+
+>>> codext.encode("this is a test", "base58-url")
+'JN91Wzkpa1nnDbLyjtf'
+
+>>> codecs.encode("this is a test", "base100")
+'👫👟👠👪🐗👠👪🐗👘🐗👫👜👪👫'
+
+>>> codecs.decode("👫👟👠👪🐗👠👪🐗👘🐗👫👜👪👫", "base100")
+'this is a test'
+
+>>> for i in range(8):
+ print(codext.encode("this is a test", "dna-%d" % (i + 1)))
+GTGAGCCAGCCGGTATACAAGCCGGTATACAAGCAGACAAGTGAGCGGGTATGTGA
+CTCACGGACGGCCTATAGAACGGCCTATAGAACGACAGAACTCACGCCCTATCTCA
+ACAGATTGATTAACGCGTGGATTAACGCGTGGATGAGTGGACAGATAAACGCACAG
+AGACATTCATTAAGCGCTCCATTAAGCGCTCCATCACTCCAGACATAAAGCGAGAC
+TCTGTAAGTAATTCGCGAGGTAATTCGCGAGGTAGTGAGGTCTGTATTTCGCTCTG
+TGTCTAACTAATTGCGCACCTAATTGCGCACCTACTCACCTGTCTATTTGCGTGTC
+GAGTGCCTGCCGGATATCTTGCCGGATATCTTGCTGTCTTGAGTGCGGGATAGAGT
+CACTCGGTCGGCCATATGTTCGGCCATATGTTCGTCTGTTCACTCGCCCATACACT
+>>> codext.decode("GTGAGCCAGCCGGTATACAAGCCGGTATACAAGCAGACAAGTGAGCGGGTATGTGA", "dna-1")
+'this is a test'
+
+>>> codecs.encode("this is a test", "morse")
+'- .... .. ... / .. ... / .- / - . ... -'
+
+>>> codecs.decode("- .... .. ... / .. ... / .- / - . ... -", "morse")
+'this is a test'
+
+>>> with open("morse.txt", 'w', encoding="morse") as f:
+ f.write("this is a test")
+14
+
+>>> with open("morse.txt", encoding="morse") as f:
+ f.read()
+'this is a test'
+
+>>> codext.decode("""
+ =
+ X
+ :
+ x
+ n
+ r
+ y
+ Y
+ y
+ p
+ a
+ `
+ n
+ |
+ a
+o
+ h
+ `
+ g
+ o
+ z """, "whitespace-after+before")
+'CSC{not_so_invisible}'
+
+>>> print(codext.encode("An example test string", "baudot-tape"))
+***.**
+ . *
+***.*
+* .
+ .*
+* .*
+ . *
+** .*
+***.**
+** .**
+ .*
+* .
+* *. *
+ .*
+* *.
+* *. *
+* .
+* *.
+* *. *
+***.
+ *.*
+***.*
+ * .*
+```
+
+## :page_with_curl: List of codecs
+
+#### [BaseXX](https://python-codext.readthedocs.io/en/latest/enc/base.html)
+
+- [X] `base1`: useless, but for the sake of completeness
+- [X] `base2`: simple conversion to binary (with a variant with a reversed alphabet)
+- [X] `base3`: conversion to ternary (with a variant with a reversed alphabet)
+- [X] `base4`: conversion to quaternary (with a variant with a reversed alphabet)
+- [X] `base8`: simple conversion to octal (with a variant with a reversed alphabet)
+- [X] `base10`: simple conversion to decimal
+- [X] `base11`: conversion to digits with a "*a*"
+- [X] `base16`: simple conversion to hexadecimal (with a variant holding an alphabet with digits and letters inverted)
+- [X] `base26`: conversion to alphabet letters
+- [X] `base32`: classical conversion according to the RFC4648 with all its variants ([zbase32](https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt), extended hexadecimal, [geohash](https://en.wikipedia.org/wiki/Geohash), [Crockford](https://www.crockford.com/base32.html))
+- [X] `base36`: [Base36](https://en.wikipedia.org/wiki/Base36) conversion to letters and digits (with a variant inverting both groups)
+- [X] `base45`: [Base45](https://datatracker.ietf.org/doc/html/draft-faltstrom-base45-04.txt) DRAFT algorithm (with a variant inverting letters and digits)
+- [X] `base58`: multiple versions of [Base58](https://en.bitcoinwiki.org/wiki/Base58) (bitcoin, flickr, ripple)
+- [X] `base62`: [Base62](https://en.wikipedia.org/wiki/Base62) conversion to lower- and uppercase letters and digits (with a variant with letters and digits inverted)
+- [X] `base63`: similar to `base62` with the "`_`" added
+- [X] `base64`: classical conversion according to RFC4648 with its variant URL (or *file*) (it also holds a variant with letters and digits inverted)
+- [X] `base67`: custom conversion using some more special characters (also with a variant with letters and digits inverted)
+- [X] `base85`: all variants of Base85 ([Ascii85](https://fr.wikipedia.org/wiki/Ascii85), [z85](https://rfc.zeromq.org/spec/32), [Adobe](https://dencode.com/string/ascii85), [(x)btoa](https://dencode.com/string/ascii85), [RFC1924](https://datatracker.ietf.org/doc/html/rfc1924), [XML](https://datatracker.ietf.org/doc/html/draft-kwiatkowski-base85-for-xml-00))
+- [X] `base91`: [Base91](http://base91.sourceforge.net) custom conversion
+- [X] `base100` (or *emoji*): [Base100](https://github.com/AdamNiederer/base100) custom conversion
+- [X] `base122`: [Base122](http://blog.kevinalbs.com/base122) custom conversion
+- [X] `base-genericN`: see [base encodings](https://python-codext.readthedocs.io/en/latest/enc/base.html) ; supports any possible base
+
+This category also contains `ascii85`, `adobe`, `[x]btoa`, `zeromq` with the `base85` codec.
+
+#### [Binary](https://python-codext.readthedocs.io/en/latest/enc/binary.html)
+
+- [X] `baudot`: supports CCITT-1, CCITT-2, EU/FR, ITA1, ITA2, MTK-2 (Python3 only), UK, ...
+- [X] `baudot-spaced`: variant of `baudot` ; groups of 5 bits are whitespace-separated
+- [X] `baudot-tape`: variant of `baudot` ; outputs a string that looks like a perforated tape
+- [X] `bcd`: _Binary Coded Decimal_, encodes characters from their (zero-left-padded) ordinals
+- [X] `bcd-extended0`: variant of `bcd` ; encodes characters from their (zero-left-padded) ordinals using prefix bits `0000`
+- [X] `bcd-extended1`: variant of `bcd` ; encodes characters from their (zero-left-padded) ordinals using prefix bits `1111`
+- [X] `excess3`: uses Excess-3 (aka Stibitz code) binary encoding to convert characters from their ordinals
+- [X] `gray`: aka reflected binary code
+- [X] `manchester`: XORes each bit of the input with `01`
+- [X] `manchester-inverted`: variant of `manchester` ; XORes each bit of the input with `10`
+- [X] `rotateN`: rotates characters by the specified number of bits (*N* belongs to [1, 7] ; Python 3 only)
+
+#### [Common](https://python-codext.readthedocs.io/en/latest/enc/common.html)
+
+- [X] `a1z26`: keeps words whitespace-separated and uses a custom character separator
+- [X] `cases`: set of case-related encodings (including camel-, kebab-, lower-, pascal-, upper-, snake- and swap-case, slugify, capitalize, title)
+- [X] `dummy`: set of simple encodings (including integer, replace, reverse, word-reverse, substitute and strip-spaces)
+- [X] `octal`: dummy octal conversion (converts to 3-digits groups)
+- [X] `octal-spaced`: variant of `octal` ; dummy octal conversion, handling whitespace separators
+- [X] `ordinal`: dummy character ordinals conversion (converts to 3-digits groups)
+- [X] `ordinal-spaced`: variant of `ordinal` ; dummy character ordinals conversion, handling whitespace separators
+
+#### [Compression](https://python-codext.readthedocs.io/en/latest/enc/compressions.html)
+
+- [X] `gzip`: standard Gzip compression/decompression
+- [X] `lz77`: compresses the given data with the algorithm of Lempel and Ziv of 1977
+- [X] `lz78`: compresses the given data with the algorithm of Lempel and Ziv of 1978
+- [X] `pkzip_deflate`: standard Zip-deflate compression/decompression
+- [X] `pkzip_bzip2`: standard BZip2 compression/decompression
+- [X] `pkzip_lzma`: standard LZMA compression/decompression
+
+> :warning: Compression functions are of course definitely **NOT** encoding functions ; they are implemented for leveraging the `.encode(...)` API from `codecs`.
+
+#### [Cryptography](https://python-codext.readthedocs.io/en/latest/enc/crypto.html)
+
+- [X] `affine`: aka Affine Cipher
+- [X] `atbash`: aka Atbash Cipher
+- [X] `bacon`: aka Baconian Cipher
+- [X] `barbie-N`: aka Barbie Typewriter (*N* belongs to [1, 4])
+- [X] `citrix`: aka Citrix CTX1 password encoding
+- [X] `railfence`: aka Rail Fence Cipher
+- [X] `rotN`: aka Caesar cipher (*N* belongs to [1,25])
+- [X] `scytaleN`: encrypts using the number of letters on the rod (*N* belongs to [1,∞[)
+- [X] `shiftN`: shift ordinals (*N* belongs to [1,255])
+- [X] `xorN`: XOR with a single byte (*N* belongs to [1,255])
+
+> :warning: Crypto functions are of course definitely **NOT** encoding functions ; they are implemented for leveraging the `.encode(...)` API from `codecs`.
+
+#### [Hashing](https://python-codext.readthedocs.io/en/latest/enc/hashing.html)
+
+- [X] `blake`: includes BLAKE2b and BLAKE2s (Python 3 only ; relies on `hashlib`)
+- [X] `checksums`: includes Adler32 and CRC32 (relies on `zlib`)
+- [X] `crypt`: Unix's crypt hash for passwords (Python 3 and Unix only ; relies on `crypt`)
+- [X] `md`: aka Message Digest ; includes MD4 and MD5 (relies on `hashlib`)
+- [X] `sha`: aka Secure Hash Algorithms ; includes SHA1, 224, 256, 384, 512 (Python2/3) but also SHA3-224, -256, -384 and -512 (Python 3 only ; relies on `hashlib`)
+- [X] `shake`: aka SHAKE hashing (Python 3 only ; relies on `hashlib`)
+
+> :warning: Hash functions are of course definitely **NOT** encoding functions ; they are implemented for convenience with the `.encode(...)` API from `codecs` and useful for chaining codecs.
+
+#### [Languages](https://python-codext.readthedocs.io/en/latest/enc/languages.html)
+
+- [X] `braille`: well-known braille language (Python 3 only)
+- [X] `ipsum`: aka lorem ipsum
+- [X] `galactic`: aka galactic alphabet or Minecraft enchantment language (Python 3 only)
+- [X] `leetspeak`: based on minimalistic elite speaking rules
+- [X] `morse`: uses whitespace as a separator
+- [X] `navajo`: only handles letters (not full words from the Navajo dictionary)
+- [X] `radio`: aka NATO or radio phonetic alphabet
+- [X] `southpark`: converts letters to Kenny's language from Southpark (whitespace is also handled)
+- [X] `southpark-icase`: case insensitive variant of `southpark`
+- [X] `tap`: converts text to tap/knock code, commonly used by prisoners
+- [X] `tomtom`: similar to `morse`, using slashes and backslashes
+
+#### [Others](https://python-codext.readthedocs.io/en/latest/enc/others.html)
+
+- [X] `dna`: implements the 8 rules of DNA sequences (N belongs to [1,8])
+- [X] `letter-indices`: encodes consonants and/or vowels with their corresponding indices
+- [X] `markdown`: unidirectional encoding from Markdown to HTML
+
+#### [Steganography](https://python-codext.readthedocs.io/en/latest/enc/stegano.html)
+
+- [X] `hexagram`: uses Base64 and encodes the result to a charset of [I Ching hexagrams](https://en.wikipedia.org/wiki/Hexagram_%28I_Ching%29) (as implemented [here](https://github.com/qntm/hexagram-encode))
+- [X] `klopf`: aka Klopf code ; Polybius square with trivial alphabetical distribution
+- [X] `resistor`: aka resistor color codes
+- [X] `rick`: aka Rick cipher (in reference to Rick Astley's song "*Never gonna give you up*")
+- [X] `sms`: also called _T9 code_ ; uses "`-`" as a separator for encoding, "`-`" or "`_`" or whitespace for decoding
+- [X] `whitespace`: replaces bits with whitespaces and tabs
+- [X] `whitespace_after_before`: variant of `whitespace` ; encodes characters as new characters with whitespaces before and after according to an equation described in the codec name (e.g. "`whitespace+2*after-3*before`")
+
+#### [Web](https://python-codext.readthedocs.io/en/latest/enc/web.html)
+
+- [X] `html`: implements entities according to [this reference](https://dev.w3.org/html5/html-author/charref)
+- [X] `url`: aka URL encoding
+
+
+## :clap: Supporters
+
+[](https://github.com/dhondta/python-codext/stargazers)
+
+[](https://github.com/dhondta/python-codext/network/members)
+
+
diff --git a/docs/coverage.svg b/docs/coverage.svg
index 1006657..efa3c52 100644
--- a/docs/coverage.svg
+++ b/docs/coverage.svg
@@ -1 +1 @@
-
\ No newline at end of file
+
\ No newline at end of file
diff --git a/pyproject.toml b/pyproject.toml
index 2323ece..1644aee 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,5 +1,5 @@
[build-system]
-requires = ["setuptools>=61.0", "setuptools-scm"]
+requires = ["setuptools>=80.0.0", "setuptools-scm"]
build-backend = "setuptools.build_meta"
[tool.setuptools.dynamic]
@@ -26,8 +26,8 @@ classifiers = [
"Topic :: Software Development :: Libraries :: Python Modules",
]
dependencies = [
- "crypt-r; python_version >= '3.13'",
- "markdown2>=2.4.0",
+ "legacycrypt; python_version >= '3.13'",
+ "markdown2>=2.5.4",
]
dynamic = ["version"]
diff --git a/pytest.ini b/pytest.ini
index ab4c198..fcccae1 100644
--- a/pytest.ini
+++ b/pytest.ini
@@ -1,2 +1,2 @@
[pytest]
-python_paths = src
+pythonpath = src
diff --git a/requirements.txt b/requirements.txt
index b5db972..dcaadfd 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1 +1,2 @@
-markdown2>=2.4.0
+legacycrypt; python_version >= '3.13'
+markdown2>=2.5.4
diff --git a/src/codext/VERSION.txt b/src/codext/VERSION.txt
index be2c181..e8018a2 100644
--- a/src/codext/VERSION.txt
+++ b/src/codext/VERSION.txt
@@ -1 +1 @@
-1.15.5
+1.15.10
diff --git a/src/codext/__common__.py b/src/codext/__common__.py
index 7ad45d9..861d342 100644
--- a/src/codext/__common__.py
+++ b/src/codext/__common__.py
@@ -6,7 +6,6 @@
import os
import random
import re
-import sre_parse
import sys
from encodings.aliases import aliases as ALIASES
from functools import reduce, update_wrapper, wraps
@@ -37,8 +36,12 @@
from importlib import reload
except ImportError:
pass
+try:
+    import re._parser as sre_parse
+except ImportError:
+    import sre_parse
-# from Python 3.11, it seems that 'sre_parse' is not bound to 're' anymore
+# from Python 3.11, 'sre_parse' is bound as '_parser' ; monkey-patch it for backward-compatibility
re.sre_parse = sre_parse
@@ -261,7 +264,7 @@ def getregentry(encoding):
while True:
try:
g = m.group(i) or ""
- if g.isdigit() and not g.startswith("0") and "".join(set(g)) != "01":
+ if g.isdigit() and not g.startswith("0") and (re.match(r"10+$", g) or "".join(set(g)) != "01"):
g = int(g)
args += [g]
i += 1
@@ -370,7 +373,7 @@ def add_macro(mname, *encodings):
:param mname: macro name
:param encodings: encoding names of the encodings to be chained with the macro
"""
- global PERS_MACROS
+ global PERS_MACROS # noqa: F824
# check for name clash with already existing macros and codecs
if mname in MACROS or mname in PERS_MACROS:
raise ValueError("Macro name already exists")
@@ -630,7 +633,7 @@ def __get_value(token, position, case_changed=False):
def clear():
""" Clear codext's local registry of search functions. """
- global __codecs_registry, MACROS, PERS_MACROS
+ global __codecs_registry, MACROS, PERS_MACROS # noqa: F824
__codecs_registry, MACROS, PERS_MACROS = [], {}, {}
codecs.clear = clear
@@ -733,7 +736,7 @@ def list_macros():
def remove(name):
""" Remove all search functions matching the input encoding name from codext's local registry or any macro with the
given name. """
- global __codecs_registry, MACROS, PERS_MACROS
+ global __codecs_registry, MACROS, PERS_MACROS # noqa: F824
tbr = []
for search_function in __codecs_registry:
if search_function(name) is not None:
@@ -764,7 +767,7 @@ def remove(name):
def reset():
""" Reset codext's local registry of search functions and macros. """
- global __codecs_registry, CODECS_REGISTRY, MACROS, PERS_MACROS
+ global __codecs_registry, CODECS_REGISTRY, MACROS, PERS_MACROS # noqa: F824
clear()
d = os.path.dirname(__file__)
for pkg in sorted(os.listdir(d)):
@@ -870,10 +873,9 @@ def _handle_error(token, position, output="", eename=None):
:param output: output, as decoded up to the position of the error
"""
if errors == "strict":
- msg = "'%s' codec can't %scode %s '%s' in %s %d"
- token = ensure_str(token)
- token = token[:7] + "..." if len(token) > 10 else token
- err = getattr(builtins, exc)(msg % (eename or ename, ["en", "de"][decode], kind, token, item, position))
+ token = f"{token[:7]}..." if len(token := ensure_str(token)) > 10 else token
+ err = getattr(builtins, exc)(f"'{eename or ename}' codec can't {['en','de'][decode]}code {kind} '{token}' "
+ f"in {item} {position}")
err.output = output
err.__cause__ = err
raise err
@@ -1264,8 +1266,8 @@ def __guess(prev_input, input, stop_func, depth, max_depth, min_depth, encodings
if not stop and (show or debug) and found not in result:
s = repr(input)
s = s[2:-1] if s.startswith("b'") and s.endswith("'") else s
- s = "[+] {', '.join(found)}: {s}"
- print(s if len(s) <= 80 else s[:77] + "...")
+ s = f"[+] {', '.join(found)}: {s}"
+ print(s if len(s) <= 80 else f"{s[:77]}...")
result[found] = input
if depth >= max_depth or len(result) > 0 and stop:
return
@@ -1275,7 +1277,7 @@ def __guess(prev_input, input, stop_func, depth, max_depth, min_depth, encodings
if len(result) > 0 and stop:
return
if debug:
- print(f"[*] Depth %0{len(str(max_depth))}d/%d: {encoding}" % (depth+1, max_depth))
+ print(f"[*] Depth {depth+1:0{len(str(max_depth))}}/{max_depth}: {encoding}")
__guess(input, new_input, stop_func, depth+1, max_depth, min_depth, encodings, result, found + (encoding, ),
stop, show, scoring_heuristic, extended, debug)
diff --git a/src/codext/hashing/__init__.py b/src/codext/hashing/__init__.py
index 2aa13a0..b7e9fcc 100755
--- a/src/codext/hashing/__init__.py
+++ b/src/codext/hashing/__init__.py
@@ -1,8 +1,9 @@
-# -*- coding: UTF-8 -*-
-from .blake import *
-from .checksums import *
-from .crypt import *
-from .md import *
-from .sha import *
-from .shake import *
-
+# -*- coding: UTF-8 -*-
+from .blake import *
+from .checksums import *
+from .crypt import *
+from .md import *
+from .mmh3 import *
+from .sha import *
+from .shake import *
+
diff --git a/src/codext/hashing/blake.py b/src/codext/hashing/blake.py
index 6656c46..e168819 100644
--- a/src/codext/hashing/blake.py
+++ b/src/codext/hashing/blake.py
@@ -1,5 +1,5 @@
# -*- coding: UTF-8 -*-
-"""Case Codecs - string hashing with blake.
+"""Blake2 Codecs - string hashing with blake.
These are codecs for hashing strings, for use with other codecs in encoding chains.
diff --git a/src/codext/hashing/checksums.py b/src/codext/hashing/checksums.py
index f94dd2e..85dbe67 100644
--- a/src/codext/hashing/checksums.py
+++ b/src/codext/hashing/checksums.py
@@ -1,5 +1,5 @@
# -*- coding: UTF-8 -*-
-"""Case Codecs - string common checksums.
+"""Checksum Codecs - string common checksums.
These are codecs for hashing strings, for use with other codecs in encoding chains.
diff --git a/src/codext/hashing/crypt.py b/src/codext/hashing/crypt.py
index eddc668..2a9ed95 100644
--- a/src/codext/hashing/crypt.py
+++ b/src/codext/hashing/crypt.py
@@ -1,5 +1,5 @@
# -*- coding: UTF-8 -*-
-"""Case Codecs - string hashing with Unix's Crypt.
+"""Crypt Hashing Codec - string hashing with Unix's Crypt.
These are codecs for hashing strings, for use with other codecs in encoding chains.
@@ -15,7 +15,7 @@
try:
import crypt
except ImportError:
- import crypt_r as crypt
+ import legacycrypt as crypt
METHODS = [x[7:].lower() for x in crypt.__dict__ if x.startswith("METHOD_")]
diff --git a/src/codext/hashing/md.py b/src/codext/hashing/md.py
index 521a01c..0f8a053 100644
--- a/src/codext/hashing/md.py
+++ b/src/codext/hashing/md.py
@@ -1,5 +1,5 @@
# -*- coding: UTF-8 -*-
-"""Case Codecs - string hashing with Message Digest (MD).
+"""MD Hashing Codecs - string hashing with Message Digest (MD).
These are codecs for hashing strings, for use with other codecs in encoding chains.
@@ -56,4 +56,3 @@ def md2(data):
add("md5", lambda s, error="strict": (hashlib.new("md5", b(s)).hexdigest(), len(s)), guess=None)
if "md4" in hashlib.algorithms_available:
add("md4", lambda s, error="strict": (hashlib.new("md4", b(s)).hexdigest(), len(s)), guess=None)
-
diff --git a/src/codext/hashing/mmh3.py b/src/codext/hashing/mmh3.py
new file mode 100644
index 0000000..8c26639
--- /dev/null
+++ b/src/codext/hashing/mmh3.py
@@ -0,0 +1,18 @@
+# -*- coding: UTF-8 -*-
+"""MMH3 Codecs - string hashing with MurmurHash3.
+
+These are codecs for hashing strings, for use with other codecs in encoding chains.
+
+These codecs:
+- transform strings from str to str
+- transform strings from bytes to bytes
+- transform file content from str to bytes (write)
+"""
+from ..__common__ import *
+
+
+if "mmh3_32" in hashlib.algorithms_available:
+    add("mmh3_32", lambda s, error="strict": (hashlib.mmh3_32(b(s)).hexdigest(), len(s)), guess=None)
+if "mmh3_128" in hashlib.algorithms_available:
+    add("mmh3_128", lambda s, error="strict": (hashlib.mmh3_128(b(s)).hexdigest(), len(s)), guess=None)
+
diff --git a/src/codext/hashing/sha.py b/src/codext/hashing/sha.py
index 1351fe8..044e159 100644
--- a/src/codext/hashing/sha.py
+++ b/src/codext/hashing/sha.py
@@ -1,5 +1,5 @@
# -*- coding: UTF-8 -*-
-"""Case Codecs - string hashing with Secure Hash Algorithms.
+"""SHA Hashing Codecs - string hashing with Secure Hash Algorithms.
These are codecs for hashing strings, for use with other codecs in encoding chains.
diff --git a/src/codext/hashing/shake.py b/src/codext/hashing/shake.py
index 22c7b99..2b04424 100644
--- a/src/codext/hashing/shake.py
+++ b/src/codext/hashing/shake.py
@@ -1,5 +1,5 @@
# -*- coding: UTF-8 -*-
-"""Case Codecs - string hashing with SHAKE.
+"""Shake Hashing Codecs - string hashing with SHAKE.
These are codecs for hashing strings, for use with other codecs in encoding chains.
diff --git a/tests/test_manual.py b/tests/test_manual.py
index bed4884..c6e3c74 100644
--- a/tests/test_manual.py
+++ b/tests/test_manual.py
@@ -125,7 +125,10 @@ def test_codec_hash_functions(self):
self.assertIsNotNone(codecs.encode(STR, h))
self.assertRaises(NotImplementedError, codecs.decode, STR, h)
if UNIX:
- import crypt
+ try:
+ import crypt
+ except ImportError:
+ import legacycrypt as crypt
METHODS = [x[7:].lower() for x in crypt.__dict__ if x.startswith("METHOD_")]
for m in METHODS:
h = "crypt-" + m