
python-openxml/cxml


cxml - Compact XML translator

cxml translates a Compact XML (CXML) expression into the corresponding pretty-printed XML snippet. For example:

from cxml import xml

xml('w:p/(w:pPr/w:jc{w:val=right},w:r/w:t"Right-aligned")')

becomes:

<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:pPr>
    <w:jc w:val="right"/>
  </w:pPr>
  <w:r>
    <w:t>Right-aligned</w:t>
  </w:r>
</w:p>

Who cares?

The motivation for a compact XML expression language arose out of the testing requirements of the python-docx and python-pptx libraries. The WordprocessingML and PresentationML file formats are XML-based and many operations in those libraries involve the recognition or modification of XML. The tests then require a great many XML snippets to test all the possible combinations the code must recognize or produce.

Including full-sized XML snippets in the test code is both distracting and tedious. By compressing the specification of a snippet to fit on a single line (in most cases), the test code is much more compact and expressive.

Syntax

CXML syntax borrows from that of XPath.

An element is specified by its name:

>>> xml('foobar')
<foobar/>

A child is specified by name following a slash:

>>> xml('foo/bar')
<foo>
  <bar/>
</foo>

XML output is pretty-printed with 2-space indentation.
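To make that output layout concrete, here is a small sketch of a pretty-printer that produces the same shape as the examples: one element per line, children indented two spaces, empty elements self-closed. The Element class and render() function are hypothetical illustrations, not part of cxml; the sketch does no escaping of special characters and does not handle mixed content (text plus children).

```python
# A minimal sketch (not cxml's code): pretty-print a tiny element tree
# using the 2-space indentation shown in the examples.
from dataclasses import dataclass, field


@dataclass
class Element:
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)


def render(el: Element, depth: int = 0) -> str:
    """Return el as indented XML text, one element per line."""
    indent = "  " * depth
    attrs = "".join(f' {k}="{v}"' for k, v in el.attrs.items())
    if not el.children and not el.text:
        return f"{indent}<{el.tag}{attrs}/>\n"  # empty element: self-closed
    if not el.children:
        return f"{indent}<{el.tag}{attrs}>{el.text}</{el.tag}>\n"  # text only
    # Element with children: any text is ignored in this sketch.
    inner = "".join(render(c, depth + 1) for c in el.children)
    return f"{indent}<{el.tag}{attrs}>\n{inner}{indent}</{el.tag}>\n"


print(render(Element("foo", children=[Element("bar")])), end="")
```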

Multiple child elements are specified by separating them with a comma and enclosing them in parentheses:

>>> xml('foo/(bar,baz)')
<foo>
  <bar/>
  <baz/>
</foo>
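The parentheses are what let a parser tell siblings apart from the other commas that can appear in an expression. As a rough illustration (split_siblings() is a hypothetical helper, not how cxml's own parser is written), a comma separates siblings only when it is not nested inside parentheses, braces, or double quotes:

```python
def split_siblings(spec: str) -> list:
    """Split a sibling list like 'bar,baz' on its top-level commas.

    Commas inside (...), {...} or "..." belong to a nested element, an
    attribute list or a quoted string, so they do not start a new sibling.
    """
    parts, depth, in_quotes, start = [], 0, False, 0
    for i, ch in enumerate(spec):
        if ch == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue  # anything inside quotes is literal
        elif ch in "({":
            depth += 1
        elif ch in ")}":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(spec[start:i])  # top-level comma: sibling boundary
            start = i + 1
    parts.append(spec[start:])
    return parts


print(split_siblings('bar/(x,y),baz{a=b,c=d},qux"1,2"'))
```

Without the parentheses, a comma in 'foo/bar,baz' could belong either to foo's child list or to whatever contains foo, which is why the grouping is required.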

Element attributes are specified in braces after the element name:

>>> xml('foo{a=b}')
<foo a="b"/>

Multiple attributes are separated by commas:

>>> xml('foo{a=b,b=c}')
<foo a="b" b="c"/>

Whitespace is permitted (and ignored) between tokens in most places; however, after using CXML quite a bit, I don't find it useful:

>>> xml(' foo{a=b, b=c}')
<foo a="b" b="c"/>

Attribute text may be surrounded by double-quotes, which is handy when the text contains a comma or a closing brace:

>>> xml('foo{a=b,b="c,}g"}')
<foo a="b" b="c,}g"/>
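The attribute rules above (comma-separated name=value pairs, whitespace ignored around tokens, optional double quotes protecting commas and braces) can be sketched as a small parser for the text between the braces. parse_attrs() is a hypothetical helper: it assumes the caller has already extracted the brace contents, and it does not handle namespace-only entries such as b:.

```python
def parse_attrs(body: str) -> dict:
    """Parse 'a=b, b="c,}g"' into {'a': 'b', 'b': 'c,}g'} (order preserved)."""
    items, current, in_quotes = [], [], False
    for ch in body:
        if ch == '"':
            in_quotes = not in_quotes  # quotes protect ',' and '}'
        if ch == "," and not in_quotes:
            items.append("".join(current))  # end of one name=value pair
            current = []
        else:
            current.append(ch)
    items.append("".join(current))

    attrs = {}
    for item in items:
        if not item.strip():
            continue  # tolerate an empty body
        name, _, value = item.partition("=")
        value = value.strip()  # whitespace between tokens is ignored
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop the surrounding quotes
        attrs[name.strip()] = value
    return attrs


print(parse_attrs('a=b, b="c,}g"'))
```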

Text immediately following the attributes' closing brace is interpreted as the text of the element. Whitespace within the text is preserved:

>>> xml('foo{a=b,b=c} bar ')
<foo a="b" b="c"> bar </foo>

Element text may also be enclosed in double quotes, which allows it to contain a comma or slash that would otherwise be interpreted as the next token:

>>> xml('foo{a=b}"bar/baz, barfoo"')
<foo a="b">bar/baz, barfoo</foo>
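One way to read the text rule, sketched under the assumption that unquoted text ends at the next /, , or ), while quoted text runs to the closing double quote. read_text() is a hypothetical helper, not part of cxml's API:

```python
def read_text(s: str, pos: int) -> tuple:
    """Read element text starting at s[pos]; return (text, next_pos).

    Quoted text runs to the closing double quote and may contain '/' or ','.
    Unquoted text stops at the first '/', ',' or ')' and keeps any spaces.
    """
    if pos < len(s) and s[pos] == '"':
        end = s.index('"', pos + 1)  # raises ValueError if the quote is unclosed
        return s[pos + 1:end], end + 1
    end = pos
    while end < len(s) and s[end] not in "/,)":
        end += 1
    return s[pos:end], end


expr = 'foo{a=b}"bar/baz, barfoo"'
print(read_text(expr, expr.index("}") + 1))
```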

An element having a namespace prefix appears with the corresponding namespace declaration:

>>> xml('a:foo')
<a:foo xmlns:a="http://foo/a"/>

A different namespace prefix in a descendant element causes the corresponding namespace declaration to be added to the root element, in the order encountered:

>>> xml('a:foo/(b:bar,c:baz)')
<a:foo xmlns:a="http://foo/a" xmlns:b="http://foo/b" xmlns:c="http://foo/c">
  <b:bar/>
  <c:baz/>
</a:foo>
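The "in the order encountered" rule amounts to collecting prefixes in document order and keeping only the first occurrence of each. A minimal sketch, assuming a hypothetical prefix-to-URI table that mirrors the http://foo/... URIs in these examples (cxml's real table is not reproduced here), and ignoring explicit declarations:

```python
# Hypothetical prefix -> URI table; the "http://foo/<prefix>" URIs mirror the
# examples above, and "w" is the WordprocessingML namespace.
NSMAP = {
    "a": "http://foo/a",
    "b": "http://foo/b",
    "c": "http://foo/c",
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
}


def root_nsdecls(tags: list) -> str:
    """Build the root's xmlns attributes from tag names in document order."""
    seen = {}  # dicts keep insertion order, so the first use of a prefix wins
    for tag in tags:
        prefix, colon, _ = tag.partition(":")
        if colon and prefix not in seen:
            seen[prefix] = NSMAP[prefix]
    return "".join(f' xmlns:{p}="{uri}"' for p, uri in seen.items())


print(root_nsdecls(["a:foo", "b:bar", "c:baz"]))
```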

A namespace can be explicitly declared as an attribute of an element, in which case it will appear whether a child element in that namespace is present or not:

>>> xml('a:foo{b:}')
<a:foo xmlns:a="http://foo/a" xmlns:b="http://foo/b"/>

An explicit namespace appears immediately after the root element namespace (if it has one) when placed on the root element. This allows namespace declarations to appear in a different order than the order encountered. This is occasionally handy when matching XML by its string value.

An explicit namespace may also be placed on a child element, in which case the corresponding namespace declaration appears on that child rather than the root element:

>>> xml('a:foo/b:bar{b:,c:}')
<a:foo xmlns:a="http://foo/a">
  <b:bar xmlns:b="http://foo/b" xmlns:c="http://foo/c"/>
</a:foo>
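Taken together, the namespace rules decide where each declaration goes. The sketch below encodes one reading of them: the root's own namespace first, then the root's explicit declarations, then prefixes hoisted from descendants in the order encountered, except where a descendant declares a prefix explicitly, in which case the declaration stays on that descendant and covers its subtree. The Node class, place_nsdecls() and the subtree-scoping rule are assumptions for illustration; cxml's actual behavior in edge cases may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    explicit: list = field(default_factory=list)  # prefixes written as {b:,c:}
    children: list = field(default_factory=list)
    decls: list = field(default_factory=list)  # filled in by place_nsdecls()


def prefix_of(tag: str):
    """Return the namespace prefix of a tag like 'b:bar', or None."""
    prefix, colon, _ = tag.partition(":")
    return prefix if colon else None


def declare(node: Node, prefix: str) -> None:
    if prefix not in node.decls:  # keep first-seen order, no duplicates
        node.decls.append(prefix)


def place_nsdecls(root: Node) -> None:
    """Decide which element carries each xmlns declaration (assumed rules)."""
    # 1. The root's own namespace comes first, then its explicit declarations.
    if prefix_of(root.tag):
        declare(root, prefix_of(root.tag))
    for p in root.explicit:
        declare(root, p)

    # 2. A descendant's explicit declarations stay on it and cover its subtree;
    #    any other prefix is hoisted to the root in the order encountered.
    def walk(node: Node, in_scope: frozenset) -> None:
        for p in node.explicit:
            declare(node, p)
        in_scope = in_scope | set(node.explicit)
        p = prefix_of(node.tag)
        if p and p not in in_scope:
            declare(root, p)
        for child in node.children:
            walk(child, in_scope)

    for child in root.children:
        walk(child, frozenset(root.decls))


root = Node("a:foo", children=[Node("b:bar", explicit=["b", "c"])])
place_nsdecls(root)
print(root.decls, root.children[0].decls)
```

The same function also reproduces the root-level ordering described earlier: an explicit c: on the root places c ahead of a b prefix that is only encountered later in a descendant.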

Putting all these together, a reasonably complex XML snippet can be condensed quite a bit:

>>> xml('w:p/(w:pPr/w:jc{w:val=right},w:r/w:t"Right-aligned")')
<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:pPr>
    <w:jc w:val="right"/>
  </w:pPr>
  <w:r>
    <w:t>Right-aligned</w:t>
  </w:r>
</w:p>
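For readers curious how such an expression might be parsed, here is a toy recursive-descent parser for a subset of the syntax: names, / for a single child, (a,b) for siblings, {k=v} attributes with optional quoted values, and quoted or unquoted text. It is a hypothetical sketch, not cxml's implementation: it emits no namespace declarations, does not allow whitespace between tokens, and does no XML escaping.

```python
import re

# Element and attribute names; ':' is allowed so prefixed names like w:p parse.
NAME = re.compile(r"[\w:.-]+")


class MiniParser:
    """Toy recursive-descent parser for a CXML subset (no namespaces)."""

    def __init__(self, src: str):
        self.src, self.pos = src, 0

    def _peek(self) -> str:
        # Next character, or "" at end of input.
        return self.src[self.pos] if self.pos < len(self.src) else ""

    def _expect(self, ch: str) -> None:
        if self._peek() != ch:
            raise SyntaxError(f"expected {ch!r} at {self.pos} in {self.src!r}")
        self.pos += 1

    def _name(self) -> str:
        m = NAME.match(self.src, self.pos)
        if not m:
            raise SyntaxError(f"expected a name at {self.pos} in {self.src!r}")
        self.pos = m.end()
        return m.group()

    def _quoted(self) -> str:
        # "..." -> contents; any character except '"' is allowed inside.
        self._expect('"')
        end = self.src.index('"', self.pos)  # ValueError if unterminated
        value, self.pos = self.src[self.pos:end], end + 1
        return value

    def _until(self, stops: str) -> str:
        # Unquoted run of characters up to (not including) any of stops.
        start = self.pos
        while self._peek() and self._peek() not in stops:
            self.pos += 1
        return self.src[start:self.pos]

    def element(self):
        """Parse one element; return (tag, attrs, text, children)."""
        tag, attrs, text, children = self._name(), {}, "", []
        if self._peek() == "{":  # {k=v,k2="v,2"}
            self.pos += 1
            while True:
                key = self._name()
                self._expect("=")
                attrs[key] = self._quoted() if self._peek() == '"' else self._until(",}")
                if self._peek() != ",":
                    break
                self.pos += 1
            self._expect("}")
        if self._peek() == "/":  # a single child or a (sibling,list)
            self.pos += 1
            if self._peek() == "(":
                self.pos += 1
                children.append(self.element())
                while self._peek() == ",":
                    self.pos += 1
                    children.append(self.element())
                self._expect(")")
            else:
                children.append(self.element())
        elif self._peek() == '"':  # quoted text
            text = self._quoted()
        elif self._peek() not in ("", ",", ")"):  # unquoted text, spaces kept
            text = self._until("/,)")
        return tag, attrs, text, children


def render(node, depth: int = 0) -> str:
    """Pretty-print a parsed node with 2-space indentation."""
    tag, attrs, text, children = node
    pad = "  " * depth
    attr_str = "".join(f' {k}="{v}"' for k, v in attrs.items())
    if children:
        inner = "".join(render(c, depth + 1) for c in children)
        return f"{pad}<{tag}{attr_str}>\n{inner}{pad}</{tag}>\n"
    if text:
        return f"{pad}<{tag}{attr_str}>{text}</{tag}>\n"
    return f"{pad}<{tag}{attr_str}/>\n"


def mini_xml(src: str) -> str:
    parser = MiniParser(src)
    tree = parser.element()
    if parser.pos != len(src):
        raise SyntaxError(f"unexpected {src[parser.pos:]!r} in {src!r}")
    return render(tree)


print(mini_xml('w:p/(w:pPr/w:jc{w:val=right},w:r/w:t"Right-aligned")'), end="")
```

Within those limits, mini_xml() reproduces the layout of the example above, minus the xmlns:w declaration.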

