Open
Description
Would it be possible to add position information (line and column) to text nodes, or at least make this information available to the tree builder? I implemented a very minimal proof of concept that attaches the position to each token and passes it along to the DOM tree builder. It produces the following result:
    import html5lib

    html = '<div>&amp;<p>b<span>c</span></p> cab</div>'
    parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
    doc = parser.parse(html)

    def parse(n):
        for c in n.childNodes:
            if hasattr(c, 'sourcepos'):
                print(c.sourcepos, c)
            parse(c)

    parse(doc)

Output:

    None <DOM Element: head at 0x10bbed0d0>
    None <DOM Element: body at 0x10bbed1f0>
    (1, 5) <DOM Element: div at 0x10bbfb790>
    (1, 10) <DOM Text node "'&'">
    (1, 13) <DOM Element: p at 0x10bbfb820>
    (1, 14) <DOM Text node "'b'">
    (1, 20) <DOM Element: span at 0x10bbfb8b0>
    (1, 21) <DOM Text node "'c'">
    (1, 33) <DOM Text node "' '">
    (1, 36) <DOM Text node "'cab'">

I would be willing to implement it.
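For comparison, here is a minimal sketch of the same idea using only the standard library's `html.parser` instead of html5lib. It is not the proof of concept above and does not touch html5lib internals. `TextPositionCollector` and `text_positions` are hypothetical names made up for this illustration. `HTMLParser.getpos()` returns a 1-based line number and a 0-based column for the start of the current token, so the numbers will not line up exactly with the `sourcepos` values shown above, which appear to mark the end of each token.

```python
from html.parser import HTMLParser


class TextPositionCollector(HTMLParser):
    """Hypothetical helper: records where each run of text starts."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default
        self.found = []

    def handle_data(self, data):
        # getpos() -> (line, column); line is 1-based, column is 0-based.
        self.found.append((self.getpos(), data))


def text_positions(source):
    """Return a list of ((line, column), text) pairs for the text runs in source."""
    collector = TextPositionCollector()
    collector.feed(source)
    collector.close()
    return collector.found


if __name__ == "__main__":
    sample = '<div>&amp;<p>b<span>c</span></p> cab</div>'
    for pos, text in text_positions(sample):
        print(pos, repr(text))
```

The stdlib parser does not build a tree and does not follow the HTML5 parsing algorithm, so this only illustrates what per-token position tracking could look like. Exposing the equivalent information in html5lib would still require the tokenizer to carry it through to the tree builder, as proposed above.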