
@tomdpsrd commented Jul 28, 2025

Actual Behavior

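Document.summary() does not work under Python 3 when the document is built from bytes rather than from a string. The traceback below shows that doc.title() fails the same way.
The newly released version (0.8.4.1) includes an old change that compiled the regular expressions as str patterns instead of bytes patterns, so applying them to bytes content raises a TypeError.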
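The root cause can be reproduced with the standard `re` module alone. The sketch below uses a charset pattern modelled on the `RE_CHARSET` name from the traceback; the exact pattern in readability may differ, and `find_charsets` is a hypothetical helper used only for illustration.

```python
import re

# str pattern, similar in spirit to RE_CHARSET (assumed, not copied from readability).
STR_CHARSET = re.compile(r'<meta.*?charset=["\']*(.+?)["\'>]', flags=re.I)
# The same pattern compiled from a bytes literal.
BYTES_CHARSET = re.compile(br'<meta.*?charset=["\']*(.+?)["\'>]', flags=re.I)

page = b'<html><head><meta charset="utf-8"></head></html>'


def find_charsets(pattern, data):
    """Hypothetical helper: return pattern.findall(data), or the TypeError text on a type clash."""
    try:
        return pattern.findall(data)
    except TypeError as exc:
        return str(exc)


print(find_charsets(STR_CHARSET, page))    # str pattern on bytes: TypeError message
print(find_charsets(BYTES_CHARSET, page))  # bytes pattern on bytes: [b'utf-8']
```

Note that a bytes pattern returns bytes matches, which still have to be decoded before they can be used as an encoding name.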

Linked issue:
#194

Steps to Reproduce the Problem

Follow the README steps:

    >>> import requests
    >>> from readability import Document
    >>> response = requests.get('http://example.com')
    >>> doc = Document(response.content)
    >>> doc.title()
    Traceback (most recent call last):
      ...
        RE_CHARSET.findall(page) + RE_PRAGMA.findall(page) + RE_XML.findall(page)
        ^^^^^^^^^^^^^^^^^^^^^^^^
    TypeError: cannot use a string pattern on a bytes-like object
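One possible fix, sketched here under assumptions rather than taken from this PR's diff, is to keep a str and a bytes variant of each pattern and pick one based on the input type. `declared_encodings` and the three pattern sources below are hypothetical stand-ins for `RE_CHARSET`, `RE_PRAGMA` and `RE_XML`; they are not readability's actual code.

```python
import re

# Pattern sources assumed for illustration; modelled on the names in the traceback.
_SOURCES = [
    (r'<meta.*?charset=["\']*(.+?)["\'>]', re.I),            # like RE_CHARSET
    (r'<meta.*?content=["\']*;?charset=(.+?)["\'>]', re.I),  # like RE_PRAGMA
    (r'^<\?xml.*?encoding=["\']*(.+?)["\'>]', 0),            # like RE_XML
]

# Compile every source twice: once as a str pattern, once as a bytes pattern.
STR_PATTERNS = [re.compile(src, flags) for src, flags in _SOURCES]
BYTES_PATTERNS = [re.compile(src.encode("ascii"), flags) for src, flags in _SOURCES]


def declared_encodings(page):
    """Hypothetical helper: return charset declarations from a str or bytes page as str."""
    if isinstance(page, (bytes, bytearray)):
        found = [m for p in BYTES_PATTERNS for m in p.findall(page)]
        # Bytes matches are decoded so callers always receive str values.
        return [bytes(m).decode("ascii", "replace") for m in found]
    return [m for p in STR_PATTERNS for m in p.findall(page)]


html = '<html><head><meta charset="utf-8"></head></html>'
print(declared_encodings(html))           # ['utf-8']
print(declared_encodings(html.encode()))  # ['utf-8']
```

Dispatching on the input type keeps both call styles working, `Document(response.text)` as well as `Document(response.content)`, without forcing callers to decode first.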

@tomdpsrd changed the title from "Correction bytes" to "Document - Correct when instantiated with bytes content instead of bytes" on Jul 28, 2025