Document - Correct when instantiated with bytes content instead of bytes #195

tomdpsrd · 2025-07-28T13:00:51Z

Actual Behavior

Document.summary() fails on Python 3 when the document is built from bytes rather than string content.
The newly released version (0.8.4.1) includes an older change that turned the regular expressions into string patterns instead of bytes patterns.
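The mismatch described above can be reproduced with the standard `re` module alone. The following is a minimal sketch: the pattern and the `find_charsets` helper are hypothetical stand-ins for illustration, not the library's actual `RE_CHARSET` definition.

```python
import re

# Hypothetical patterns for illustration only; not the library's real regexps.
STR_PATTERN = re.compile(r'charset=["\']?([\w-]+)')     # str pattern
BYTES_PATTERN = re.compile(rb'charset=["\']?([\w-]+)')  # bytes pattern

page = b'<meta charset="utf-8">'  # bytes input, as returned by response.content


def find_charsets(pattern, data):
    """Return the matches, or the TypeError message when pattern and data types differ."""
    try:
        return pattern.findall(data)
    except TypeError as exc:
        return str(exc)


str_result = find_charsets(STR_PATTERN, page)      # str pattern on bytes: error message
bytes_result = find_charsets(BYTES_PATTERN, page)  # bytes pattern on bytes: matches

print(str_result)
print(bytes_result)
```

A str pattern applied to bytes raises `TypeError`, while a bytes pattern applied to the same input matches, which is why the type of the compiled pattern has to agree with the type of the page content.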

Linked issue: #194

Steps to Reproduce the Problem

Follow the steps in the README:

    >>> import requests
    >>> from readability import Document
    >>> response = requests.get('http://example.com')
    >>> doc = Document(response.content)
    >>> doc.title()
    Traceback (most recent call last):
      ...
        RE_CHARSET.findall(page) + RE_PRAGMA.findall(page) + RE_XML.findall(page)
        ^^^^^^^^^^^^^^^^^^^^^^^^
    TypeError: cannot use a string pattern on a bytes-like object
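Until a fixed release is available, one possible caller-side workaround is to decode the bytes before handing them to `Document`. This is only a sketch: `ensure_text` is a hypothetical helper, the UTF-8 default is an assumption, and it presumes that `Document` accepts string content.

```python
def ensure_text(content, encoding="utf-8"):
    """Return str content, decoding bytes-like input with the given encoding.

    Hypothetical helper; the default encoding is an assumption and should be
    replaced by the real encoding of the page when it is known.
    """
    if isinstance(content, (bytes, bytearray)):
        # errors="replace" avoids a UnicodeDecodeError on malformed input.
        return bytes(content).decode(encoding, errors="replace")
    return content


# Intended usage (not executed here, since it needs network access):
#   doc = Document(ensure_text(response.content))
print(ensure_text(b"caf\xc3\xa9"))
```

Decoding on the caller side sidesteps the pattern mismatch but also bypasses any encoding detection the library performs on bytes input, so it is a stopgap rather than a substitute for the fix in this pull request.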

Correction bytes
da65b75

tomdpsrd changed the title ~~Correction bytes~~ Document - Correct when instantiated with bytes content instead of bytes · Jul 28, 2025
