Commit d818133

Modify UnicodeDecodeError text. You'll use utf-8
1 parent 9cebdb7 commit d818133

1 file changed: 1 addition, 0 deletions

ch05/classify.py

@@ -54,6 +54,7 @@ def prepare_sent_features():
         if not text:
             meta[pid]['AvgSentLen'] = meta[pid]['AvgWordLen'] = 0
         else:
+            text = text.decode('utf-8')
             sent_lens = [len(nltk.word_tokenize(
                 sent)) for sent in nltk.sent_tokenize(text)]
             meta[pid]['AvgSentLen'] = np.mean(sent_lens)
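
The added line assumes that `text` is a UTF-8 encoded byte string, which is what Python 2 `str` values read from a file are. The following is a minimal sketch, not part of this commit, of a guard that decodes only when the value really is a byte string. The helper name `to_text` is hypothetical, and the choice of `errors="replace"` is an assumption made so that malformed bytes do not raise `UnicodeDecodeError`.

```python
def to_text(raw, encoding="utf-8"):
    """Return `raw` as a text string, decoding it only if it is a byte string.

    `to_text` is a hypothetical helper for illustration; it is not in classify.py.
    """
    if isinstance(raw, bytes):
        # Assumption: replace undecodable bytes with U+FFFD instead of raising.
        return raw.decode(encoding, errors="replace")
    # Already text: return unchanged to avoid decoding twice.
    return raw


if __name__ == "__main__":
    sample = u"caf\u00e9".encode("utf-8")  # byte string, as read from disk
    print(repr(to_text(sample)))
    print(repr(to_text(u"already text")))
```

With a guard of this kind, the value passed to `nltk.sent_tokenize` would be text whether or not the caller had already decoded it.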
