Python: Scraping the most common words from the New York Times front page

Time: 2017-06-15 21:51:33

Tags: python web-scraping words

I can't seem to get my code to run, and I haven't seen anyone else report the same error. Any ideas from someone smarter than me?

import requests
from bs4 import BeautifulSoup
from collections import Counter
from string import punctuation

r = requests.get("https://www.nytimes.com/?mcubz=0")

soup = BeautifulSoup(r.content, "lxml")

text = (''.join(s.findAll(text=True)) for s in soup.findAll('p'))

c = Counter((x.rstrip(punctuation).lower() for y in text for x in y.split()))

print (c.most_common())

Here is my error message:

Traceback (most recent call last):
  File "ScrapeNews.py", line 19, in <module>
    print (c)
  File "C:\Users\Antra\Desktop\Python\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2014' in position 218: character maps to <undefined>

I'm pretty frustrated.
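
A note on the traceback: the failure appears to happen when printing, not while scraping. The console here uses the cp437 code page, which has no mapping for U+2014 (the em dash that shows up in the page text), so `print` raises `UnicodeEncodeError`. Below is a minimal sketch of one possible workaround, assuming it is acceptable to show unencodable characters as `?`. The names `count_words` and `safe_text` and the sample text are hypothetical, introduced only for illustration; they are not part of the original script, and the sample stands in for the scraped paragraphs so the snippet runs without network access.

```python
from collections import Counter
from string import punctuation


def count_words(paragraphs):
    """Count lowercase words, stripping ASCII punctuation from both ends."""
    return Counter(
        word.strip(punctuation).lower()
        for paragraph in paragraphs
        for word in paragraph.split()
    )


def safe_text(value, encoding):
    """Round-trip text through `encoding`, replacing unencodable characters with '?'."""
    return str(value).encode(encoding, errors="replace").decode(encoding)


# Hypothetical sample text standing in for the scraped <p> contents.
sample = ["The news \u2014 today", "the news today"]
counts = count_words(sample)

# cp437 is the console code page named in the traceback; it cannot encode U+2014.
try:
    "\u2014".encode("cp437")
except UnicodeEncodeError as exc:
    print("cp437 cannot encode U+2014:", exc.reason)

# Printing the sanitized string avoids the UnicodeEncodeError on such a console.
print(safe_text(counts.most_common(), "cp437"))
```

If replacing characters is not acceptable, other options that may help are setting the `PYTHONIOENCODING` environment variable to `utf-8` before running the script, or, on Python 3.7 and later, calling `sys.stdout.reconfigure(errors="replace")` once at startup. Whether the console then displays the characters correctly depends on its font and code page.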

0 answers:

No answers