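I'm new to Python, and I'm trying to create a web crawler that prints only the article (for example, from this page: http://techcrunch.com/2014/09/15/microsoft-has-acquired-minecraft/) and nothing else on the site. I tried this (it doesn't work):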
import requests
from bs4 import BeautifulSoup

source_code = requests.get('http://techcrunch.com/2014/09/15/microsoft-has-acquired-minecraft/')
plain_text = source_code.text
soup = BeautifulSoup(plain_text, 'html.parser')
for link in soup.findAll('div', {'class': 'article-entry text'}):
    title = link.string
    print(title)
It prints: None. Thanks.
Answer 0 (score: 3)
You only need the article itself, so you don't need the for loop. Instead of this:
for link in soup.findAll('div', {'class': 'article-entry text'}):
    title = link.string
    print(title)
do this:
# get_text() collects the text of a tag and all of its descendants
title = soup.find('h1', {'class': 'alpha tweet-title'}).get_text()
article = soup.find('div', {'class': 'article-entry text'}).get_text()
print(title)
print(article)
You will get only the title and the article. The BeautifulSoup documentation may also help.
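As for why the original attempt printed None: in BeautifulSoup, `.string` is None whenever a tag has more than one child, and an article div normally holds several paragraphs. Here is a minimal sketch of the difference that runs without a network request. The `SAMPLE_HTML` markup and the `extract_article` helper are made up for illustration (the real page's structure may differ or may have changed), and the class names are simply the ones used above.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the article page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
  <h1 class="alpha tweet-title">Microsoft Has Acquired Minecraft</h1>
  <div class="article-entry text">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
</body></html>
"""


def extract_article(html):
    """Return (title, article_text); a part that is not found comes back as None."""
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.find("h1", {"class": "alpha tweet-title"})
    body_tag = soup.find("div", {"class": "article-entry text"})
    # Guard against find() returning None when the page layout differs.
    title = title_tag.get_text(strip=True) if title_tag else None
    article = body_tag.get_text(" ", strip=True) if body_tag else None
    return title, article


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
div = soup.find("div", {"class": "article-entry text"})

# The div has several children (two <p> tags plus whitespace), so .string is None.
string_result = div.string

title, article = extract_article(SAMPLE_HTML)
print(string_result)
print(title)
print(article)
```

For the live page you would pass `requests.get(url).text` to `extract_article` instead of `SAMPLE_HTML`. Checking for None before calling `get_text()` avoids an AttributeError if the site's class names are not what you expect.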