Question

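I did some research on this site to find a way to solve my problem, but the threads are either too old (Yahoo redesigned its pages a few years ago) or too complicated (I'm still new to scraping). I want to search for keywords in the CSV file that this code creates.

I used this code, but Yahoo's headlines are a bit tricky. Let me explain.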

# import libraries
import urllib2  
from bs4 import BeautifulSoup  
import csv  
from datetime import datetime

quote_page = 'https://finance.yahoo.com/' 
page = urllib2.urlopen(quote_page)  
soup = BeautifulSoup(page, 'html.parser') 
name_box = soup.find('h1', attrs={'class': 'name'})
name = name_box.text.strip() 
print name


with open('index.csv', 'a') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow([name, ])

As you can see in this picture, the headline sits between <!-- react-text: 3388 --> and <!-- /react-text -->, but I don't know how to change my code so that it can read this content.

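The solution is probably very simple, but I have tried many things and nothing seems to work.

I hope you can help me, or suggest another way to find keywords in these headlines.

Thank you very much in advance.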

Answer 1

I use requests instead of urllib2. As far as I know, it is the more widely used of the two.

As for the headline:

import requests
from bs4 import BeautifulSoup
a = requests.get('https://finance.yahoo.com/m/8bb0b8f6-9b97-32df-8f56-31690cd85cea/long-lines-are-killing.html')
soup = BeautifulSoup(a.content, 'lxml')
search = soup.find_all('h1', {'class':'Lh(36px) Fz(25px)--sm Fz(32px) Mb(17px)--sm Mb(20px) Mb(30px)--lg Ff($ff-primary) Lts($lspacing-md) Fw($fweight) Fsm($fsmoothing) Fsmw($fsmoothing) Fsmm($fsmoothing) Wow(bw)'})
print(search[0].text) # prints Long Lines Are Killing Starbucks, So Here's Its Bold New Solution to the Major Problem
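
For the keyword search itself, here is a minimal offline sketch. It assumes the headlines live in `<h3>` tags and uses a made-up HTML snippet (`SAMPLE_HTML`) in place of the live page, so the tag names and the second headline are illustrative only and the real Yahoo Finance markup will differ. The helper names `extract_headlines`, `filter_by_keyword` and `write_rows` are hypothetical. The point is that BeautifulSoup's `get_text()` skips HTML comments, so the react-text markers should not get in the way.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup imitating what the question describes: headline text
# wrapped in React comment markers. The real page structure is an assumption.
SAMPLE_HTML = """
<div>
  <h3><a href="/a"><!-- react-text: 3388 -->Long Lines Are Killing Starbucks<!-- /react-text --></a></h3>
  <h3><a href="/b"><!-- react-text: 3390 -->Oil Prices Slip on Supply Worries<!-- /react-text --></a></h3>
</div>
"""


def extract_headlines(html):
    """Return the visible text of every <h3> element in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() ignores Comment nodes, so the react-text markers drop out.
    return [tag.get_text(strip=True) for tag in soup.find_all("h3")]


def filter_by_keyword(headlines, keyword):
    """Keep only the headlines containing the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [headline for headline in headlines if needle in headline.lower()]


def write_rows(headlines, csv_file):
    """Write one headline per row to an already opened file object."""
    writer = csv.writer(csv_file)
    for headline in headlines:
        writer.writerow([headline])


headlines = extract_headlines(SAMPLE_HTML)
matches = filter_by_keyword(headlines, "starbucks")

# An in-memory buffer stands in for index.csv so the sketch has no side effects.
buffer = io.StringIO()
write_rows(matches, buffer)
print(buffer.getvalue())
```

To write to a real file in Python 3, you could pass `open('index.csv', 'a', newline='')` to `write_rows` instead of the buffer. To run it on a live page, pass `a.content` from the requests call above to `extract_headlines`, after adjusting the tag or class to whatever the page actually uses.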

Scraping data from Yahoo Finance headlines
