I'm trying to scrape data from this web page, https://www.mmoga.co.uk/FIFA-Coins/FUT-Coins-Xbox-One,FIFA-15/, using Python 2. The data I need to scrape looks like this:
<a href="/FIFA-Coins/FUT-Coins-Xbox-One,FIFA-15,70000-FIFA-15-Xbox-One-Ultimate-Team-Coins/" class="smallBoldText" style="text-decoration:none;" title="70.000 FIFA 15 Xbox One Ultimate Team Coins">
I need to retrieve the contents of the title attribute. I tried the following code, but it doesn't seem to work:
i = 0
while i < len(titles):
    htmltext = urllib.urlopen("https://www.mmoga.com/FIFA-Coins/FUT-Coins-Xbox-One,FIFA-15/")
    data = json.load(htmltext)
    mmogaamount.append(data["title"])
    print mmogaamount
    i += 1
Answer 0 (score: 1)
This will get you started:
import requests
from bs4 import BeautifulSoup
# get html
content = requests.get("https://www.mmoga.co.uk/FIFA-Coins/FUT-Coins-Xbox-One,FIFA-15/").content
# pass html to beautifulSoup
soup = BeautifulSoup(content, "html.parser")  # name the parser explicitly to avoid bs4's "no parser specified" warning
# find tr tag we want based on the class
tr = soup.body.find("tr",attrs={"class":"row1"})
# extract the titles from the "smallBoldText" class
print([x["title"] for x in tr.find_all(attrs={"class":"smallBoldText"}) if x.has_attr("title")])
['70.000 FIFA 15 Xbox One Ultimate Team Coins']
I recommend taking a look at the bs4 docs; they include plenty of tutorials that are very easy to follow.
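The attempt in the question fails because the page returns HTML, not JSON, so json.load has nothing it can parse. If you want to try the extraction without hitting the site, here is a minimal offline sketch. It assumes the anchor markup quoted in the question; the wrapping table row, the closing </a>, the link text, and the helper name extract_titles are my own additions for illustration, not something taken from the real page.

```python
from bs4 import BeautifulSoup

# Anchor tag copied from the question. The surrounding <table>/<tr>/<td>,
# the link text, and the closing </a> are assumptions added so the
# fragment is well-formed.
SAMPLE_HTML = """
<table><tr class="row1"><td>
<a href="/FIFA-Coins/FUT-Coins-Xbox-One,FIFA-15,70000-FIFA-15-Xbox-One-Ultimate-Team-Coins/" class="smallBoldText" style="text-decoration:none;" title="70.000 FIFA 15 Xbox One Ultimate Team Coins">70.000 Coins</a>
</td></tr></table>
"""


def extract_titles(html):
    """Return the title attribute of every <a class="smallBoldText"> that has one."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: <a> tags with class smallBoldText and a title attribute
    return [a["title"] for a in soup.select("a.smallBoldText[title]")]


titles = extract_titles(SAMPLE_HTML)
print(titles)
```

Unlike the snippet above, this does not stop at the first row1 row; it collects every matching link in the document, which is probably what you want if the real page lists several coin packages.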