Scraping JavaScript from a website

Time: 2015-02-17 23:16:03

Tags: python web-scraping

I am trying to scrape data from this page, https://www.mmoga.co.uk/FIFA-Coins/FUT-Coins-Xbox-One,FIFA-15/, using Python 2. The data I need to scrape looks like this:

 <a href="/FIFA-Coins/FUT-Coins-Xbox-One,FIFA-15,70000-FIFA-15-Xbox-One-Ultimate-Team-Coins/" class="smallBoldText" style="text-decoration:none;" title="70.000 FIFA 15 Xbox One Ultimate Team Coins">

I need to retrieve the contents of the title attribute. I tried the following code, but it does not seem to work:

i=0
while i< len(titles):
    htmltext = urllib.urlopen("https://www.mmoga.com/FIFA-Coins/FUT-Coins-Xbox-One,FIFA-15/")
    data = json.load(htmltext)
    mmogaamount.append(data["title"])
    print mmogaamount
    i+=1

1 answer:

Answer 0 (score: 1)

This will get you started:

import requests
from bs4 import BeautifulSoup


# get html
content = requests.get("https://www.mmoga.co.uk/FIFA-Coins/FUT-Coins-Xbox-One,FIFA-15/").content
# pass html to BeautifulSoup, naming the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(content, "html.parser")
# find tr tag we want based on the class
tr = soup.body.find("tr",attrs={"class":"row1"})
# extract the titles from the "smallBoldText" class
print([x["title"] for x in tr.find_all(attrs={"class":"smallBoldText"}) if x.has_attr("title")])
['70.000 FIFA 15 Xbox One Ultimate Team Coins']

I recommend looking at the bs4 docs; there are plenty of tutorials that are very easy to follow.
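
The page is HTML, not JSON, which is why json.load fails on it; the title has to be read from the tag's attributes instead. As a minimal offline sketch of that idea, the snippet below pulls the title out of the anchor tag quoted in the question using only the Python 3 standard library (html.parser) rather than bs4. The names SAMPLE_HTML and TitleCollector are made up for this illustration, and the closing </a> tag and link text are assumptions added so the fragment is complete.

```python
from html.parser import HTMLParser

# Anchor tag copied from the question. The link text and the closing </a>
# are assumptions added for illustration only.
SAMPLE_HTML = (
    '<a href="/FIFA-Coins/FUT-Coins-Xbox-One,FIFA-15,'
    '70000-FIFA-15-Xbox-One-Ultimate-Team-Coins/" '
    'class="smallBoldText" style="text-decoration:none;" '
    'title="70.000 FIFA 15 Xbox One Ultimate Team Coins">70.000 Coins</a>'
)


class TitleCollector(HTMLParser):
    """Hypothetical helper: collects title attributes of <a> tags with a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and self.css_class in classes and attr_map.get("title"):
            self.titles.append(attr_map["title"])


collector = TitleCollector("smallBoldText")
collector.feed(SAMPLE_HTML)
titles = collector.titles
print(titles)
```

To run this against the live page, the downloaded HTML text would be passed to feed() in place of SAMPLE_HTML. Whether the live markup still matches the snippet in the question is not something this sketch can confirm.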