BeautifulSoup: How do I extract a tag value?

Time: 2017-11-10 17:00:56

Tags: python beautifulsoup web-crawler

I am very new to programming and can't seem to solve the following data extraction problem.

This is my data (yellow = what I want to extract):

[image]

Extracting the title, price and time works fine:

# Title
advertTitle = firstAdvert.find_all(
    "section", {"class": "aditem-main"})[0].find("h2").text.encode("utf-8").strip().replace("\n", "")

# Price
advertPrice = firstAdvert.find_all(
    "section", {"class": "aditem-details"})[0].find("strong").text.encode("utf-8").strip().replace("\n", "")

# Time (assigned only to advertTimeAdded, so advertTitle is not overwritten)
advertTimeAdded = firstAdvert.find_all(
    "section", {"class": "aditem-addon"})[0].text.encode("utf-8").strip().replace("\n", "")

But my main question is: how do I extract "79924470" from this:

<article class="aditem" data-adid="79924470">

I tried things like, for example:

item.find_all("article", "data-adid")

Thanks for pointing me in the right direction!

3 Answers:

Answer 0: (score: 1)

Since you are using BeautifulSoup, you can extract the value of an attribute like this:

from bs4 import BeautifulSoup

soup = BeautifulSoup(file, "lxml")  # file: the HTML source (a string or an open file)
print(soup.article['data-adid'])  # output: 79924470
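As a minimal, self-contained sketch of the same idea (written for Python 3; the HTML string is made up to resemble the snippet in the question, and the built-in "html.parser" is used here only so that lxml is not required), a tag's attributes can be read like dictionary entries:

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the snippet in the question.
html = '<article class="aditem" data-adid="79924470"><h2>I am a title</h2></article>'

# "html.parser" ships with Python; "lxml" works too if it is installed.
soup = BeautifulSoup(html, "html.parser")

article = soup.find("article", class_="aditem")

# Dictionary-style access raises KeyError if the attribute is missing.
ad_id = article["data-adid"]

# .get() returns None instead of raising when the attribute is absent.
# "data-price" is an invented attribute name used only to show that case.
missing = article.get("data-price")

print(ad_id)    # 79924470
print(missing)  # None
```

Using .get() is probably the safer choice when some adverts might lack the attribute.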

Answer 1: (score: 0)

You can do this:

data = []
# attrs={'data-adid': True} matches every tag that has a data-adid attribute
for element in soup.find_all(attrs={'data-adid': True}):
    data.append(element['data-adid'])

This should add each value of data-adid to the list data.
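For a page with several adverts, one possible sketch (Python 3; the HTML and the second ID below are invented for illustration) collects every data-adid value with a CSS attribute selector instead of a loop:

```python
from bs4 import BeautifulSoup

# Hypothetical listing page: two adverts with an ID, one without.
html = """
<article class="aditem" data-adid="79924470"></article>
<article class="aditem" data-adid="79924471"></article>
<article class="aditem"></article>
"""

soup = BeautifulSoup(html, "html.parser")

# "article[data-adid]" matches only <article> tags that carry the attribute,
# so the third advert is skipped rather than raising a KeyError.
ad_ids = [tag["data-adid"] for tag in soup.select("article[data-adid]")]

print(ad_ids)  # ['79924470', '79924471']
```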

Answer 2: (score: 0)

You can use a series of selections to get the various elements, like this:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "lxml")
print(soup.article['data-adid'])
image = soup.select('div.imagebox.srpimagebox')[0]
print(image['data-href'])
print(image['data-imgsrc'])
print(soup.select('section h2 a')[0].text)
print(', '.join([v.strip() for v in soup.select('section.aditem-details')[0].text.strip().split('\n')]))
print(soup.select('section.aditem-addon')[0].get_text(strip=True))

This displays:

79924470
/ref/79924470
https://imgserver.com/012004.JPG
I am a title
12.380€, 50111, Cityname, 25km
Today, 16:19
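If several adverts need to be processed, the selections above could be wrapped in a small helper. The following is only a sketch for Python 3: the HTML is a guess at the page structure based on the class names in the question, and text_or_none and parse_advert are hypothetical helper names, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Assumed structure, reconstructed from the class names used in the question.
html = """
<article class="aditem" data-adid="79924470">
  <section class="aditem-main"><h2><a href="/ref/79924470">I am a title</a></h2></section>
  <section class="aditem-details"><strong>12.380€</strong></section>
  <section class="aditem-addon">Today, 16:19</section>
</article>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


def parse_advert(article):
    """Collect the ID, title, price and time of one <article> into a dict."""
    return {
        "id": article.get("data-adid"),
        "title": text_or_none(article, "section.aditem-main h2"),
        "price": text_or_none(article, "section.aditem-details strong"),
        "added": text_or_none(article, "section.aditem-addon"),
    }


soup = BeautifulSoup(html, "html.parser")
adverts = [parse_advert(a) for a in soup.select("article.aditem")]

for ad in adverts:
    print(ad["id"], "|", ad["title"], "|", ad["added"])
```

Returning None for a missing element, rather than indexing with [0], keeps one incomplete advert from stopping the whole run.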