Can't get an attribute value in Python

Date: 2019-03-08 18:04:00

Tags: python findall getattr

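I'm trying to write a scraper for a website. So far I have been able to scrape the general information I need, but the specific attribute value I'm trying to extract from that information is not returned, even though the value is clearly there. Everything works until I try to use getattr on each container in containers to find the value of data-id. Maybe there is a better way to do this, but I'm struggling to understand why it can't be found.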

Here is my code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup as soup
from selenium.webdriver.common.action_chains import ActionChains

url = "http://csgo.exchange/id/76561197999004010#x"

driver = webdriver.Firefox()

driver.get(url)
import time
time.sleep(10)
html = driver.page_source
soup = soup(html, "html.parser")


containers = soup.findAll("div",{"class":"vItem"})
print(len(containers))

for container in containers:
    test = getattr(container, "data-id")

    print(str(test))


with open('scraped.txt', 'w', encoding="utf-8") as file:
    file.write(str(containers))

Here is an example of what each container looks like:

  

<div class="vItem Normal Container cItem" data-bestquality="0" data-category="Normal" data-collection="The Spectrum Collection" data-currency="0" data-custom="" data-exterior="" data-hashname="Spectrum%20Case" data-id="15631554103">
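For reference, here is a minimal sketch that pulls data-id out of markup like the sample above using only the standard library's html.parser instead of BeautifulSoup. This swaps in a different parser purely for illustration. The class name DataIdCollector is hypothetical, and the sketch assumes, as in the question, that the items are div elements whose class list contains vItem:

```python
from html.parser import HTMLParser


class DataIdCollector(HTMLParser):
    """Hypothetical helper: collect data-id from every <div> whose class list contains 'vItem'."""

    def __init__(self):
        super().__init__()
        self.data_ids = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; turn it into a dict
        attributes = dict(attrs)
        # class is a space-separated list; a valueless attribute comes through as None
        classes = (attributes.get("class") or "").split()
        if tag == "div" and "vItem" in classes and "data-id" in attributes:
            self.data_ids.append(attributes["data-id"])


# The sample container from the question
sample = (
    '<div class="vItem Normal Container cItem" data-bestquality="0" '
    'data-category="Normal" data-collection="The Spectrum Collection" '
    'data-currency="0" data-custom="" data-exterior="" '
    'data-hashname="Spectrum%20Case" data-id="15631554103"></div>'
)

collector = DataIdCollector()
collector.feed(sample)
print(collector.data_ids)  # ['15631554103']
```

With a Selenium session, you could feed driver.page_source to the collector in place of the sample string.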

1 Answer

Answer 0 (score: 0)

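Just change the getattr() line to container.attrs["data-id"]. That works for me. On a BeautifulSoup Tag, getattr(container, "data-id") does not read the HTML attribute. It is treated like container.find("data-id"), a search for a child tag named data-id, and because there is no such tag it returns None. HTML attributes are stored in the tag's attrs dictionary instead. Also note that in most of my attempts, sleeping for ten seconds was not long enough for the page to finish loading.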
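The same split between Python attributes and markup attributes can be seen with the standard library's ElementTree. It is used here only as a dependency-free stand-in for BeautifulSoup; this is an illustration, not the answer's code:

```python
import xml.etree.ElementTree as ET

# A trimmed version of the question's container, self-closed so it is well-formed XML
element = ET.fromstring(
    '<div class="vItem Normal Container cItem" data-id="15631554103" />'
)

# getattr() looks up a *Python* attribute on the object; "data-id" is not one,
# so the default (None) is returned
print(getattr(element, "data-id", None))  # None

# Markup attributes live in a separate mapping on the element
print(element.get("data-id"))     # 15631554103
print(element.attrib["data-id"])  # 15631554103
```

With BeautifulSoup, container["data-id"] and container.get("data-id") read the same attrs dictionary as container.attrs["data-id"]. The .get() form returns None instead of raising KeyError when a container lacks the attribute.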

import time

from bs4 import BeautifulSoup
from selenium import webdriver

url = "http://csgo.exchange/id/76561197999004010#x"

driver = webdriver.Firefox()
driver.get(url)
time.sleep(10)  # give the page's JavaScript time to render the items
html = driver.page_source
driver.quit()

page = BeautifulSoup(html, "html.parser")

containers = page.find_all("div", {"class": "vItem"})
print(len(containers))
data_ids = []  # list to hold the data-id values

for container in containers:
    # read the HTML attribute from the tag's attrs dict, not a Python attribute
    data_id = container.attrs["data-id"]
    data_ids.append(data_id)
    print(data_id)

with open("scraped.txt", "w", encoding="utf-8") as file:
    for data_id in data_ids:
        file.write(data_id + "\n")  # write each data-id on its own line
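A fixed time.sleep(10) was often not long enough. One option is to poll the page source until the items appear. The helper below is a hypothetical, dependency-free sketch: wait_for_marker and its parameters are illustrative names, not part of Selenium. It is exercised here with a fake source that "finishes rendering" on the third poll:

```python
import time


def wait_for_marker(get_source, marker, timeout=30.0, interval=0.5):
    """Hypothetical helper: poll get_source() until its text contains marker.

    Returns the first source text that contains marker; raises TimeoutError
    if marker has not appeared within timeout seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        source = get_source()
        if marker in source:
            return source
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} not found within {timeout} seconds")
        time.sleep(interval)


# Fake page that only contains the items on the third poll
snapshots = iter([
    "<html></html>",
    "<html>loading</html>",
    '<html><div class="vItem" data-id="1"></div></html>',
])
page_html = wait_for_marker(lambda: next(snapshots), 'class="vItem', interval=0.01)
print(page_html)
```

With a live driver it would be called as wait_for_marker(lambda: driver.page_source, 'class="vItem') in place of the sleep. Selenium also ships its own explicit-wait mechanism, WebDriverWait combined with expected_conditions.presence_of_element_located. That is the more conventional choice when a driver is available.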