I'm trying to write a scraper for a website. So far I've been able to scrape the general information I need, but the specific attribute value I'm trying to pull from that information is not being returned, even though the value is clearly there. Everything works until I try to use getattr on each container to look up the value of data-id. Maybe there's a better way to do this, but I'm struggling to understand why it can't be found.
Here is my code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup as soup
from selenium.webdriver.common.action_chains import ActionChains
url = "http://csgo.exchange/id/76561197999004010#x"
driver = webdriver.Firefox()
driver.get(url)
import time
time.sleep(10)
html = driver.page_source
soup = soup(html, "html.parser")
containers = soup.findAll("div",{"class":"vItem"})
print(len(containers))
for container in containers:
    test = getattr(container, "data-id")
    print(str(test))

with open('scraped.txt', 'w', encoding="utf-8") as file:
    file.write(str(containers))
Here is an example of what each container looks like:

<div class="vItem Normal Container cItem" data-bestquality="0" data-category="Normal" data-collection="The Spectrum Collection" data-currency="0" data-custom="" data-exterior="" data-hashname="Spectrum%20Case" data-id="15631554103">
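The reason getattr fails here is that BeautifulSoup exposes HTML attributes through each tag's attrs dictionary, not as Python object attributes, and a hyphenated name like data-id could never be a Python attribute in any case. As a stdlib-only sketch (using Python's built-in html.parser instead of BeautifulSoup, with a trimmed-down copy of the div above as sample input), hyphenated attribute names are perfectly ordinary dictionary keys:

```python
from html.parser import HTMLParser

class VItemParser(HTMLParser):
    """Collect the data-id attribute of every div whose class list contains vItem."""
    def __init__(self):
        super().__init__()
        self.data_ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "div" and "vItem" in attrs.get("class", "").split():
            self.data_ids.append(attrs.get("data-id"))

sample = ('<div class="vItem Normal Container cItem" data-category="Normal" '
          'data-id="15631554103"></div>')
parser = VItemParser()
parser.feed(sample)
print(parser.data_ids)  # hyphenated names work fine as dict keys
```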
Answer 0 (score: 0)
Just change the getattr() line to container.attrs["data-id"]. That worked for me, although in most of my attempts sleeping for ten seconds was not long enough.
from selenium import webdriver  # needed for webdriver.Firefox() below
from bs4 import BeautifulSoup as soup
from selenium.webdriver.common.action_chains import ActionChains
url = "http://csgo.exchange/id/76561197999004010#x"
driver = webdriver.Firefox()
driver.get(url)
import time
time.sleep(10)
html = driver.page_source
soup = soup(html, "html.parser")
containers = soup.findAll("div",{"class":"vItem"})
print(len(containers))
data_ids = [] # Make a list to hold the data-id's
for container in containers:
    test = container.attrs["data-id"]
    data_ids.append(test)  # add data-id to the list
    print(str(test))

with open('scraped.txt', 'w', encoding="utf-8") as file:
    for id in data_ids:
        file.write(str(id) + '\n')  # write every data-id to a new line
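One caveat with indexing attrs directly: if any vItem div happens to lack a data-id attribute, attrs["data-id"] raises a KeyError and stops the loop. A minimal sketch of the dict-style guard, using plain dicts as hypothetical stand-ins for each tag's attrs mapping (not real scraped data):

```python
# Plain dicts stand in for each container's attrs mapping (sample data only).
containers = [
    {"class": ["vItem"], "data-id": "15631554103"},
    {"class": ["vItem"]},  # a tag with no data-id attribute at all
]

# .get returns None instead of raising KeyError when the key is absent,
# so tags without data-id are simply skipped.
data_ids = [c.get("data-id") for c in containers if c.get("data-id") is not None]
print(data_ids)
```

BeautifulSoup tags support the same .get call directly (container.get("data-id")), since a tag's attribute lookup is dictionary-like.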