Need help scraping this data in Python

Time: 2021-02-21 12:34:03

Tags: python selenium web-scraping

I have a website whose markup looks like this:

<div class="d-row js-wrapper" id="row-1"><div class="d-cell js-activator" data-label="type">Residential</div><div class="d-cell d-cell--break" data-label="Company">J Smith</div><div class="d-cell js-target" data-label="Location">UK</div><div class="d-cell js-target" data-label="ID">62144</div><div class="d-cell js-target" data-label="Ask
">730000</div><div class="d-cell js-target" data-label="email">None</div><div class="d-cell js-target" data-label="Contact time (GMT)">
                                8:00 am to 4:30 pm
                        </div> </div>

<div class="d-row js-wrapper" id="row-2"><div class="d-cell js-activator" data-label="type">Commercial</div><div class="d-cell d-cell--break" data-label="Company">JBloggs ltd</div><div class="d-cell js-target" data-label="Location">FR</div><div class="d-cell js-target" data-label="ID">55324</div><div class="d-cell js-target" data-label="Ask
">670000</div><div class="d-cell js-target" data-label="email">None</div><div class="d-cell js-target" data-label="Contact time (GMT)">
                                9:00 am to 5:30 pm
                        </div> </div>

I would like to scrape it into a pandas DataFrame. So far I have tried the following in Selenium:

info = driver.find_element_by_class_name(".d-row")
print(info[0].text)

But this only gives me the text of the first row as a single string:

Residential J Smith UK 62144 730000 None 8:00 am to 4:30 pm

Can anyone help?

Thanks!

2 answers:

Answer 0: (score: 1)

How about finding all elements whose class contains d-cell and then reading the data-label attribute of each one?

list_elements = driver.find_elements_by_xpath('//div[contains(@class, "d-cell")]')
for element in list_elements:
   print(element.get_attribute("data-label"))
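
The same idea (read data-label from every d-cell) can also be tried without a live browser, which makes it easier to experiment. The following is only a minimal sketch using the standard library's html.parser instead of Selenium; CellParser and parse_rows are hypothetical names introduced for illustration, and the input is assumed to be a string of HTML such as the one Selenium exposes as driver.page_source. It also assumes, as in the markup above, that cells contain no nested div elements.

```python
from html.parser import HTMLParser


class CellParser(HTMLParser):
    """Collect {data-label: text} for every d-cell, grouped by d-row."""

    def __init__(self):
        super().__init__()
        self.rows = []       # one dict per d-row, in document order
        self._label = None   # label of the cell currently open, if any
        self._text = []      # text fragments seen inside that cell

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "d-row" in classes:
            self.rows.append({})
        elif "d-cell" in classes and self.rows:
            # Some labels carry stray whitespace (e.g. "Ask\n"), so strip it.
            self._label = (attrs.get("data-label") or "").strip()
            self._text = []

    def handle_data(self, data):
        if self._label is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes cells hold no nested <div>, so the first closing div
        # after a cell opens is the end of that cell.
        if tag == "div" and self._label is not None:
            self.rows[-1][self._label] = "".join(self._text).strip()
            self._label = None


def parse_rows(html):
    """Return a list of dicts, one per d-row found in the HTML string."""
    parser = CellParser()
    parser.feed(html)
    return parser.rows
```

Calling parse_rows(driver.page_source) should then give one dict per row, keyed by the cleaned label.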

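Answer 1: (score: 1)

You are missing an s: it should be find_element[s]_by_class_name. Also, .d-row is not a valid value in that context, because the leading dot is CSS selector syntax; pass it to a CSS selector lookup instead. Then use get_attribute() to read each element's attributes.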

for row in driver.find_elements_by_css_selector(".d-row"):
    for cell in row.find_elements_by_css_selector('.d-cell'):
        key = cell.get_attribute('data-label').strip()
        value = cell.text.strip()
        print("{}: {}".format(key, value))
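
To get from the printed key/value pairs to the pandas DataFrame the question asks for, one option is to collect each row into a dict and pass the list of dicts to pd.DataFrame. This is a sketch under assumptions, not a tested scraper: rows_to_records and to_dataframe are hypothetical helper names, and the sample data is copied from the markup in the question. Note also that Selenium 4.3 and later removed the find_elements_by_* helpers in favour of find_elements(By.CSS_SELECTOR, ...), so the commented Selenium part may need adapting to the installed version.

```python
def rows_to_records(rows):
    """Turn rows of (label, text) pairs into a list of dicts.

    Whitespace is stripped from both parts, since some labels in the
    page carry a trailing newline (e.g. "Ask\n").
    """
    return [
        {label.strip(): text.strip() for label, text in row}
        for row in rows
    ]


def to_dataframe(records):
    """Build a DataFrame with one row per record and one column per label."""
    # Imported lazily so rows_to_records can be used without pandas installed.
    import pandas as pd
    return pd.DataFrame(records)


# Sample pairs taken from the markup in the question.
sample = [
    [("type", "Residential"), ("Company", "J Smith"), ("Ask\n", "730000")],
    [("type", "Commercial"), ("Company", "JBloggs ltd"), ("Ask\n", "670000")],
]
records = rows_to_records(sample)

# With Selenium 3, the pairs could be gathered like this (not run here):
# rows = (
#     ((cell.get_attribute("data-label"), cell.text)
#      for cell in row.find_elements_by_css_selector(".d-cell"))
#     for row in driver.find_elements_by_css_selector(".d-row")
# )
# df = to_dataframe(rows_to_records(rows))
```

Every value arrives as a string, so numeric columns such as ID and Ask would still need converting (for example with pd.to_numeric) if they are to be used as numbers.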