Is there a better way to format these elements_by_id?

Asked: 2018-06-01 14:36:43

Tags: python selenium xpath web-scraping

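I have the following code, which repeats the same pattern several times, and I'm wondering if anyone can suggest a more efficient way to write it: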

def get_description(links):
    for link in links:
        description = driver.find_elements_by_id('some-id')
        description = [x.text for x in description]
        description = " ".join(description)
        title = driver.find_elements_by_id('different-id')
        title = [x.text for x in title]
        title = " ".join(title)
        company = driver.find_elements_by_id('another-different-id')
        company = [x.text for x in company]
        company = " ".join(company)
        location = driver.find_elements_by_id('location-id')
        location = [x.text for x in location]
        location = " ".join(location) + " United Kingdom"
        salary = driver.find_elements_by_xpath("//*[@id='randomly generated id']/div[3]/span[1]")
        salary = [x.text for x in salary]
        salary = " ".join(salary)

I tried defining a separate function called 'parse_element', as follows:

def parse_element(x):
    x = [y.text for y in x]
    x = " ".join(x)

I then called it from the main function like this:

description = driver.find_elements_by_id('some-id')
parse_element(description)

But alas! No joy.

It's not a show-stopper, since I already have it working, but I feel there is a lot of repetition here that I'd like to clean up!

2 answers:

Answer 0 (score: 1)

You're almost there. You need to return the value of x from the function, and then assign that result back to your variable. So:

def parse_element(x):
    x = [y.text for y in x]
    x = " ".join(x)
    return x

...

description = driver.find_elements_by_id('some-id')
description = parse_element(description)
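
To see why the original attempt did nothing, here is a minimal sketch: assigning to the parameter inside the function only rebinds a local name, so the caller's variable is unchanged unless the result is returned and reassigned. FakeElement is a hypothetical stand-in for Selenium's WebElement (only the .text attribute is modelled), so the snippet runs without a browser.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (only .text is modelled)."""
    def __init__(self, text):
        self.text = text


def parse_without_return(elements):
    # Rebinding the local name does not touch the caller's variable,
    # and the function implicitly returns None.
    elements = [e.text for e in elements]
    elements = " ".join(elements)


def parse_element(elements):
    # Join the text of every matched element and hand the string back.
    return " ".join(e.text for e in elements)


found = [FakeElement("Senior"), FakeElement("Developer")]

ignored = parse_without_return(found)  # None; `found` is still a list
description = parse_element(found)     # the joined string
```

With the return in place, the same helper can be reused for title, company and the other fields.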

Answer 1 (score: 0)

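You can get rid of the repetition by trying something like the code below. You also don't need to create another function to clean it up.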

def get_description(links):
    for link in links:
        description = ' '.join([x.text for x in driver.find_elements_by_id('some-id')])
        title = ' '.join([x.text for x in driver.find_elements_by_id('different-id')])
        company = ' '.join([x.text for x in driver.find_elements_by_id('another-different-id')])
        location = ' '.join([x.text for x in driver.find_elements_by_id('location-id')]) + " United Kingdom"
        salary = ' '.join([x.text for x in driver.find_elements_by_xpath("//*[@id='randomly generated id']/div[3]/span[1]")])
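
If the one-liners still feel repetitive, one option is to keep a small helper and drive it from a mapping of field names to locators. The sketch below is only an illustration under stated assumptions, not part of the answer above: FakeDriver, FakeElement, FIELDS, text_of and scrape_fields are hypothetical names, the fake classes exist so the snippet runs without a browser, and the find_elements(by, value) call mirrors the form used by newer Selenium 4 releases, where the find_elements_by_* helpers were removed.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (only .text is modelled)."""
    def __init__(self, text):
        self.text = text


class FakeDriver:
    """Hypothetical stand-in for a WebDriver, backed by canned results."""
    def __init__(self, pages):
        self._pages = pages

    def find_elements(self, by, value):
        # Like Selenium, return an empty list when nothing matches.
        return [FakeElement(t) for t in self._pages.get((by, value), [])]


# Field name -> (locator strategy, locator value, suffix to append).
FIELDS = {
    "description": ("id", "some-id", ""),
    "title": ("id", "different-id", ""),
    "company": ("id", "another-different-id", ""),
    "location": ("id", "location-id", " United Kingdom"),
}


def text_of(driver, by, value):
    # Join the text of every element matched by the locator.
    return " ".join(e.text for e in driver.find_elements(by, value))


def scrape_fields(driver, fields=FIELDS):
    # Build one dict per page instead of one variable per field.
    return {name: text_of(driver, by, value) + suffix
            for name, (by, value, suffix) in fields.items()}


driver = FakeDriver({
    ("id", "different-id"): ["Data", "Engineer"],
    ("id", "location-id"): ["Leeds"],
})
job = scrape_fields(driver)
```

Adding a field then means adding one entry to the mapping, and an XPath locator such as the salary one could be expressed with "xpath" as the strategy.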