Python: How do I check whether a list item contains a string in an elif statement that must satisfy two conditions?

Time: 2019-03-09 16:52:24

Tags: python string pandas list append

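I'm writing a scraper in Python that runs a search, opens each link in the search results, and collects everything inside strong tags into a list.

The list is then appended to a dataset. Not all pages are the same, so I organize them based on how many strong tags there are and, in some cases, on whether one particular tag contains one or more words. Both conditions must hold for the contents of the strong tags to line up with the right columns.

The code works, but it is bulky, and I'm trying to write more concise code.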

    for a in addr:
        driver.get(a)
        print(a)

        WebDriverWait(driver, 60).until(EC.presence_of_element_located((By.ID, "_errorElement_")))
        html = driver.page_source
        soup = BeautifulSoup(html, "html.parser")

        columns = ['Business Name', 'Control Number', 'Business Type', 'Business Status', 'NAICS Code', 'NAICS Sub Code',
                   'Principal Office Address', 'Date of Formation/ Registration Date', 'State of Formation/ Jurisdiction',
                   'Last Registration Year', 'Dissolved Date', 'Registered Agent', 'Registered Agent Address', 'County']

        df = pd.DataFrame(columns=columns)
        strong = []
        for strong_tag in soup.find_all('strong'):
            strong.append(str(strong_tag.text))

        if len(strong) == 14:
            values = [strong[0], strong[1], strong[2], strong[3], strong[4], strong[5], strong[6], strong[7], strong[8],
                      strong[9], strong[10], strong[11], strong[12], strong[13]]
        elif len(strong) == 6:
            values = [strong[0], '', '', 'Name Reservation', '', '', strong[3], strong[1], '', '', '', strong[2], '', '']

        elif len(strong) == 13 and "Active" in str(strong[3]):
            values = [strong[0], strong[1], strong[2], strong[3], strong[4], strong[5], strong[6], strong[7], strong[8],
                      strong[9], '', strong[10], strong[11],strong[12]]
        # The above appears to be correct for length-13 "active compliance" Domestic LLC (and possibly "active owes current year")

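The following five elif statements are what I want to combine. I'm not sure how to check whether an item in the list contains any one of five words while also checking the length of the list.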

        elif len(strong) == 13 and "Admin" in str(strong[3]):
            values = [strong[0], strong[1], strong[2], strong[3], strong[4], '', strong[5], strong[6], strong[7], strong[8],
                      strong[9], strong[10], strong[11], strong[12]]
        elif len(strong) == 13 and "Abandoned" in str(strong[3]):
            values = [strong[0], strong[1], strong[2], strong[3], strong[4], '', strong[5], strong[6], strong[7], strong[8],
                      strong[9], strong[10], strong[11], strong[12]]
        elif len(strong) == 13 and "Withdrawn" in str(strong[3]):
            values = [strong[0], strong[1], strong[2], strong[3], strong[4], '', strong[5], strong[6], strong[7], strong[8],
                      strong[9], strong[10], strong[11], strong[12]]
        elif len(strong) == 13 and "Dissolved" in str(strong[3]):
            values = [strong[0], strong[1], strong[2], strong[3], strong[4], '', strong[5], strong[6], strong[7], strong[8],
                      strong[9], strong[10], strong[11], strong[12]]
        elif len(strong) == 13 and "Terminated" in str(strong[3]):
            values = [strong[0], strong[1], strong[2], strong[3], strong[4], '', strong[5], strong[6], strong[7], strong[8],
                      strong[9], strong[10], strong[11], strong[12]]

        elif len(strong) == 12:
            values = [strong[0], strong[1], strong[2], strong[3], strong[4], '', strong[5], strong[6], strong[7], strong[8],
                      '', strong[9], strong[10], strong[11]]
        else:
            values = [strong[0], '', '', '', '', '', '', '', '', '', '', '', '', '']
            print("WARNING! New values length...")
        df = df.append(pd.Series(values, index=columns), ignore_index=True)
        df2 = df2.append(df)
    driver.close()
    driver.switch_to.window(driver.window_handles[0])

2 Answers:

Answer 0 (score: 1)

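Just use in. You want to check whether any of the words in the list ['Admin', 'Abandoned', ...] occurs in strong[3]. Writing strong[3] in words would only match when strong[3] is exactly equal to one of the words, so any() is used here to keep the substring test from the question:

    words = ['Admin', 'Abandoned', 'Withdrawn', 'Dissolved', 'Terminated']
    if len(strong) == 13 and any(word in strong[3] for word in words):
        values = strong[:5] + [''] + strong[5:]
    elif len(strong) == 12:
        values = strong[:5] + [''] + strong[5:9] + [''] + strong[9:]
    else:
        values = [strong[0]] + [''] * 13  # 14 columns in total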

P.S. When assigning to values, you can also combine slices of the list to make the code more concise.
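As a rough sketch of how the slicing idea could be carried through the whole chain from the question: the function name build_values and the constant INACTIVE_WORDS are made up for illustration, and the column positions (index 5 for 'NAICS Sub Code', index 10 for 'Dissolved Date') are read off the question's columns list. The branch order follows the question.

```python
# Status keywords taken from the five elif branches in the question.
INACTIVE_WORDS = ("Admin", "Abandoned", "Withdrawn", "Dissolved", "Terminated")


def build_values(strong):
    """Map a list of <strong> texts onto the 14 output columns."""
    n = len(strong)
    if n == 14:
        return list(strong)
    if n == 6:
        return [strong[0], "", "", "Name Reservation", "", "",
                strong[3], strong[1], "", "", "", strong[2], "", ""]
    if n == 13 and "Active" in strong[3]:
        # Leave column 10 ('Dissolved Date') blank.
        return strong[:10] + [""] + strong[10:]
    if n == 13 and any(word in strong[3] for word in INACTIVE_WORDS):
        # Leave column 5 ('NAICS Sub Code') blank.
        return strong[:5] + [""] + strong[5:]
    if n == 12:
        # Leave both column 5 and column 10 blank.
        return strong[:5] + [""] + strong[5:9] + [""] + strong[9:]
    # Unknown layout: keep the first value (if there is one), pad the rest.
    print("WARNING! New values length...")
    return (strong[:1] or [""]) + [""] * 13
```

Inside the loop this would reduce the whole if/elif chain to values = build_values(strong). Unlike the question's else branch, the fallback here also tolerates an empty list instead of raising IndexError, which is a small deviation from the original behaviour.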

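Answer 1 (score: 0)

Repeating the length check in every branch is redundant. I suggest putting the length condition on the outside and, once it is met, checking the remaining requirements inside it:

    if len(strong) == 13:
        # Every path that reaches this point has a list of length 13
        if "Dissolved" in strong[3]:
            # Do whatever
            pass
        elif ...:
            ...
    elif len(strong) == 12:
        ...

This is easier to follow.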
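A minimal sketch of the nested layout, limited to the length-13 and length-12 cases from the question (the 14 and 6 cases would be further elif branches on the outer level). The names build_values_nested, fallback and INACTIVE_WORDS are hypothetical. One thing to watch with nesting: in the flat chain a length-13 list with an unrecognised status falls through to the final else, whereas here the inner block needs its own else to get the same behaviour.

```python
INACTIVE_WORDS = ("Admin", "Abandoned", "Withdrawn", "Dissolved", "Terminated")


def fallback(strong):
    # Same result as the question's final else branch.
    print("WARNING! New values length...")
    return [strong[0]] + [""] * 13


def build_values_nested(strong):
    if len(strong) == 13:
        # Every path in this block has a list of length 13.
        status = strong[3]
        if "Active" in status:
            return strong[:10] + [""] + strong[10:]
        elif any(word in status for word in INACTIVE_WORDS):
            return strong[:5] + [""] + strong[5:]
        else:
            # Without this branch an unknown status would return None.
            return fallback(strong)
    elif len(strong) == 12:
        return strong[:5] + [""] + strong[5:9] + [""] + strong[9:]
    else:
        return fallback(strong)
```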