Removing BeautifulSoup tags from a list in Python

Date: 2017-04-13 14:15:30

Tags: python beautifulsoup html-parsing

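I have the following code, which goes through a list and extracts information to put into a new list.

If it finds '0', it appends 0. If it finds 'None', it appends 0. The third kind of list element is a tag extracted with BeautifulSoup.

What I want to do is extract some information from inside the tag and append it to newList. However, because I am using regex, the information in the tag's own attributes is getting in the way.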

My code is given here:

import re

list = [
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=826">11 votes for, 1 vote against, 15 absences, between 1999&ndash;2014</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=811">8 votes for, 1 vote against, 3 absences, between 1999&ndash;2015</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1050">4 votes for, 0 votes against, 3 absences, between 2002&ndash;2004</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6686">4 votes for, 1 vote against, 2 absences, between 2004&ndash;2014</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6703">5 votes for, 0 votes against, 4 absences, between 2011&ndash;2016</a>',
    'None',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6688">3 votes for, 7 votes against, 1 absence, between 2002&ndash;2015</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1049">0 votes for, 6 votes against, between 2002&ndash;2003</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=975">1 vote for, 1 vote against, 13 absences, between 2006&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=984">0 votes for, 4 votes against, 3 absences, between 2007&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1065">45 votes for, 12 votes against, 32 absences, between 2007&ndash;2017</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1027">2 votes for, 3 votes against, 8 absences, between 2011&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6706">3 votes for, 1 vote against, between 2010&ndash;2012</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6764">5 votes for, 3 votes against, 4 absences, between 2016&ndash;2017</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6761">4 votes for, 4 votes against, 5 absences, between 2016&ndash;2017</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6757">0 votes for, 3 votes against, between 2014&ndash;2015</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6672">0 votes for, 13 votes against, 4 absences, between 2012&ndash;2014</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6674">5 votes for, 0 votes against, in 2013</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6673">13 votes for, 0 votes against, 2 absences, between 2011&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6684">0 votes for, 3 votes against, 1 absence, in 2012</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6674">5 votes for, 0 votes against, in 2013</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6702">8 votes for, 0 votes against, 1 absence, between 2011&ndash;2014</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6680">0 votes for, 21 votes against, 4 absences, between 2011&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1110">3 votes for, 18 votes against, 5 absences, between 2010&ndash;2015</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6694">5 votes for, 10 votes against, 4 absences, between 2010&ndash;2015</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6699">0 votes for, 3 votes against, 6 absences, between 2012&ndash;2014</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6693">6 votes for, 6 votes against, 4 absences, between 2010&ndash;2013</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6681">10 votes for, 0 votes against, 2 absences, between 2012&ndash;2015</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1109">1 vote for, 3 votes against, 1 absence, between 2004&ndash;2011</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1109">1 vote for, 3 votes against, 1 absence, between 2004&ndash;2011</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6685">17 votes for, 1 vote against, between 2011&ndash;2015</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6733">2 votes for, 6 votes against, 2 absences, between 2011&ndash;2015</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6711">2 votes for, 0 votes against, 2 absences, in 2013</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6716">0 votes for, 5 votes against, between 2012&ndash;2013</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6731">0 votes for, 12 votes against, between 2008&ndash;2017</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6756">0 votes for, 4 votes against, 1 absence, between 2015&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6679">1 vote for, 21 votes against, 4 absences, between 2010&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6690">5 votes for, 3 votes against, between 2013&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6691">7 votes for, 7 votes against, between 2010&ndash;2014</a>',
    'None',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6677">7 votes for, 0 votes against, between 2011&ndash;2012</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6676">0 votes for, 7 votes against, between 2011&ndash;2012</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=363">0 votes for, 4 votes against, 1 absence, in 2003</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=811">8 votes for, 1 vote against, 3 absences, between 1999&ndash;2015</a>',
    'None',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1074">2 votes for, 14 votes against, 16 absences, between 1998&ndash;2014</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1132">0 votes for, 1 vote against, in 2010</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6687">0 votes for, 9 votes against, 2 absences, between 2010&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6682">0 votes for, 2 votes against, in 2011</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1052">4 votes for, 6 votes against, 5 absences, between 1997&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6671">0 votes for, 4 votes against, 2 absences, between 2010&ndash;2017</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1113">0 votes for, 11 votes against, between 2011&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1136">0 votes for, 6 votes against, 2 absences, between 2010&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=996">2 votes for, 0 votes against, 8 absences, between 2007&ndash;2009</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1084">1 vote for, 1 vote against, 4 absences, between 2010&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=837">10 votes for, 0 votes against, 4 absences, between 2003&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6683">0 votes for, 4 votes against, 1 absence, between 2012&ndash;2013</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6678">0 votes for, 12 votes against, between 2013&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6698">2 votes for, 2 votes against, 1 absence, between 2010&ndash;2014</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1079">5 votes for, 1 vote against, 5 absences, between 1999&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6708">2 votes for, 1 vote against, 16 absences, between 2012&ndash;2017</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6709">8 votes for, 5 votes against, 20 absences, between 2011&ndash;2015</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6695">23 votes for, 12 votes against, 14 absences, between 2011&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6736">0 votes for, 3 votes against, in 2015</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=842">3 votes for, 1 vote against, 3 absences, between 2004&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1087">3 votes for, 13 votes against, 12 absences, between 2002&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1071">2 votes for, 1 vote against, 2 absences, between 2008&ndash;2009</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1051">6 votes for, 6 votes against, 12 absences, between 2005&ndash;2006</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6696">0 votes for, 7 votes against, 1 absence, between 2011&ndash;2012</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6721">0 votes for, 5 votes against, 3 absences, between 2014&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6734">0 votes for, 7 votes against, 2 absences, between 2015&ndash;2016</a>',
    'None',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6758">0 votes for, 2 votes against, 1 absence, in 2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1030">19 votes for, 6 votes against, 6 absences, between 2000&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6693">6 votes for, 6 votes against, 4 absences, between 2010&ndash;2013</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6697">0 votes for, 2 votes against, in 2011</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6699">0 votes for, 3 votes against, 6 absences, between 2012&ndash;2014</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6704">4 votes for, 1 vote against, between 2011&ndash;2013</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6710">0 votes for, 3 votes against, 1 absence, between 2012&ndash;2014</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6741">2 votes for, 1 vote against, 1 absence, in 2015</a>',
    'None',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6747">2 votes for, 0 votes against, 1 absence, in 2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6692">4 votes for, 0 votes against, 1 absence, in 2013</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6693">6 votes for, 6 votes against, 4 absences, between 2010&ndash;2013</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6699">0 votes for, 3 votes against, 6 absences, between 2012&ndash;2014</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6746">2 votes for, 0 votes against, 2 absences, in 2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6744">0 votes for, 5 votes against, between 2015&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6743">0 votes for, 5 votes against, between 2015&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=810">7 votes for, 5 votes against, 3 absences, between 2004&ndash;2014</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1120">0 votes for, 3 votes against, 2 absences, in 2010</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1053">13 votes for, 30 votes against, 27 absences, between 2001&ndash;2010</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1105">0 votes for, 3 votes against, 2 absences, between 2009&ndash;2011</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6705">2 votes for, 0 votes against, 2 absences, between 2013&ndash;2016</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6707">1 vote for, 7 votes against, 4 absences, between 2011&ndash;2014</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6715">0 votes for, 5 votes against, 2 absences, in 2013</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6720">2 votes for, 3 votes against, in 2013</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6719">0 votes for, 4 votes against, 2 absences, between 2012&ndash;2013</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6718">4 votes for, 0 votes against, in 2014</a>',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6667">9 votes for, 57 votes against, 15 absences, between 2011&ndash;2015</a>',
]

newList = []
digitReg = r"\d+"
for thing in list:
    aggregate = 0
    if thing == '0':
        newList.append(0)
    elif thing == 'None':
        newList.append(0)
    else:
        # findall also picks up the digits inside the href attribute
        matches = re.findall(digitReg, thing)
        forNum = int(matches[0])
        againstNum = int(matches[1])
        aggregate = forNum - againstNum
        newList.append(aggregate)
print(newList)
print(len(newList))

The problem is that the tags themselves contain numbers, and that is what throws off the aggregate values.

Normally I would just change the code to int(matches[2]), but that is unreliable: I will be running this code on different lists, and the number of matches inside the tag itself will vary.

Is there a way to remove the tags from the list before the matches are found?

1 answer:

Answer 0 (score: 2)

To extract the text inside each tag with Beautiful Soup, you can do this:

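import re
from bs4 import BeautifulSoup

newList = []
digitReg = r"\d+"
for thing in list:
    if thing == '0':
        newList.append(0)
    elif thing == 'None':
        newList.append(0)
    else:
        # .text keeps only the link text, so digits in the href are ignored
        matches = re.findall(digitReg, BeautifulSoup(thing, 'html.parser').text)
        forNum = int(matches[0])
        againstNum = int(matches[1])
        aggregate = forNum - againstNum
        newList.append(aggregate)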
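
As a complementary sketch that is not part of the original answer, the same aggregate can be computed without parsing the HTML by anchoring the regular expression on the wording of the link text. This assumes every tag follows the "N votes for, M votes against" phrasing seen in the sample data. The names VOTES_RE and vote_margin are hypothetical and only serve the illustration.

```python
import re

# Hypothetical pattern: assumes the "N vote(s) for, M vote(s) against" wording
# from the sample data, so digits in the href attribute can never match.
VOTES_RE = re.compile(r"(\d+) votes? for, (\d+) votes? against")

def vote_margin(entry):
    """Return votes for minus votes against, or 0 if no counts are found."""
    match = VOTES_RE.search(entry)
    if match is None:
        # Covers the '0' and 'None' placeholder entries
        return 0
    for_num, against_num = (int(group) for group in match.groups())
    return for_num - against_num

sample = [
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=826">11 votes for, 1 vote against, 15 absences, between 1999&ndash;2014</a>',
    'None',
    '0',
]
print([vote_margin(entry) for entry in sample])  # prints [10, 0, 0]
```

The trade-off is that this version depends on the exact phrasing of the link text, whereas the Beautiful Soup approach only depends on the first two numbers in the text being the for and against counts.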