Remove items from a list that do not contain "speeches"?

Asked: 2015-09-17 20:53:41

Tags: python list python-2.7 trim

import urllib2
from bs4 import BeautifulSoup

url = 'http://www.millercenter.org/president/speeches'

conn = urllib2.urlopen(url)
html = conn.read()

miller_center_soup = BeautifulSoup(html)
links = miller_center_soup.find_all('a')

linklist = [tag.get('href') for tag in links if tag.get('href') is not None]
linklist = str(linklist)

end_of_links = [line for line in linklist if '/events/' in line]
print end_of_links

Here is a small part of my output (stored in a Python list).

['/events/2015/one-nation-under-god-how-corporate-america-invented-christian-america',
'/events/2015/a-conversation-with-bernie-sanders', '#reagan', '#gwbush', '#obama',
'#top', '/president/obama/speeches/speech-4427', 'president/obama/speeches/speech-4430', ...]

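I want to remove every item from the list that does not contain "speeches". I have tried filter() and also writing another list comprehension, but neither has worked so far. I don't know why the end_of_links variable doesn't work; it seems intuitive, at least to me.

2 answers:

Answer 0 (score: 1)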

import re

li = ['/events/2015/one-nation-under-god-how-corporate-america-invented-christian-america',
      '/events/2015/a-conversation-with-bernie-sanders', '#reagan', '#gwbush', '#obama',
      '#top', '/president/obama/speeches/speech-4427', 'president/obama/speeches/speech-4430']

# Keep only the items in which the pattern 'speeches' is found
li = [x for x in li if re.search('speeches', x)]

print(li)

['/president/obama/speeches/speech-4427', 'president/obama/speeches/speech-4430']
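Since the question mentions trying filter(), here is a sketch of the same selection written with filter() and a plain substring test instead of a regular expression. The sample list is shortened, and the name speech_links is an assumption chosen for illustration. Wrapping the call in list() is harmless in Python 2.7, where filter() on a list already returns a list, and is needed in Python 3, where it returns an iterator.

```python
# Shortened sample of the hrefs from the question (illustrative only).
li = ['/events/2015/a-conversation-with-bernie-sanders', '#reagan', '#top',
      '/president/obama/speeches/speech-4427',
      'president/obama/speeches/speech-4430']

# filter() keeps the items for which the function returns True.
# speech_links is a hypothetical name, not from the original post.
speech_links = list(filter(lambda x: 'speeches' in x, li))
print(speech_links)
```

For a fixed word like this, the substring test with in is usually enough; re.search becomes useful once the match needs a real pattern.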

Answer 1 (score: 0)

Just keep the ones that include 'speeches':

link_list = ['/events/2015/one-nation-under-god-how-corporate-america-invented-christian-america',
 '/events/2015/a-conversation-with-bernie-sanders', '#reagan', '#gwbush', '#obama',
 '#top', '/president/obama/speeches/speech-4427', 'president/obama/speeches/speech-4430']
speech_list = [_ for _ in link_list if 'speeches' in _]

Here is my terminal session in Python 2.7:

>>> link_list = ['/events/2015/one-nation-under-god-how-corporate-america-invented-christian-america',
...  '/events/2015/a-conversation-with-bernie-sanders', '#reagan', '#gwbush', '#obama',
...  '#top', '/president/obama/speeches/speech-4427', 'president/obama/speeches/speech-4430']
>>> speech_list = [_ for _ in link_list if 'speeches' in _]
>>> speech_list
['/president/obama/speeches/speech-4427', 'president/obama/speeches/speech-4430']
>>>
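
As for why end_of_links came out empty in the question, a likely cause is the line linklist = str(linklist). It turns the whole list into one long string, so the comprehension that follows iterates over single characters, and no single character can contain '/events/'. That comprehension also tests for '/events/' rather than 'speeches'. A minimal sketch of the effect, using a short hard-coded sample in place of the scraped links (the sample values and the names as_string, chars_matched and speech_links are assumptions for illustration):

```python
# Sample hrefs standing in for the scraped list (illustrative, not the full page).
linklist = ['/events/2015/a-conversation-with-bernie-sanders', '#top',
            '/president/obama/speeches/speech-4427']

# What the question's code does: str() produces ONE string, so iterating
# over it yields single characters, none of which can contain '/events/'.
as_string = str(linklist)
chars_matched = [line for line in as_string if '/events/' in line]
print(chars_matched)  # []

# Leaving the list as a list and testing for 'speeches' keeps the wanted items.
speech_links = [href for href in linklist if 'speeches' in href]
print(speech_links)
```

Under these assumptions, dropping the str() call and changing the test to 'speeches' should be enough for the original script to produce the filtered list.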