Question

I'm trying to grab specific names out of my printed output. So far, this is the code I have:

links = page_soup.findAll('div', attrs={'class' : 'gameLinks'})
for div in links:
    print div.find('a')['href']

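The code above grabs every link, which gives me about 50 of them. I only want the ones whose URL contains the word "redzone". Here are examples of links that contain redzone: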

http://example.com/247075/1/nfl-redzone-live---never-miss-a-touchdown-live-stream-online.html
http://example.com/247075/2/nfl-redzone-live---never-miss-a-touchdown-live-stream-online.html

What am I missing here?

Answer 1

Try an if "substring" in string test to check whether the string contains the substring:

links = page_soup.findAll('div', attrs={'class' : 'gameLinks'})
for div in links:
    link = div.find('a')['href']
    if "redzone" in link:
        print link

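Note that the in test is case-sensitive, so it will not match "RedZone" or "REDZONE". If you need that, you can use a regular expression instead; it is more complex but more powerful! https://docs.python.org/3/howto/regex.html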
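As a rough sketch of the regular expression route, something like the following should work. It is written for Python 3 (print as a function), and the hrefs list is only a hypothetical stand-in for the values you pull out of each div; the second and third URLs are made up for illustration.

```python
import re

# Hypothetical sample of hrefs, standing in for div.find('a')['href'] results.
hrefs = [
    "http://example.com/247075/1/nfl-redzone-live---never-miss-a-touchdown-live-stream-online.html",
    "http://example.com/247080/1/some-other-game-live-stream-online.html",
    "http://example.com/247075/2/NFL-RedZone-live---never-miss-a-touchdown-live-stream-online.html",
]

# re.IGNORECASE lets the pattern match "redzone", "RedZone", "REDZONE", etc.
pattern = re.compile(r"redzone", re.IGNORECASE)

# Keep only the hrefs where the pattern occurs anywhere in the string.
redzone_links = [href for href in hrefs if pattern.search(href)]

for link in redzone_links:
    print(link)
```

If all you need is case-insensitivity, if "redzone" in link.lower() does the same job without a regex; the regex version mostly pays off once the pattern gets more complicated than a single word.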
