Question

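I am scraping a website for hrefs. I was calling .lower() on the search input and that worked fine, but now that I have added new sites I noticed that some of them have both uppercase and lowercase letters in their hrefs.

How can I make the user's input match links regardless of case?

For example, a search for "ranger" should show all of Rangers, rangers, rAnGeR, etc.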

user_input = raw_input("Search for Team = ")

headers = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request("http://wizhdsports.com/sports/Football.html", None, headers)
resp = urllib2.urlopen(req)
soup = BeautifulSoup(resp, from_encoding=resp.info().getparam('charset'))

links = soup.find_all('a', href=re.compile(user_input))
if len(links) == 0:
    print "Wizhdsports.com Have No Streams Available"
else:
    for link in links:
        print (link['href'])

Answer 1

Since you are using a regular expression to match the user input, you can pass the re.IGNORECASE flag to re.compile to perform a case-insensitive match.

Your original code sample, updated:

import urllib2
from bs4 import BeautifulSoup
import re

user_input = raw_input ("Search for Team = ")

headers = { 'User-Agent' : 'Mozilla/5.0' }
req = urllib2.Request("http://wizhdsports.com/sports/Football.html", None, headers)
resp = urllib2.urlopen(req)

# fix UserWarning that parser not explicitly specified with bs4
soup = BeautifulSoup(resp, "html.parser", from_encoding=resp.info().getparam('charset'))

links = soup.find_all('a', href=re.compile(user_input, flags=re.IGNORECASE))
if len(links) == 0:
    print "Wizhdsports.com Have No Streams Available"
else:
    for link in links:
        print (link['href'])
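
One thing you may also want to consider: the search text is passed to re.compile as-is, so characters such as "." or "(" in the input would be read as regex syntax. Below is a minimal sketch of the same case-insensitive match with the input escaped first. It is not part of the code above; the helper name find_matching_hrefs and the sample hrefs are made up for illustration, and it works on a plain list of strings so it runs without fetching the page.

```python
import re


def find_matching_hrefs(hrefs, term):
    """Return the hrefs that contain `term`, ignoring case.

    Hypothetical helper for illustration; not part of the original script.
    """
    # re.escape makes the user's text match literally, so '.' or '('
    # in the search term is not interpreted as regex syntax.
    pattern = re.compile(re.escape(term), flags=re.IGNORECASE)
    # search() looks for the pattern anywhere in the string.
    return [href for href in hrefs if pattern.search(href)]


# Made-up hrefs, for illustration only.
sample_hrefs = [
    "/stream/Rangers-vs-Celtic.html",
    "/stream/rAnGeRs-highlights.html",
    "/stream/Arsenal-vs-Chelsea.html",
]

print(find_matching_hrefs(sample_hrefs, "rangers"))
```

If you do want users to be able to type real regex patterns, leave out the re.escape call and keep only the flag.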
