Question

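I have been developing a Python web crawler to collect used-car inventory data from this website (http://www.bobaedream.co.kr/cyber/CyberCar.php?gubun=I&page=20).

First, I want to collect only the "BMW" entries from the list. So I used the "search" function of a compiled regular expression, as shown in the code below. However, it keeps returning None.

Is there something wrong with my code?

Please give me some advice.

Thanks.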

from bs4 import BeautifulSoup
import urllib.request
import re

CAR_PAGE_TEMPLATE = "http://www.bobaedream.co.kr/cyber/CyberCar.php?gubun=I&page="

def fetch_post_list():

    for i in range(20,21):
        URL = CAR_PAGE_TEMPLATE + str(i)
        res = urllib.request.urlopen(URL)
        html = res.read()
        soup = BeautifulSoup(html, 'html.parser')
        table = soup.find('table', class_='cyber')
        print ("Page#", i)

        # 50 lists per each page
        lists=table.find_all('tr', itemtype="http://schema.org/Article")

        count=0
        r=re.compile("[BMW]")
        for lst in lists:
            if lst.find_all('td')[3].find('em').text:
                lst_price=lst.find_all('td')[3].find('em').text
                lst_title=lst.find_all('td')[1].find('a').text
                lst_link = lst.find_all('td')[1].find('a')['href']
                lst_photo_url=''
                if lst.find_all('td')[0].find('img'):
                    lst_photo_url = lst.find_all('td')[0].find('img')['src']
                count+=1
            else: continue

            print('#',count, lst_title, r.search("lst_title"))

    return lst_link

fetch_post_list()

Answer 1

r.search("lst_title")

This searches inside the string literal "lst_title", not the variable named lst_title, which is why it never matches.
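To make the difference concrete, here is a small sketch that runs the same pattern against the literal text and against the variable. The title value "BMW 520d" is a made-up example, not data from the site.

```python
import re

r = re.compile("[BMW]")
lst_title = "BMW 520d"  # hypothetical listing title, for illustration only

# Quoted: the pattern is applied to the text "lst_title" itself,
# which contains no uppercase B, M, or W.
literal_result = r.search("lst_title")

# Unquoted: the pattern is applied to the value stored in the variable.
variable_result = r.search(lst_title)

print(literal_result)           # None
print(variable_result.group())  # B
```

Note that search returns a match object (or None), so printing it directly shows the object rather than the matched text; calling .group() gives the text.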

r=re.compile("[BMW]")

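Square brackets mean you are looking for any one of the characters inside them. So, for example, any string containing an M will match. You just want "BMW". In fact, you don't even need a regular expression; you can simply test: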

"BMW" in lst_title
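As a rough comparison of the three approaches, the sketch below filters a short list of titles. The titles and the helper name filter_titles are assumptions made up for this example; the real crawler would pass in each lst_title instead.

```python
import re

# Hypothetical titles, standing in for the values of lst_title.
titles = ["BMW 320i", "Mercedes-Benz E300", "Kia Morning", "Hyundai Sonata"]

def filter_titles(pattern, items):
    """Return the items in which the compiled pattern matches somewhere."""
    return [t for t in items if pattern.search(t)]

# Character class: matches any title containing a B, an M, or a W.
class_hits = filter_titles(re.compile("[BMW]"), titles)

# Literal pattern: matches only titles containing the exact text "BMW".
literal_hits = filter_titles(re.compile("BMW"), titles)

# Plain substring test: same result as the literal pattern, no regex needed.
in_hits = [t for t in titles if "BMW" in t]

print(class_hits)    # ['BMW 320i', 'Mercedes-Benz E300', 'Kia Morning']
print(literal_hits)  # ['BMW 320i']
print(in_hits)       # ['BMW 320i']
```

The character class lets through "Mercedes-Benz E300" and "Kia Morning" because each contains an uppercase M, which is why the substring test is the simpler choice here.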
