Question

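I have this code, which scrapes all the tables with the class "ctable" from a website in Python 2.7. But I want it to stop when it reaches the value XXXX in a "ctable". I only need the data up to this value XXXX. So if it finds this text, I want to stop scraping these tables.

Is this possible?

Here is my code: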

from bs4 import BeautifulSoup

soup = BeautifulSoup(x, 'lxml')  # x holds the page's HTML

datatable=[]
for ctable in soup.find_all('table',  "ctable" )[:-1]:
    for record in ctable.find_all('tr'):
        temp_data = []
        for data in record.find_all('td'):
            temp_data.append(data.text.encode('latin-1'))
        datatable.append(temp_data)

I tried this:

datatable=[]
for ctable in soup.find_all('table',  "ctable" )[:-1]:
    for record in ctable.find_all('tr'):
        temp_data = []
        for data in record.find_all('td'):
            temp_data.append(data.text.encode('latin-1'))
            if 'modul' in data.text:
                break         
datatable.append(temp_data)

Answer 1

Use the break statement in your code:

    ...
    (your code above)
datatable=[]
stop = 0
for ctable in soup.find_all('table',  "ctable" )[:-1]:
    if stop == 1:
        break
    for record in ctable.find_all('tr'):
        if stop == 1:
            break
        temp_data = []
        for data in record.find_all('td'):
            temp_data.append(data.text.encode('latin-1'))
            if 'modul' in data.text:
                stop = 1
                break         
        datatable.append(temp_data)

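I wasn't paying enough attention: you have three nested for loops. Maybe it will work now?

I added a break in each loop.

An alternative placement of the if and break: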

datatable=[]
stop = 0
for ctable in soup.find_all('table',  "ctable" )[:-1]:
    for record in ctable.find_all('tr'):
        temp_data = []
        for data in record.find_all('td'):
            temp_data.append(data.text.encode('latin-1'))
            if 'modul' in data.text:
                stop = 1
                break         
        datatable.append(temp_data)
        if stop == 1:
            break
    if stop == 1:
        break
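
Another way to leave all three loops at once, without a stop flag, is to wrap them in a function and return when the marker is found. The following is only a sketch: the function name `collect_rows_until` and its parameters are hypothetical, and it assumes the cell texts have already been extracted into nested lists of strings (the comment shows one way that might be done from `soup`). Like the flag-based versions above, it keeps the matching row, truncated after the cell that contains the marker.

```python
def collect_rows_until(tables, marker):
    """Collect rows table by table, stopping at the first cell containing marker.

    tables: iterable of tables; each table is an iterable of rows;
            each row is an iterable of cell strings (hypothetical layout).
    """
    datatable = []
    for table in tables:
        for row in table:
            temp_data = []
            for cell in row:
                temp_data.append(cell)
                if marker in cell:
                    # Keep the partial row, then leave all three loops at once.
                    datatable.append(temp_data)
                    return datatable
            datatable.append(temp_data)
    return datatable


# With BeautifulSoup, the nested input could be built lazily, for example:
#   tables = (
#       ([td.text for td in tr.find_all('td')] for tr in t.find_all('tr'))
#       for t in soup.find_all('table', 'ctable')[:-1]
#   )
#   datatable = collect_rows_until(tables, 'modul')

sample = [
    [["a", "b"], ["c", "modul x", "d"]],
    [["e"]],
]
result = collect_rows_until(sample, "modul")
print(result)
```

Because `return` exits the function immediately, no later cell, row, or table is visited once the marker has been seen.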

How do I stop scraping web data when I find a value?
