Scraping a multi-page HTML table with beautifulsoup4 and urllib3

Time: 2019-09-04 07:16:13

Tags: python beautifulsoup scrape urllib3

Please help me. The code I wrote only works for one page, and I want it to work for all pages. What should I do?

import csv 
import urllib3
from bs4 import BeautifulSoup


outfile = open("data.csv","w",newline='')
    writer = csv.writer(outfile)


    for i in range(1,20) :
            url = f'http://ciumi.com/cspos/barcode-ritel.php?page={i}'
            req = urllib3.PoolManager()
            res = req.request('GET', url)
            tree = BeautifulSoup(res.data, 'html.parser')  
            table_tag = tree.select("table")[0]
    tab_data = [[item.text for item in row_data.select("th,td")]
                    for row_data in table_tag.select("tr")]

    for data in tab_data:
        writer.writerow(data)
        print( res, url, ' '.join(data))

1 Answer:

Answer 0 (score: 0)

Your code runs fine. If you want to scrape every URI and collect the data from each one, you just need to indent it correctly:

import csv
import urllib3
from bs4 import BeautifulSoup


# One connection pool, reused across all requests
req = urllib3.PoolManager()

with open("data.csv", "w", newline='') as outfile:
    writer = csv.writer(outfile)

    for i in range(1, 20):
        url = f'http://ciumi.com/cspos/barcode-ritel.php?page={i}'
        res = req.request('GET', url)
        tree = BeautifulSoup(res.data, 'html.parser')
        # Take the first table on the page and flatten each row
        # into a list of cell texts
        table_tag = tree.select("table")[0]
        tab_data = [[item.text for item in row_data.select("th,td")]
                    for row_data in table_tag.select("tr")]
        for data in tab_data:
            writer.writerow(data)
            print(res, url, ' '.join(data))
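
If you don't know the page count in advance, here is a minimal sketch of one way to visit all pages instead of a hard-coded range(1, 20). It assumes the site keeps accepting the page parameter and that a page with no table (or no table rows) marks the end; that stopping condition is an assumption, so verify it against the actual site:

import csv
import urllib3
from bs4 import BeautifulSoup

req = urllib3.PoolManager()

with open("data.csv", "w", newline='') as outfile:
    writer = csv.writer(outfile)
    page = 1
    while True:
        url = f'http://ciumi.com/cspos/barcode-ritel.php?page={page}'
        res = req.request('GET', url)
        tree = BeautifulSoup(res.data, 'html.parser')
        tables = tree.select("table")
        # Assumption: a missing or empty table signals the last page
        if not tables or not tables[0].select("tr"):
            break
        for row in tables[0].select("tr"):
            writer.writerow([cell.text for cell in row.select("th,td")])
        page += 1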

But you will have to clean up the data to get a nice csv file.
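
As a rough illustration of that cleanup, one approach is to collapse the whitespace and newlines inside each cell and skip rows that end up empty before writing them out. The clean_row helper and the skip rule below are assumptions for the sketch, not part of the original answer; adjust them to what the table actually contains:

def clean_row(row):
    # Collapse internal whitespace/newlines in each cell and trim the ends
    return [' '.join(cell.split()) for cell in row]

for data in tab_data:
    cleaned = clean_row(data)
    # Skip rows whose cells are all empty after cleaning
    if any(cleaned):
        writer.writerow(cleaned)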