Question

I have HTML like this:

h2
span
span
span
h2
span
span
h2
span
span
span
span

I want to save it to an Excel file, so I wrote a loop that fetches the h2 and span tags in a single pass:

for item in soup.find_all(re.compile(r'^(h2|span)$'), {'class': re.compile(r'^(product-name|attribute-value)$')}):

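How can I write each h2 to a row, put the spans that follow it in the same row until the next h2 appears, then start a new row for that h2, and so on? I'm using openpyxl for the .xlsx file.

It should look like this: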

h2 span span span
h2 span span
h2 span span span span

Answer 1

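Instead of find_all, iterate over the tags with find_next. Depending on the tag name, you can then start a new row (h2) or move to the next column (span).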

    from openpyxl import Workbook

    # `soup` is the BeautifulSoup object from the question.
    wb = Workbook()
    ws = wb.active

    # Start at the first h2 and write it to cell A1.
    tagiterator = soup.h2
    row, col = 1, 1
    ws.cell(row=row, column=col, value=tagiterator.getText())

    tagiterator = tagiterator.find_next()
    # Loop until find_next() returns None, so the last tag is written too.
    while tagiterator is not None:
        if tagiterator.name == 'h2':
            # Go to the next row
            row += 1
            col = 1
            ws.cell(row=row, column=col, value=tagiterator.getText())
        elif tagiterator.name == 'span':
            # Go to the next column
            col += 1
            ws.cell(row=row, column=col, value=tagiterator.getText())
        tagiterator = tagiterator.find_next()
    wb.save('sample.xlsx')
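
If you would rather keep the find_all loop from the question, a possible alternative is to group the matched tags into rows first and then write each row with ws.append. This is only a sketch: group_rows and save_rows are hypothetical helper names, and it assumes soup is the same BeautifulSoup object and that the class filter from the question matches the tags you want.

```python
import re


def group_rows(tags):
    """Group (name, text) pairs into rows; every 'h2' starts a new row."""
    rows = []
    for name, text in tags:
        if name == "h2":
            rows.append([text])
        elif name == "span" and rows:
            # Spans that appear before the first h2 are ignored.
            rows[-1].append(text)
    return rows


def save_rows(soup, path):
    """Write one spreadsheet row per h2, followed by its spans."""
    # Imported here so group_rows can be used without openpyxl installed.
    from openpyxl import Workbook

    # Same filters as the find_all call in the question.
    tag_names = re.compile(r"^(h2|span)$")
    classes = re.compile(r"^(product-name|attribute-value)$")
    tags = soup.find_all(tag_names, {"class": classes})

    wb = Workbook()
    ws = wb.active
    for row in group_rows((t.name, t.get_text(strip=True)) for t in tags):
        ws.append(row)  # appends the list as the next row
    wb.save(path)
```

Calling save_rows(soup, 'sample.xlsx') should produce the same layout as the loop above. Because only the filtered tags are visited, unrelated span elements elsewhere on the page do not end up in the sheet.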

Save values until a specific tag appears
