How to merge p tags into one line - BeautifulSoup

Date: 2018-10-15 10:04:18

Tags: python html beautifulsoup

#Extract Record
for person in soup.find_all(['b','h1']):
    with open('test.csv', 'a') as csv_file:
        writer = csv.writer(csv_file)

        #Header
        header_tag = soup.find_all('h1')[k]
        k += 1
        header = header_tag.text.strip().replace('\n', ' ').encode('windows-1252', errors='replace')
        print(header)

        #Name
        name_tag = header_tag.find_all_next('p')[1]
        name = name_tag.text.strip().replace('\n', ' ').encode('windows-1252', errors='replace')
        print(name)
        #writer.writerow([name])

        #Workplace
        workplace_tag = name_tag.find_all_next('i')[0]
        workplace = workplace_tag.text.strip().replace('\n', ' ').encode('windows-1252', errors='replace')
        print(workplace)
        #writer.writerow([workplace])

        #Abstract
        while workplace_tag.find_all_next('p')[l] != 'h1':
            abstract_tag = workplace_tag.find_all_next('p')[l]
            abstract = abstract_tag.text.strip().replace('\n', ' ').encode('windows-1252', errors='replace')
            l += 1
            print(abstract)
            #writer.writerow([abstract])

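The code above outputs what I need, with one exception: I run into trouble when I try to combine the p tags collected by the while loop at the bottom, in the Abstract section.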

Using print(abstract, end='') does not work as expected.
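
One possible explanation, assuming Python 3: after .encode(...) the value is a bytes object, so print shows its repr (with the b'...' wrapper) instead of plain text, and end='' only removes the newline. The sketch below uses a made-up string in place of the scraped text.

```python
import io
from contextlib import redirect_stdout

# Made-up sample text standing in for abstract_tag.text (assumption).
text = "Abstract A"
encoded = text.encode("windows-1252", errors="replace")  # bytes, not str

# Capture what print() writes so the two cases can be compared.
bytes_buffer = io.StringIO()
with redirect_stdout(bytes_buffer):
    print(encoded, end="")
bytes_output = bytes_buffer.getvalue()

str_buffer = io.StringIO()
with redirect_stdout(str_buffer):
    print(text, end="")
str_output = str_buffer.getvalue()
```

Here bytes_output carries the b'...' wrapper while str_output is the plain text, which suggests keeping the value as str until it is written out.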

I also tried this approach:

#Abstract
        abstracts = ''
        while workplace_tag.find_all_next('p')[l] != 'h1':
            abstract_tag = workplace_tag.find_all_next('p')[l]
            abstract = abstract_tag.text.strip().replace('\n', ' ').encode('windows-1252', errors='replace')
            l += 1
            abstracts += abstract.decode('windows-1252', errors='replace')
            print(abstracts)
            #writer.writerow([abstract])

This code almost works, but it makes my while loop condition always true, so it prints the same first set of p tags over and over.
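
A likely cause of the always-true condition: find_all_next('p') returns only p Tag objects, and a Tag compared with the string 'h1' is never equal. The sketch below is one way to stop at the next header instead, by searching for both p and h1 and checking each element's .name. The HTML and the helper name paragraphs_until_next_header are hypothetical, since the real page structure is not shown.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for illustration only.
html = """
<h1>Header 1</h1>
<p>intro</p>
<p>Name</p>
<i>Workplace</i>
<p>Abstract A</p>
<p>Abstract b</p>
<h1>Header 2</h1>
<p>other</p>
"""
soup = BeautifulSoup(html, "html.parser")
workplace_tag = soup.find("i")

# A Tag never equals the plain string 'h1', so this comparison is always True.
first_p = workplace_tag.find_all_next("p")[0]
always_true = first_p != "h1"


def paragraphs_until_next_header(start_tag):
    """Collect the text of each following <p> until the next <h1> appears."""
    texts = []
    for element in start_tag.find_all_next(["p", "h1"]):
        if element.name == "h1":  # compare the tag name, not the Tag itself
            break
        texts.append(element.get_text(strip=True))
    return texts


abstract_parts = paragraphs_until_next_header(workplace_tag)
print(abstract_parts)
```

Because the loop iterates over the result once and breaks at the first h1, it needs no manual index and cannot run forever.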

The output my code currently produces is:

Name
Workplace
Abstract A
Abstract b
Abstract c

But I need it to look like:

Name
Workplace
Abstract A, Abstract b, Abstract c
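
The desired layout can be sketched by collecting the abstract strings in a list and joining them once after the loop, rather than printing inside it. The values below are placeholders for the scraped text, and writing to an in-memory buffer instead of test.csv is an assumption made to keep the example self-contained.

```python
import csv
import io

# Placeholder values standing in for the scraped text (assumption).
name = "Name"
workplace = "Workplace"
abstract_parts = ["Abstract A", "Abstract b", "Abstract c"]

# Join once, after all paragraphs have been collected.
abstract = ", ".join(abstract_parts)

# Printed form: name, workplace, then all abstracts on a single line.
output = "\n".join([name, workplace, abstract])
print(output)

# CSV form: one row per record, with the merged abstract in one cell.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([name, workplace, abstract])
csv_text = buffer.getvalue()
```

The csv writer quotes the merged cell automatically because it contains commas.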

0 answers:

No answers yet.