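Python: Saving content to CSV with BeautifulSoup

Date: 2014-11-24 22:19:19

Tags: python beautifulsoup

With Martijn's amazing help I have come this far in my Python programming. Now I am trying to export the contents of my table cells to a CSV file. The export runs, but the output is not what I expect. My code and its result are as follows: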

import csv
import urllib2

from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('https://clinicaltrials.gov/ct2/show/study/NCT01718158?term=NCT01718158&rank=1&show_locs=Y#locn').read())

filename = 'Trial1.csv'

f = open(filename, 'wb')

with f:
    writer = csv.writer(f)
    for row in soup('table')[5].findAll('tr'):
        tds = row('td')
        result = u' '.join([cell.string for cell in tds if cell.string])
        writer.writerow(result)
        print result
f.close()

Result: | j | o | h | n | 1 | 2 | 3 |

instead of | john | 123 | for each particular cell. How can I correct this? Thanks.

1 answer:

Answer 0: (score: 0)

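The problem is that writer.writerow expects a sequence of fields, but result is a single joined string, so the writer iterates over it and writes every character as its own field. On top of that, some of the cells in tds contain commas and some do not, which is awkward for a csv (comma-separated values) writer.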
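The difference between passing a string and passing a list to writerow can be seen in a small sketch. It is written for Python 3 (the code in this thread is Python 2), uses an in-memory buffer instead of a file, and the helper write_one_row and the cell values are hypothetical, chosen only to mirror the example in the question.

```python
import csv
import io


def write_one_row(row):
    """Write a single row to an in-memory buffer and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(row)
    return buffer.getvalue()


# Hypothetical cell values, standing in for cell.string of each <td>.
cells = ["john", "123"]

# A joined string is iterated character by character: one field per character.
as_string = write_one_row(" ".join(cells))

# A list is iterated item by item: one field per cell.
as_list = write_one_row(cells)

print(repr(as_string))
print(repr(as_list))
```

Printing both values shows the first row split into single characters and the second row holding one field per cell, which is the behaviour the question asks for.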

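In any case, passing the list of cells straight to writerow and changing the delimiter solves the problem you are having, for example: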

...
# I'd suggest opening the file with a single "with ... as f" statement;
# the file is then closed automatically, so no f.close() is needed
with open(filename, 'wb') as f:
    # set the delimiter to a tab (\t) instead of a comma
    writer = csv.writer(f, delimiter='\t')
    for row in soup('table')[5].findAll('tr'):
        tds = row('td')
        # pass the list to writerow directly: each item becomes one field
        writer.writerow([cell.string for cell in tds if cell.string])
...
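For readers on Python 3, the same loop might look roughly like the sketch below. It assumes the cell strings have already been extracted from the table, so it needs only the standard library. The function write_rows, the sample rows, and the temporary output path are hypothetical, and skipping rows with no cells is an addition that the code above does not make.

```python
import csv
import os
import tempfile


def write_rows(path, rows, delimiter="\t"):
    """Write rows (lists of cell strings) to path as delimited text."""
    # In Python 3 the csv module expects a text-mode file opened with newline="".
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=delimiter)
        for row in rows:
            # Drop None/empty cells, like "if cell.string" in the loop above.
            cells = [cell for cell in row if cell]
            if cells:  # addition: skip rows that have no cells at all
                writer.writerow(cells)


# Hypothetical rows standing in for the cell.string values of each <tr>.
sample_rows = [["john", "123"], [None, "jane", "456"], []]
out_path = os.path.join(tempfile.mkdtemp(), "Trial1.csv")
write_rows(out_path, sample_rows)

# Read the file back unchanged to inspect what was written.
with open(out_path, newline="", encoding="utf-8") as f:
    written = f.read()
print(repr(written))
```

Opening the file in text mode with newline="" replaces the 'wb' mode used in Python 2 and lets the csv module control the line endings itself.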

Hope this helps.