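With Martijn's amazing help, I have come this far in my Python programming. Now I'm trying to export the contents of my table cells to a CSV file. The import works, but this is what I have and what I get: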
import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen('https://clinicaltrials.gov/ct2/show/study/NCT01718158?term=NCT01718158&rank=1&show_locs=Y#locn').read())
import csv
filename = 'Trial1.csv'
f = open(filename, 'wb')
with f:
    writer = csv.writer(f)
    for row in soup('table')[5].findAll('tr'):
        tds = row('td')
        result = u' '.join([cell.string for cell in tds if cell.string])
        writer.writerow(result)
        print result
f.close()
Result: | j | o | h | n | 1 | 2 | 3
instead of | john | 123 | for each individual cell. How can I correct this? Thanks.
Answer 0: (score: 0)
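The problem is that some cells in tds contain a comma and some don't, which confuses the writer. As you know, it is a csv (comma-separated values) writer. Note also that writerow expects a sequence of fields: when it is given a single joined string, it iterates over that string character by character, which is what produces | j | o | h | n |.
Anyway, changing the delimiter and passing the list of cells directly should solve the problem you're having, for example: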
...
# I'd suggest opening the file with "with ... as f" in a single line
with open(filename, 'wb') as f:
    # set the delimiter to a tab (\t) rather than a comma
    writer = csv.writer(f, delimiter='\t')
    for row in soup('table')[5].findAll('tr'):
        tds = row('td')
        # pass the list to writerow directly; each element becomes one field
        writer.writerow([cell.string for cell in tds if cell.string])
...
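If it helps to see the difference without hitting the website, here is a minimal sketch of how writerow treats a string versus a list. It is written for Python 3 (where you would open the output file with 'w' and newline='' rather than 'wb') and writes to an in-memory io.StringIO instead of Trial1.csv. The helper write_one_row and the sample value 'Doe, John' are made up for illustration; 'john' and '123' come from the question.

```python
import csv
import io


def write_one_row(row, **writer_kwargs):
    """Hypothetical helper: write one row with csv.writer, return the text."""
    buffer = io.StringIO(newline='')
    csv.writer(buffer, **writer_kwargs).writerow(row)
    return buffer.getvalue()


# A plain string is itself a sequence, so every character becomes a field.
as_string = write_one_row('john')

# A list is written as one field per element.
as_list = write_one_row(['john', '123'])

# The same list, tab-delimited as in the snippet above.
as_tabs = write_one_row(['john', '123'], delimiter='\t')

# A field that contains the delimiter is quoted by default, not split.
# 'Doe, John' is an invented sample value.
with_comma = write_one_row(['Doe, John', '123'])

print(repr(as_string))   # 'j,o,h,n\r\n'
print(repr(as_list))     # 'john,123\r\n'
print(repr(as_tabs))     # 'john\t123\r\n'
print(repr(with_comma))  # '"Doe, John",123\r\n'
```

The last case suggests that a comma inside a cell is not a problem by itself with the default dialect, so the tab delimiter is mostly a matter of preference; passing a list to writerow is what keeps each cell in its own column.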
Hope this helps.