Question

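I'm trying to put a list of URLs into a CSV file. I scrape the URLs from a web page using urllib2 and BeautifulSoup. I've tried writing the links to the CSV file as unicode, and also after converting them to UTF-8. In both cases, every letter ends up in its own field.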

Here is my code (I've tried at least these two ways):

f = open('filename','wb')
w = csv.writer(f,delimiter=',')
for link in links:
    w.writerow(link['href'])

and

f = open('filename','wb')
w = csv.writer(f,delimiter=',')
for link in links:
    w.writerow(link['href'].encode('utf-8'))

links is a list that looks like this:

[<a href="#Flyout1" accesskey="2" class="quicklinks" tabindex="1" title="Skip to content">Quick Links: Skip to main page content</a>, <a href="#search" class="quicklinks" tabindex="1" title="Skip to search">Skip to Search</a>, <a href="#News" class="quicklinks" tabindex="1" title="Skip to Section table of contents">Skip to Section Content Menu</a>, <a href="#footer" class="quicklinks" tabindex="1" title="Skip to site options">Skip to Common Links</a>, <a href="http://www.hhs.gov"><img src="/ucm/groups/fdagov-public/@system/documents/system/img_fdagov_hhs_gov.png" alt="www.hhs.gov link" style="width:112px; height:18px;" border="0" /></a>]

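Not every link has an 'href' key, but I check for it in code not shown here. In both cases the correct string is written to the CSV file, but each letter lands in a new field.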

Any ideas?

Answer 1

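From the docs: "A row must be a sequence of strings or numbers..." You are passing a single string, not a sequence of strings, so each letter is treated as a separate item. Put your string in a list.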

So change w.writerow(link['href']) to w.writerow([link['href']]).
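A minimal sketch of the corrected loop, assuming Python 3 and using plain dicts as hypothetical stand-ins for the BeautifulSoup tags. The function name write_links and the href check inside it are illustrative only (the asker's real check isn't shown); it writes to an in-memory buffer so the result is easy to inspect.

```python
import csv
import io


def write_links(links, f):
    """Write one href per CSV row, skipping links that have no href."""
    w = csv.writer(f, delimiter=',')
    for link in links:
        # Hypothetical stand-in for the asker's href check. BeautifulSoup tags
        # also support .get('href'), which returns None if the attribute is missing.
        href = link.get('href')
        if href is None:
            continue
        # Wrap the string in a list so the whole URL becomes a single field.
        w.writerow([href])


# Hypothetical sample data: plain dicts standing in for BeautifulSoup <a> tags.
links = [{'href': '#Flyout1'}, {'title': 'no href here'}, {'href': 'http://www.hhs.gov'}]

buf = io.StringIO()
write_links(links, buf)
print(buf.getvalue())
```

For a real file in Python 3, open it in text mode with newline='' (for example `with open('filename', 'w', newline='') as f: write_links(links, f)`); the 'wb' mode in the question is what the Python 2 csv module expects.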

Note: a CSV file with a single column looks just like a plain text file with one value per line. Maybe you don't need csv at all.
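A small Python 3 sketch to check that note; the helper names and sample URLs are made up for illustration. With lineterminator='\n' the single-column CSV output matches a newline-joined text file, except when a value contains a comma, a double quote, or a line break, which csv quotes (and by default csv ends rows with \r\n rather than \n).

```python
import csv
import io


def as_csv(values):
    """Single-column CSV text, one value per row, using '\n' line endings."""
    buf = io.StringIO()
    w = csv.writer(buf, lineterminator='\n')
    for v in values:
        w.writerow([v])
    return buf.getvalue()


def as_text(values):
    """The same values written as a plain text file, one per line."""
    return ''.join(v + '\n' for v in values)


plain = ['http://www.hhs.gov', '#search']
tricky = ['http://example.com/?a=1,2']  # hypothetical URL containing a comma

print(as_csv(plain) == as_text(plain))    # True
print(as_csv(tricky) == as_text(tricky))  # False: csv wraps the value in quotes
```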

Answer 2

By "every letter is inserted into a new field", I take it you mean something like this, right?

h,t,t,p,:,/,/,w,w,w,.,g,o,o,g,l,e,.,c,o,m

If so, writerow() is iterating over the characters in the string and treating each one as a separate column. Try writerow([link['href']]) instead.
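A short Python 3 sketch that reproduces the symptom and the fix side by side; the URL is the illustrative one from the output above, and an in-memory buffer stands in for the file.

```python
import csv
import io

url = 'http://www.google.com'

buf = io.StringIO()
w = csv.writer(buf)
w.writerow(url)    # a str is iterable, so each character becomes its own field
w.writerow([url])  # a one-element list gives exactly one field
print(buf.getvalue())
```

The first row comes out as h,t,t,p,:,/,/,... and the second as the intact URL.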

Edit: looks like @Steven Rumbalski beat me to it!

Answer 3

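According to the docs, writerow() takes an iterable object, iterates over it, and writes out its CSV representation. Your problem is that a string is an iterable object. If I have: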

mystring = 'foo'

Python lets me iterate over it like this:

for c in mystring:
    print(c)

and I get:

f
o
o

That's a handy feature, but in this case it works against you.

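You don't want writerow() to iterate over the string; you want it to iterate over a list of strings, so that commas separate whole strings rather than individual characters. In this case, you want to wrap the string in a list: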

w.writerow([link['href']])
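A Python 3 sketch tying the iteration point to the fix: list('foo') shows what writerow() effectively sees when given a bare string, and writerows() with one-element rows is an alternative way to write all the URLs at once. The hrefs list is sample data taken from the question's links.

```python
import csv
import io

# What writerow() sees when handed a bare string: one item per character.
print(list('foo'))  # ['f', 'o', 'o']

hrefs = ['#Flyout1', '#search', 'http://www.hhs.gov']  # sample values from the question

buf = io.StringIO()
w = csv.writer(buf)
# writerows() takes an iterable of rows; each row here is a one-element list.
w.writerows([h] for h in hrefs)
print(buf.getvalue())
```

A one-element tuple such as (h,) works just as well as [h], since writerow() and writerows() only need each row to be an iterable of strings.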

Python CSV puts each letter in a new field
