Question

The code I am using:

import urllib2
import csv
from bs4 import BeautifulSoup

url = "http://en.wikipedia.org/wiki/List_of_ongoing_armed_conflicts"
soup = BeautifulSoup(urllib2.urlopen(url))

fl = open('locations.csv', 'w')

def unique(countries):
    seen = set()
    for country in countries:
        l = country.lower()
        if l in seen:
            continue
        seen.add(l)
        yield country


locs = []
for row in soup.select('table.wikitable tr'):
    cells = row.find_all('td')
    if cells:
        for location in cells[3].find_all(text=True):
            locs.extend(location.split())

locs2 = []            
for locations in unique(locs):
    locations = locs2.extend(locations.split())
print sorted(locs2)

writer = csv.writer(fl)
writer.writerow(['location'])
for values in sorted(locs2):
    writer.writerow(values)

fl.close()

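When I print what I am writing, I get a u' in front of each element, which I think is why the CSV output comes out with every letter in its own column. I tried using .strip(u''), but it gave me an error saying .strip cannot be used because it is a list. What am I doing wrong?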

Answer 1

locs2 is a list of strings, not a list of lists. As a result, you are trying to write a single string as a row:

for values in sorted(locs2):
    writer.writerow(values)

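Here values is a string, and writerow() treats it as a sequence. Each element of any sequence you pass to that function is written as a separate column, so every character of the string ends up in its own column.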

If you want to write all locations as a single row, pass the whole list to writer.writerow():

writer.writerow(sorted(locs2))

If you want to write a new row for each location, wrap it in a list first:

for location in sorted(locs2):
    writer.writerow([location])
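
The difference between the three calls above can be checked without touching the disk. The following is only a rough sketch: it assumes Python 3 (the question's code is Python 2), writes to an in-memory buffer instead of locations.csv, and uses a made-up two-item locs2. The helper rows_written is hypothetical and exists only for this illustration.

```python
import csv
import io


def rows_written(write):
    """Run write(writer) against an in-memory CSV file and return the parsed rows."""
    buf = io.StringIO(newline="")
    write(csv.writer(buf))
    buf.seek(0)
    return list(csv.reader(buf))


locs2 = ["Iraq", "Syria"]  # made-up sample data, not the scraped list

# Passing a bare string: writerow() iterates it, so each character is a column.
per_char = rows_written(lambda w: w.writerow(locs2[0]))


# Passing the whole list: one row, one column per location.
one_row = rows_written(lambda w: w.writerow(sorted(locs2)))


# Wrapping each string in a list: one row per location, a single column each.
def write_wrapped(w):
    for location in sorted(locs2):
        w.writerow([location])


one_per_row = rows_written(write_wrapped)

print(per_char)     # [['I', 'r', 'a', 'q']]
print(one_row)      # [['Iraq', 'Syria']]
print(one_per_row)  # [['Iraq'], ['Syria']]
```

Reading the buffer back with csv.reader shows the rows exactly as a spreadsheet would split them, which makes the one-letter-per-column symptom easy to see.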

You do not need to strip the u prefix from the strings; that is just Python telling you that you have Unicode string objects, not byte string objects:

>>> 'ASCII byte string'
'ASCII byte string'
>>> 'ASCII unicode string'.decode('ascii')
u'ASCII unicode string'
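
The same point can be sketched in Python 3, with the caveat that Python 3 no longer displays a u prefix for text strings; there, the b prefix on byte strings plays the analogous role. The sample value "Iraq" is made up for illustration. The prefix and quotes appear only in the repr() form that the prompt and printed lists show, not in the value itself.

```python
text = "Iraq"                # a text (Unicode) string
data = text.encode("ascii")  # the same characters as a byte string

# repr() is what the interactive prompt and a printed list display; the
# quotes and the b prefix are notation, not part of the stored value.
shown = repr([data])

# The values themselves carry no prefix: four characters, four bytes.
lengths = (len(text), len(data))

# strip() only removes characters from the ends of the string's content,
# so there is no prefix for it to remove.
stripped = text.strip("u")

print(shown)     # [b'Iraq']
print(lengths)   # (4, 4)
print(stripped)  # Iraq
```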

If you want to learn more about Python and Unicode, see the following:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Pragmatic Unicode
Python Unicode HOWTO

Why is every letter in its own column when writing to a CSV file?
