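I started with a CSV file that has one column and many rows, each row containing one sentence. I wrote some Python to remove stop words and generate a new CSV file in the same format (one column, many rows of sentences, but now with the stop words removed). The only part of my code that does not work is writing the new CSV.
Instead of one sentence per row in a single column, I get multiple columns, and each row in a column contains a single character of a sentence.
Here is a sample of my new_text_list: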
['"Although online site asset business, still essential need reliable dependable web hosting provider. When searching suitable web host website, one name recommend. Choose plan that\'s Best Business Today! Try Now FREE 30 Days! Track sales expenses \x82"',
'"Although online site asset business, still essential need reliable dependable web hosting provider. When searching suitable web host website, one name recommend. Choose plan that\'s Best Business Today! Try Now FREE 30 Days! Track sales expenses \x82"']
Here is a sample of the output CSV:
col1 col2
" W
W e
" W
W e
l
l
What am I doing wrong?
Here is my code:
import csv
from nltk.corpus import stopwords  # assumed: stopwords comes from NLTK

def remove_stopwords(filename):
    new_text_list = []
    cachedStopWords = set(stopwords.words("english"))
    with open(filename, "rU") as f:
        next(f)  # skip the header line
        for line in f:
            row = line.split()
            text = ' '.join([word for word in row
                             if word not in cachedStopWords])
            # print text
            new_text_list.append(text)
    print new_text_list
    with open("output.csv", 'wb') as g:
        writer = csv.writer(g)
        for val in new_text_list:
            writer.writerows([val])
Answer 0 (score: 4)
with open("output.csv", 'wb') as g:
    writer = csv.writer(g)
    for item in new_text_list:
        writer.writerow([item])  # writerow (singular), not writerows (plural)
or
with open("output.csv", 'wb') as g:
    writer = csv.writer(g)
    writer.writerows([[item] for item in new_text_list])
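The snippets above are written for Python 2 (binary mode 'wb'). As a rough sketch of the same fix on Python 3, assuming text mode with newline='' as the csv documentation recommends, something like the following should work. The helper name write_single_column and the temporary output path are made up for illustration and are not part of the original code.

```python
import csv
import os
import tempfile


def write_single_column(path, sentences):
    """Write each sentence as its own one-field row (hypothetical helper)."""
    # Python 3: open in text mode with newline='' so the csv module
    # controls the line endings itself.
    with open(path, "w", newline="") as g:
        writer = csv.writer(g)
        # Each row is a one-element list, so each sentence stays in one column.
        writer.writerows([s] for s in sentences)


# Illustrative data, shortened from the question's sample.
sentences = ["Although online site asset business", "Try Now FREE 30 Days!"]
out_path = os.path.join(tempfile.mkdtemp(), "output.csv")
write_single_column(out_path, sentences)

# Read the file back to check that every row has exactly one field.
with open(out_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

Every row read back should be a one-element list holding the whole sentence.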
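When using writerows, the argument should be an iterable of rows, where each row is an iterable of field values. Here, the field value is item, so a row can be the list [item]. Thus writerows can take a list of lists as its argument.
writer.writerows([val])
does not work because [val] is just a list containing a string, not a list of lists.
Now, strings are sequences too, namely sequences of characters: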
In [164]: list('abc')
Out[164]: ['a', 'b', 'c']
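So writerows treats [val] as a list containing a single row, val, and each character of val becomes a field value. That is why the characters of your string got splattered across columns. For example,
import csv
with open('/tmp/out', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(['Hi, there'])
yields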
H,i,",", ,t,h,e,r,e
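To compare the calls side by side without touching the disk, here is a small Python 3 sketch that writes to an in-memory buffer. The helper csv_text is hypothetical and exists only for this illustration.

```python
import csv
import io


def csv_text(write):
    """Run write(writer) against an in-memory buffer and return the CSV text."""
    buf = io.StringIO(newline="")
    write(csv.writer(buf))
    return buf.getvalue()


val = "Hi, there"

# The string itself is taken as the row, so every character becomes a field.
split_chars = csv_text(lambda w: w.writerows([val]))
# One row containing one field: the whole string.
one_field = csv_text(lambda w: w.writerow([val]))
# A list of rows, each row a one-element list: same result as above.
one_field_too = csv_text(lambda w: w.writerows([[val]]))

print(repr(split_chars))
print(repr(one_field))
print(one_field == one_field_too)
```

The last two calls produce identical text, which is why either form of the fix works.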
Answer 1 (score: 1)
Using the official Python documentation on csv, I managed to write and read your sample data as follows...
l = ['"Although online site asset business, still essential need reliable dependable web hosting provider. When searching suitable web host website, one name recommend. Choose plan that\'s Best Business Today! Try Now FREE 30 Days! Track sales expenses \x82"',
'"Although online site asset business, still essential need reliable dependable web hosting provider. When searching suitable web host website, one name recommend. Choose plan that\'s Best Business Today! Try Now FREE 30 Days! Track sales expenses \x82"']
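import csv
with open('output.csv', 'wb') as csvfile:
    # csv.writer (not csv.write) creates the writer object
    writer = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for i in l:
        # wrap the string in a list so the whole sentence is a single field
        writer.writerow([i])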
Then I read the file back as follows:
with open('output.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in reader:
        print ''.join(row)
and got this output:
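"Although online site asset business, still essential need reliable dependable web hosting provider. When searching suitable web host website, one name recommend. Choose plan that's Best Business Today! Try Now FREE 30 Days! Track sales expenses "
"Although online site asset business, still essential need reliable dependable web hosting provider. When searching suitable web host website, one name recommend. Choose plan that's Best Business Today! Try Now FREE 30 Days! Track sales expenses "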
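A custom delimiter and quote character are not strictly needed for this data. As a sketch under the assumption of Python 3 and the default csv dialect, sentences that contain commas or double quotes should still survive a write/read round trip as long as each one is written as a one-element row. The sample strings below are shortened, illustrative variants of the question's data.

```python
import csv
import io

# Illustrative sentences: one wrapped in double quotes, one containing a comma.
sentences = [
    '"Choose plan that\'s Best Business Today! Try Now FREE 30 Days!"',
    "Track sales, expenses",
]

# Write with the default dialect; the writer quotes and escapes as needed.
buf = io.StringIO(newline="")
csv.writer(buf).writerows([s] for s in sentences)

# Read back from the same buffer and take the single field of each row.
buf.seek(0)
restored = [row[0] for row in csv.reader(buf)]
print(restored == sentences)
```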
I hope this helps...