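Hello, I am using sklearn with KMeans for natural language processing. I used KMeans to create clusters from comments, then built a dictionary with the cluster number as the key and the list of comments belonging to that cluster as the value, as follows: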
dict_clusters = {}
for i in range(0, len(kmeans.labels_)):
    #print(kmeans.labels_[i])
    #print(listComments[i])
    if not kmeans.labels_[i] in dict_clusters:
        dict_clusters[kmeans.labels_[i]] = []
    dict_clusters[kmeans.labels_[i]].append(listComments[i])
print("dictionary constructed")
I want to write this dictionary to a CSV file. I tried:
Out = open("dictionary.csv", "wb")
w = csv.DictWriter(Out,dict_clusters.keys())
w.writerows(dict_clusters)
Out.close()
However, I am not sure why it fails; I get the error below. I am also not sure whether the error is related to numpy, since kmeans.labels_ contains multiple values:
Traceback (most recent call last):
File "C:/Users/CleanFile.py", line 133, in <module>
w.writerows(dict_clusters)
File "C:\Program Files\Anaconda3\lib\csv.py", line 156, in writerows
return self.writer.writerows(map(self._dict_to_list, rowdicts))
File "C:\Program Files\Anaconda3\lib\csv.py", line 146, in _dict_to_list
wrong_fields = [k for k in rowdict if k not in self.fieldnames]
TypeError: 'numpy.int32' object is not iterable
I would appreciate help with this. I would like to get a CSV from my dictionary that looks like this:
key1, value
key2, value
.
.
.
keyN, value
Following the feedback, I tried:
with open("dictionary.csv", mode="wb") as out_file:
    writer = csv.DictWriter(out_file, headers=dict_clusters.keys())
    writer.writerow(dict_clusters)
and got:
Traceback (most recent call last):
File "C:/Users/CleanFile.py", line 129, in <module>
writer = csv.DictWriter(out_file, headers=dict_clusters.keys())
TypeError: __init__() missing 1 required positional argument: 'fieldnames'
Attempt 2:
Out = open("dictionary.csv", "wb")
w = csv.DictWriter(Out,dict_clusters.keys())
w.writerows([dict_clusters])
Out.close()
Output:
Traceback (most recent call last):
File "C:/Users/CleanFile.py", line 130, in <module>
w.writerows([dict_clusters])
File "C:\Program Files\Anaconda3\lib\csv.py", line 156, in writerows
return self.writer.writerows(map(self._dict_to_list, rowdicts))
TypeError: a bytes-like object is required, not 'str'
Attempt 3 (this one takes a long time to produce any output):
Out = open("dictionary.csv", "wb")
w = csv.DictWriter(Out,dict_clusters.keys())
w.writerow(dict_clusters)
Out.close()
The Python version I am using is:
3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
3.5.2
After many attempts, I decided to build my dictionary in a better way, as follows:
from collections import defaultdict
pairs = zip(y_pred, listComments)
dict_clusters2 = defaultdict(list)
for num, comment in pairs:
    dict_clusters2[num].append(comment)
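A self-contained sketch of the two constructions above, using made-up labels and comments as hypothetical stand-ins for kmeans.labels_/y_pred and listComments, suggests they build the same dictionary; defaultdict(list) simply creates the empty list the first time a key is seen:

```python
from collections import defaultdict

# Hypothetical stand-ins for kmeans.labels_ / y_pred and listComments.
labels = [1, 2, 1, 19, 2]
comments = ["hello this is", "we already have", "the car is red",
            "we have", "another comment"]

# Version 1: explicit membership check, as in the first loop.
dict_clusters = {}
for i in range(len(labels)):
    if labels[i] not in dict_clusters:
        dict_clusters[labels[i]] = []
    dict_clusters[labels[i]].append(comments[i])

# Version 2: defaultdict(list) creates the empty list on first access.
dict_clusters2 = defaultdict(list)
for num, comment in zip(labels, comments):
    dict_clusters2[num].append(comment)

print(dict(dict_clusters2) == dict_clusters)  # True
```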
However, it seems that some characters prevent the CSV file from being written:
with open('dict.csv', 'w') as csv_file:
    writer = csv.writer(csv_file)
    for key, value in dict_clusters2.items():
        writer.writerow([key, value])
Output:
Traceback (most recent call last):
File "C:/Users/CleanFile.py", line 146, in <module>
writer.writerow([key, value])
File "C:\Program Files\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f609' in position 6056: character maps to <undefined>
To make it clearer, I ran:
for k,v in dict_clusters2.items():
    print(k, v)
and got something like:
1 ['hello this is','the car is red',....'performing test']
2 ['we already have','another comment',...'strings strings']
.
.
19 ['we have',' comment music',...'strings strings dance']
Each key in my dictionary maps to a list of several comments. I would like a CSV like this:
1,'hello this is','the car is red',....'performing test'
2,'we already have','another comment',...'strings strings'
.
.
19,'we have',' comment music',...'strings strings dance'
However, some characters do not seem to be handled properly and everything fails. I would appreciate any help; thank you.
Answer 0 (score: 2)
The writerows method must be given a list of dictionaries:
Out = open("dictionary.csv", "w", newline="")  # Python 3: csv writes str, so open in text mode
w = csv.DictWriter(Out, dict_clusters.keys())
w.writerows([dict_clusters])
Out.close()
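The original error can be reproduced without numpy. A minimal sketch, assuming plain int keys behave like the numpy.int32 labels: iterating over a dict yields its keys, so writerows(dict_clusters) hands each key to DictWriter as if it were a row dictionary. The exact exception depends on the Python version (3.5 raises the TypeError shown in the question; newer releases may report an AttributeError instead), so the sketch catches both:

```python
import csv
import io

# Hypothetical data; plain ints stand in for the numpy.int32 cluster labels.
clusters = {1: ["hello this is"], 2: ["we already have"]}

buf = io.StringIO()
w = csv.DictWriter(buf, fieldnames=list(clusters))

# Passing the dict itself: writerows iterates over it, which yields the keys
# (1, 2), and each key is then treated as if it were a row dictionary.
try:
    w.writerows(clusters)
    err_type = None
except (TypeError, AttributeError) as exc:
    err_type = type(exc)
    print(type(exc).__name__, exc)

# Wrapping the dict in a list gives writerows exactly one row dictionary.
w.writerows([clusters])
print(buf.getvalue())
```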
You are probably looking for writerow, which takes a single dictionary object:
Out = open("dictionary.csv", "w", newline="")  # Python 3: csv writes str, so open in text mode
w = csv.DictWriter(Out, dict_clusters.keys())
w.writerow(dict_clusters)
Out.close()
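Worth noting: with the cluster numbers as fieldnames, DictWriter lays the data out column-wise, one column per cluster, with each cell holding the string form of a whole list. It does not produce the "key, value" rows asked for. A small sketch with hypothetical data, using io.StringIO in place of a real file, makes the layout visible:

```python
import csv
import io

# Hypothetical sample in place of the real dict_clusters.
clusters = {1: ["hello this is", "the car is red"], 2: ["another comment"]}

buf = io.StringIO()
w = csv.DictWriter(buf, fieldnames=list(clusters))
w.writeheader()       # first row: the cluster numbers
w.writerow(clusters)  # second row: one cell per cluster, each holding str(list)

# Parse the text back to see the cells as a CSV reader would.
rows = list(csv.reader(io.StringIO(buf.getvalue())))
print(rows)
```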
As an aside, you may also want to use open as a context manager (in a with block) to make sure the file is closed properly:
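import csv

# DictWriter's second argument is named fieldnames (there is no "headers" keyword).
with open("dictionary.csv", mode="w", newline="") as out_file:
    writer = csv.DictWriter(out_file, fieldnames=dict_clusters.keys())
    writer.writerow(dict_clusters)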
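The "a bytes-like object is required, not 'str'" error from Attempt 2 comes from the "wb" mode: in Python 3 the csv module writes str, so the file has to be opened in text mode (the csv documentation also recommends newline=""). A short sketch, with io.BytesIO and io.StringIO standing in for files opened in binary and text mode:

```python
import csv
import io

# io.BytesIO stands in for a file opened with "wb": it only accepts bytes.
binary_target = io.BytesIO()
try:
    csv.writer(binary_target).writerow([1, "hello"])
    binary_error = None
except TypeError as exc:
    binary_error = str(exc)
    print("binary target:", exc)

# io.StringIO stands in for a file opened with "w" and newline="".
text_target = io.StringIO()
csv.writer(text_target).writerow([1, "hello"])
print(repr(text_target.getvalue()))  # '1,hello\r\n'
```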
Answer 1 (score: 2)
Your special character renders in a Py3 IPython session as:
In [31]: '\U0001f609'
Out[31]: '😉'
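The traceback names the cp1252 codec, which appears to be the default encoding open() picked on that Windows machine. A quick check using only str.encode shows that this character has no cp1252 mapping but encodes without trouble as UTF-8:

```python
# '\U0001f609' is the "winking face" emoji from the question's traceback.
ch = "\U0001f609"

try:
    ch.encode("cp1252")  # the codec named in the traceback
    cp1252_ok = True
except UnicodeEncodeError as exc:
    cp1252_ok = False
    print("cp1252:", exc.reason)

utf8_bytes = ch.encode("utf-8")
print(utf8_bytes)  # b'\xf0\x9f\x98\x89'
```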
Give us a small sample of the dictionary, or better yet, the values you used to build it.
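I haven't worked much with csv, and even less with csv.DictWriter. numpy users often write csv files with np.savetxt. That is easy to use when writing a purely numeric array. If you want to write a mix of string and numeric columns, it gets tricky and requires a structured array.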
Another option is to write the text file directly: just open it and use f.write(...) to write formatted lines to the file. In fact, np.savetxt does essentially this:
with open(filename, 'w') as f:
    for row in myArray:
        f.write(fmt % tuple(row))
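savetxt builds an fmt string such as %s, %d, %f\n. It also works with bytestrings, which requires wb mode, so your special characters could cause even more problems there.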
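A sketch of that savetxt-style approach for mixed string and number columns, with hypothetical (cluster, comment) rows and io.StringIO standing in for a text file: one % conversion per column, joined into a single format string:

```python
import io

# Hypothetical (cluster number, comment) rows mixing ints and strings.
rows = [(1, "hello this is"), (2, "we already have")]

# One conversion per column, like the fmt string np.savetxt builds.
fmt = "%d, %s\n"

f = io.StringIO()  # stands in for a file opened with open(filename, "w")
for row in rows:
    f.write(fmt % tuple(row))

text = f.getvalue()
print(text)
```

Unlike csv.writer, plain % formatting does no quoting, so a comment that itself contains a comma would become ambiguous when the file is read back.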
It may help to start by focusing on printing the dictionary, one key at a time, e.g.
for k in mydict.keys():
    print('%s, %s' % (k, mydict[k]))
Once you have the print format right, it is easy to turn it into a file write.
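A minimal sketch of that print-first workflow, with a hypothetical format_line helper and sample data: the same formatting function feeds both print and f.write, so once the screen output looks right the file output matches it:

```python
import io

# Hypothetical sample in place of the real dictionary.
mydict = {1: ["hello this is", "the car is red"], 2: ["another comment"]}

def format_line(key, comments):
    # Hypothetical helper: the key, then each comment, comma-separated.
    return "%s, %s" % (key, ", ".join(comments))

# Step 1: check the layout on screen.
for k in mydict:
    print(format_line(k, mydict[k]))

# Step 2: reuse the same formatting for the file
# (io.StringIO stands in for a file opened in "w" mode).
out = io.StringIO()
for k in mydict:
    out.write(format_line(k, mydict[k]) + "\n")
```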
===============
I can write a hypothetical dictionary with your code:
In [58]: adict={1:'\U0001f609'}
In [59]: with open('test.txt','w') as f:
    ...:     writer=csv.writer(f)
    ...:     for k,v in adict.items():
    ...:         writer.writerow([k,v])
    ...:
In [60]: cat test.txt
1,😉
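Putting the pieces together, a sketch of the question's csv.writer loop with two changes. First, the file is opened with encoding="utf-8" (and newline="") instead of relying on the platform default, which was cp1252 in the question's traceback. Second, and this is an assumption about the desired layout, each row is written as [key] + comments so every comment lands in its own cell, as in the CSV shown in the question. The data are hypothetical and the file goes to a temporary directory:

```python
import csv
import os
import tempfile

# Hypothetical sample shaped like dict_clusters2 (cluster -> list of comments).
dict_clusters2 = {1: ["hello this is", "the car is red"],
                  19: ["we have", "strings strings dance \U0001f609"]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dict.csv")

    # Explicit UTF-8 instead of the platform default; newline="" per the csv docs.
    with open(path, "w", encoding="utf-8", newline="") as csv_file:
        writer = csv.writer(csv_file)
        for key, comments in dict_clusters2.items():
            # [key] + comments puts each comment in its own cell instead of
            # writing the whole list as one "['...', '...']" string.
            writer.writerow([key] + comments)

    # Read it back with the same encoding to check the round trip.
    with open(path, encoding="utf-8", newline="") as csv_file:
        rows = list(csv.reader(csv_file))
```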