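I created a Python dictionary of features for a set of texts. I am using libsvm from Python for classification, and it takes a sparse matrix as input. Each input line should have the form: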
<label> <index1>:<value1> <index2>:<value2> ...
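Is there a way to convert my dictionary to libsvm format? I tried saving it to a CSV file and then converting that to sparse matrix format, but I got an error while writing the CSV file. My dictionary looks like this: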
{0: {0: 1, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0,
15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 0}, 1: {0: 1, 1: 1, 2: 0, 3:
1, 4: 0, 5: 0, 6: 1, 7: 0, 8: 0, 9: 1, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0,
19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 0}, 2: {0: 1, 1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 6: 0, 7: 0,
8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0,
23: 0, 24: 0, 25: 0}....}
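To make the target concrete, here is a minimal sketch of writing such a nested dictionary directly as libsvm lines, without going through CSV. It assumes the inner keys are 0-based feature indices that should become 1-based in the file, and that zero values can be dropped. The function name dict_to_libsvm_lines, the offset parameter, and the labels mapping are hypothetical, since the dictionary itself carries no class labels.

```python
def dict_to_libsvm_lines(features, labels, offset=0):
    """Format a nested {row: {feature_index: value}} dict as libsvm lines.

    `labels` maps each row key to its class label (hypothetical: the
    dictionary in the question has no labels). Feature indices are made
    1-based and then shifted by `offset`; zero values are skipped.
    """
    lines = []
    for row in sorted(features):
        pairs = [
            "%d:%g" % (index + 1 + offset, value)
            for index, value in sorted(features[row].items())
            if value != 0
        ]
        lines.append(" ".join([str(labels[row])] + pairs))
    return lines


# Shortened version of the dictionary above, for illustration only.
features_tags = {
    0: {0: 1, 1: 0, 2: 0, 3: 0},
    1: {0: 1, 1: 1, 2: 0, 3: 1},
}
labels = {0: -1, 1: 1}  # made-up labels

for line in dict_to_libsvm_lines(features_tags, labels):
    print(line)
```

Passing offset=6500 would number these features from 6501 onward.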
The code I used to save it to a CSV file is:
writer = csv.writer(open('/home/gew/python_docs/feature_tags.csv', 'wb'))
for key, value in features_tags.items():
    writer.writerows([key,value])
But I get the error: Error: sequence expected
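A likely cause is that writerows expects an iterable of rows, each of which is itself a sequence, so in [key, value] the integer key gets treated as a row. The sketch below uses writerow with one flat row per dictionary entry instead. It assumes Python 3 and a row layout of the key followed by the values in feature-index order; the helper name write_feature_rows is hypothetical, and an in-memory buffer stands in for the real file.

```python
import csv
import io


def write_feature_rows(features, handle):
    """Write one CSV row per entry: the key, then the values by feature index."""
    writer = csv.writer(handle)
    for key, value in sorted(features.items()):
        row = [key] + [value[index] for index in sorted(value)]
        writer.writerow(row)  # writerow takes a single sequence


# Shortened version of the dictionary, for illustration only.
features_tags = {0: {0: 1, 1: 0, 2: 0}, 1: {0: 1, 1: 1, 2: 0}}

buffer = io.StringIO()  # stands in for open(path, "w", newline="") on Python 3
write_feature_rows(features_tags, buffer)
print(buffer.getvalue())
```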
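This is only a subset of the features I use with libsvm. After converting it to a sparse matrix, is there any way to combine the two sparse matrices into a single feature vector for libsvm? The other features are already in sparse matrix form. I used dump_svmlight_file to save the tf-idf features in libsvm format: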
-1 491:0.0776333443740911 1161:0.4481444868908682 1220:0.09787659944322356 1297:0.09091558887557132
1518:0.2558810794182657 1663:0.1000883672992121 1806:0.09191664928182955 2296:0.1493814956302894
+1 5749:0.1493814956302894 5819:0.1200273208236481 5843:0.1493814956302894 5859:0.1087982845232741
5966:0.1076151102064468 6182:0.1730733336818764 6238:0.07999390379552077 6389:0.07944410971663282
+1 ...........
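-1 and +1 are the class labels. In total it contains about 6500 features. How can I combine these two sets of features for the same text to form the feature vector for libsvm? The features from the dictionary could then start at index 6501. Is there a library that can convert the dictionary to libsvm format and combine the two sparse matrices?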
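One way to picture the combination is to append the dictionary features to each existing tf-idf line with their indices shifted past the tf-idf range. This is a rough sketch under assumptions, not a library solution: the helper append_features is hypothetical, the offset of 6500 is the approximate feature count mentioned above, and the inner dictionary keys are assumed to be 0-based.

```python
def append_features(libsvm_line, extra, offset):
    """Append dict features to one libsvm line, shifting their indices.

    `extra` is {feature_index: value} with 0-based indices; `offset` is the
    number of feature indices already used by the existing line.
    Zero values are skipped, as in a sparse representation.
    """
    pairs = [
        "%d:%g" % (index + 1 + offset, value)
        for index, value in sorted(extra.items())
        if value != 0
    ]
    return " ".join([libsvm_line.strip()] + pairs)


# Shortened tf-idf line and tag features, for illustration only.
tfidf_line = "-1 491:0.0776333443740911 1161:0.4481444868908682"
tag_features = {0: 1, 1: 0, 2: 0, 3: 1}

combined = append_features(tfidf_line, tag_features, offset=6500)
print(combined)
```

If both feature sets are held in memory as matrices rather than text, scipy.sparse.hstack can join them column-wise before a single call to dump_svmlight_file, and scikit-learn's DictVectorizer may help with turning dictionaries into a sparse matrix.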