Combining a Python dictionary and a sparse matrix

Time: 2014-07-14 11:18:01

Tags: python dictionary svm sparse-matrix libsvm

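I created a Python dictionary of features for a set of texts. I am using libsvm from Python for classification, and it takes a sparse matrix as input. Each input line should have the form: <label> <index1>:<value1> <index2>:<value2> ... Is there a way to convert my dictionary into libsvm format? I tried saving it to a CSV file and then converting that to sparse-matrix format, but I got an error while writing the CSV file. My dictionary looks like this: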

{0: {0: 1, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0,
 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 0}, 1: {0: 1, 1: 1, 2: 0, 3:
 1, 4: 0, 5: 0, 6: 1, 7: 0, 8: 0, 9: 1, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 
19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 0}, 2: {0: 1, 1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 6: 0, 7: 0,
 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0,
 23: 0, 24: 0, 25: 0}....}

The code I used to save it to a CSV file is:

writer = csv.writer(open('/home/gew/python_docs/feature_tags.csv', 'wb'))
for key, value in features_tags.items():
    writer.writerows([key,value])

But I get the error: Error: sequence expected
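The error most likely comes from writerows, which expects a sequence of rows: in writerows([key, value]) the integer key is treated as a row, and an integer is not a sequence. A minimal sketch of a per-row write follows, assuming Python 3 and a small stand-in dictionary. The helper name write_feature_rows and the in-memory buffer are hypothetical, introduced only for illustration.

```python
import csv
import io

# Hypothetical stand-in for the features_tags dictionary from the question.
features_tags = {
    0: {0: 1, 1: 0, 2: 0},
    1: {0: 1, 1: 1, 2: 0},
}


def write_feature_rows(features, handle):
    """Write one CSV row per document: the document key, then its feature values."""
    writer = csv.writer(handle)
    for key in sorted(features):
        inner = features[key]
        # writerow takes a single row (a sequence of cells);
        # writerows takes a sequence of such rows.
        writer.writerow([key] + [inner[i] for i in sorted(inner)])


# An in-memory buffer is used here instead of a file on disk.
buffer = io.StringIO()
write_feature_rows(features_tags, buffer)
csv_text = buffer.getvalue()
```

When writing to a real file in Python 3, the file would be opened with open(path, 'w', newline=''); the 'wb' mode shown in the question is the Python 2 convention.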

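This is only a subset of the features I use with libsvm. After converting it to a sparse matrix, is there any way to combine the two sparse matrices to form the feature vector for libsvm? The other features are already in sparse-matrix form. I used dump_svmlight_file to save the tf-idf features in libsvm format: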

-1 491:0.0776333443740911 1161:0.4481444868908682 1220:0.09787659944322356 1297:0.09091558887557132
 1518:0.2558810794182657 1663:0.1000883672992121 1806:0.09191664928182955 2296:0.1493814956302894
+1 5749:0.1493814956302894 5819:0.1200273208236481 5843:0.1493814956302894 5859:0.1087982845232741
 5966:0.1076151102064468 6182:0.1730733336818764 6238:0.07999390379552077 6389:0.07944410971663282
+1 ...........

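-1 and +1 are the class labels. The file contains about 6500 features in total. How can I combine these two feature sets for the same text to form the feature vector for libsvm? The features from the dictionary could then start at key 6501. Is there any library that can convert the dictionary to libsvm format and combine the two sparse matrices?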
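One possible approach, sketched in plain Python without any library: format each inner dictionary as index:value pairs shifted by an offset, then append them to the matching tf-idf line. This assumes that the outer dictionary key is the 0-based position of the document in the tf-idf file, and that the offset is larger than every existing tf-idf index so the indices stay in ascending order. The function names and the shortened sample lines are hypothetical.

```python
def dict_to_libsvm_features(row, offset=0):
    """Format one inner dict as 'index:value' pairs, skipping zero values.

    offset is added to every key, so with offset=6501 the key 0 becomes 6501.
    """
    parts = []
    for idx in sorted(row):
        value = row[idx]
        if value != 0:  # a sparse format only stores non-zero entries
            parts.append("%d:%s" % (idx + offset, value))
    return " ".join(parts)


def append_features(libsvm_lines, features, offset):
    """Append the dictionary features of document i to the i-th libsvm line."""
    combined = []
    for doc_id, line in enumerate(libsvm_lines):
        extra = dict_to_libsvm_features(features.get(doc_id, {}), offset)
        combined.append((line.strip() + " " + extra).strip())
    return combined


# Shortened, made-up sample data in the shape shown in the question.
tfidf_lines = ["-1 491:0.5 1161:0.25", "+1 5749:0.125"]
features_tags = {0: {0: 1, 1: 0, 2: 0}, 1: {0: 1, 1: 1, 2: 0}}

combined = append_features(tfidf_lines, features_tags, offset=6501)
for line in combined:
    print(line)
```

If both feature sets are available as SciPy sparse matrices with the same row order, scipy.sparse.hstack followed by the dump_svmlight_file call already used for the tf-idf features would be the library route; that variant is not shown or tested here.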

0 answers:

No answers yet