My test.csv file:
==================
1,54,1341775056478
2,1568,1341775056478
1,1622,1341775056498
2,3136,1341775056498
1,3190,1341775056671
2,4704,1341775056671
1,4758,1341775056693
2,6272,1341775056693
1,6326,1341775056714
2,7840,1341775056714
1,7894,1341775056735
2,9408,1341775056735
1,9462,1341775056951
2,10976,1341775056951
1,11030,1341775056972
2,12544,1341775056972
1,12598,1341775056994
2,14112,1341775056994
1,14166,1341775057014
2,15680,1341775057014
1,15734,1341775057065
2,17248,1341775057065
1,17302,1341775057087
2,18816,1341775057087
1,18870,1341775057119
2,20384,1341775057119
....
....
我正在尝试使用mahout k-means算法对这些数据进行聚类。 我遵循了以下步骤:
1)Create a sequence file from the test.csv file
mahout seqdirectory -c UTF-8 -i /user/mahout/input/test.csv -o /user/sample/out_seq -chunk 64
2)Create a sparse vector from the sequence file
mahout seq2sparse -i /user/mahout/out_seq/ -o /user/mahout/sparse_dir --maxDFPercent 85 --namedVector
3)perfom K-Means clustering
mahout kmeans -i /user/mahout/sparse_dir/tfidf-vectors/ -c /user/mahout/cluster -o /user/mahout/kmeans_out
-dm org.apache.mahout.common.distance.CosineDistanceMeasure --maxIter 10 --numClusters 20 --ow --clustering
在第3步,我面临此错误:
Exception in thread "main" java.lang.IllegalStateException: No input clusters found in /user/mahout/text/cluster/part-randomSeed. Check your -c argument.
at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:213)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:147)
....
....
如何克服这个错误。实际上,我使用路透社数据集成功地完成了聚类示例。但是我的数据集显示了这个问题。数据集有问题吗?或者由于其他一些问题,我是否面临这个错误?
任何人都可以就此问题向我提出建议......
先谢谢