I have a very simple piece of code to try out cosine similarity:
function CRC(data, length)
  local sum = 0xFFFF -- initial CRC value (65535); local so it does not leak a global
  for i = 1, length do
    local d = string.byte(data, i) -- get i-th byte, like data[i] in C
    sum = ByteCRC(sum, d)
  end
  return sum
end

function ByteCRC(sum, data)
  sum = sum ~ data -- XOR the byte into the running CRC (bitwise operators need Lua 5.3+)
  for i = 0, 7 do -- Lua for loops include the upper bound, so 7, not 8
    if (sum & 1) == 0 then
      sum = sum >> 1
    else
      sum = (sum >> 1) ~ 0xA001 -- values are integers, so no string functions are needed
    end
  end
  return sum
end

print(CRC("foo", 3))
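I am running this code on Amazon AWS with Spark 1.5, but on the last two lines I get the following message: "Error: value columnSimilarities is not a member of org.apache.spark.rdd.RDD[(Int, Int)]"
Could you help me solve this problem?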
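Answer 0 (score: 2)
I found the answer. columnSimilarities is a method of RowMatrix, not of RDD, so I needed to convert the matrix to an RDD of row vectors and wrap that in a RowMatrix. Here is the correct code: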
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.distributed.{MatrixEntry, CoordinateMatrix, RowMatrix}
import org.apache.spark.rdd._
import org.apache.spark.mllib.linalg._
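// Convert a local Matrix into an RDD with one Vector per matrix row.
// `sc` is the SparkContext that spark-shell provides.
def matrixToRDD(m: Matrix): RDD[Vector] = {
  val columns = m.toArray.grouped(m.numRows) // toArray is column-major, so each group is one column
  val rows = columns.toSeq.transpose // Skip this if you want a column-major RDD.
  val vectors = rows.map(row => new DenseVector(row.toArray))
  sc.parallelize(vectors)
}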
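// Matrices.dense takes its values in column-major order: each run of 5 numbers is one column.
val dm: Matrix = Matrices.dense(5, 5, Array(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 4, 5, 8, 3, 4, 1, 2, 7, 7, 7, 7, 7, 7))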
val rows = matrixToRDD(dm)
val mat = new RowMatrix(rows)
val simsPerfect = mat.columnSimilarities()
val simsEstimate = mat.columnSimilarities(0.8)
println("Pairwise similarities are: " + simsPerfect.entries.collect.mkString(", "))
println("Estimated pairwise similarities are: " + simsEstimate.entries.collect.mkString(", "))
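
To sanity-check the exact result on a small matrix like this one, the same quantity can be computed without Spark. The following is only a minimal sketch in plain Scala: the object `ColumnCosine` and its helper names are made up for illustration and are not part of Spark. It assumes the values are in column-major order, as above, and it lists only the pairs with i < j, which should correspond to the upper-triangular entries that `columnSimilarities()` reports.

```scala
// Plain-Scala sketch (no Spark): exact cosine similarity between the columns
// of a column-major dense matrix. All names here are hypothetical helpers.
object ColumnCosine {

  /** Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|). */
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    dot / (normA * normB)
  }

  /** Similarities for every column pair (i, j) with i < j. */
  def columnSimilarities(values: Array[Double], numRows: Int): Seq[(Int, Int, Double)] = {
    // Column-major layout: each consecutive group of numRows values is one column.
    val cols = values.grouped(numRows).toArray
    for {
      i <- cols.indices
      j <- (i + 1) until cols.length
    } yield (i, j, cosine(cols(i), cols(j)))
  }

  def main(args: Array[String]): Unit = {
    // Same values as the 5 x 5 matrix `dm` in the answer.
    val values = Array[Double](
      1, 2, 3, 4, 5,
      1, 2, 3, 4, 5,
      1, 2, 4, 5, 8,
      3, 4, 1, 2, 7,
      7, 7, 7, 7, 7)
    columnSimilarities(values, 5).foreach { case (i, j, s) =>
      println(f"($i, $j) -> $s%.4f")
    }
  }
}
```

Columns 0 and 1 of the example matrix are identical, so their similarity should come out as approximately 1.0, which makes a quick check against the Spark output.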
Cheers