Question

Possible duplicate:
Convert a dataframe to an object of class “dist” without actually calculating distances in R

I have a very large csv file of similarities between keywords (so a for loop inside a for loop takes too long). When I read it into a data.frame, it looks like:

> df   
kwd1 kwd2 similarity  
a  b  1  
b  a  1  
c  a  2  
a  c  2
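For reproducibility, the toy table above can be rebuilt directly in R. This sketch uses read.table(text = ...) as a stand-in for reading the real (large) csv file, whose name is not given in the question.

```r
# Rebuild the toy keyword-similarity table shown above.
# stringsAsFactors = FALSE keeps the keywords as character vectors.
df <- read.table(text = "
kwd1 kwd2 similarity
a b 1
b a 1
c a 2
a c 2
", header = TRUE, stringsAsFactors = FALSE)

str(df)
```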

I want to convert this into a dist object like this:

> dObject  
  a b  
b 1    
c 2 0
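To make the target concrete: a dist object stores only the lower triangle, column by column, as a plain numeric vector with a Labels attribute. A minimal sketch that builds the desired dObject by hand from a full symmetric matrix (the 0 for the (b, c) pair is taken from the desired output above):

```r
kw <- c("a", "b", "c")
# Full symmetric matrix of the desired values; as.dist() ignores the diagonal
m <- matrix(c(0, 1, 2,
              1, 0, 0,
              2, 0, 0),
            nrow = 3, dimnames = list(kw, kw))
dObject <- as.dist(m)
print(dObject)

# Underlying storage: lower triangle, column-wise, i.e. (b,a), (c,a), (c,b)
as.vector(dObject)
```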

I can't get this to work: Convert a dataframe to an object of class "dist" without actually calculating distances in R

My other idea was to create a sparse matrix with Matrix(), but I'm not sure how to populate it efficiently, since my csv is quite large. Maybe with an apply function?

Or perhaps reshape()?
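On the sparse Matrix() idea: the whole edge list can be passed to Matrix::sparseMatrix() in a single vectorized call, with no apply() loop or reshape(). A minimal sketch on the toy data, assuming that (a, b) and (b, a) always carry the same similarity; only the first occurrence of each unordered pair is kept, because sparseMatrix() adds up duplicated (i, j) entries. Note that the final dist object is dense regardless, so the last line only works when the full lower triangle fits in memory, and pairs missing from the csv come out as 0.

```r
library(Matrix)  # recommended package shipped with R

df <- data.frame(kwd1 = c("a", "b", "c", "a"),
                 kwd2 = c("b", "a", "a", "c"),
                 similarity = c(1, 1, 2, 2),
                 stringsAsFactors = FALSE)

# Integer index for every keyword appearing in either column
kw <- sort(unique(c(df$kwd1, df$kwd2)))
i <- match(df$kwd1, kw)
j <- match(df$kwd2, kw)

# Put each unordered pair in the upper triangle (row <= col) and drop
# repeats, since sparseMatrix() would sum duplicated entries
lo <- pmin(i, j)
hi <- pmax(i, j)
keep <- !duplicated(cbind(lo, hi))

# symmetric = TRUE: only the upper triangle needs to be supplied
S <- sparseMatrix(i = lo[keep], j = hi[keep], x = df$similarity[keep],
                  dims = c(length(kw), length(kw)),
                  dimnames = list(kw, kw), symmetric = TRUE)
S

# Densify only at the end; missing pairs become 0 in the dist object
d <- as.dist(as.matrix(S))
d
```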

---- UPDATE ---- This seems to work for the toy dataset above: https://stats.stackexchange.com/questions/6827/efficient-way-to-populate-matrix-in-r

However, that example uses matrix(), and I want to use the sparse Matrix() for memory reasons.
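One point worth keeping in mind for the memory budget: whatever intermediate matrix is used, a dist object itself is dense. It stores all n(n - 1)/2 lower-triangle values as doubles (8 bytes each), so a sparse input does not yield a sparse result. A quick back-of-the-envelope helper (the function name dist_bytes is hypothetical, and attribute overhead is ignored):

```r
# Approximate bytes needed for the values of a dist object over n items:
# n * (n - 1) / 2 doubles at 8 bytes each (hypothetical helper)
dist_bytes <- function(n) 8 * n * (n - 1) / 2

dist_bytes(1e4)  # 399960000 bytes, roughly 0.4 GB
dist_bytes(1e5)  # 39999600000 bytes, roughly 40 GB
```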

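---- ALSO ---- There was a similar post before. However, I don't think its suggestions apply here, because not every pair of elements in my dataset is linked: unlike in that post, my csv does not contain the pairwise similarity between all keywords. The earlier post: Convert a dataframe to an object of class "dist" without actually calculating distances in R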

Answer 1

Try this:

# Generate some dummy data (since you didn't provide your data)
df <- data.frame(V1=sample(letters, 10, TRUE),
                 V2=sample(letters, 10, TRUE),
                 V3=sample(200, 10, TRUE))

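If df$V1 and df$V2 are factors (the default for strings in data.frame() before R 4.0.0), they may have different levels, so we need to make them comparable, i.e. make sure that "a" in V1 is the same as "a" in V2.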
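Why this matters: a factor's integer codes depend on that factor's own levels, so the same keyword can receive different codes in V1 and V2. A small illustration; matching both columns against one shared, sorted keyword list gives consistent codes (match() converts factors to character before matching):

```r
v1 <- factor(c("a", "b"))  # levels: a, b
v2 <- factor(c("b", "c"))  # levels: b, c
as.integer(v1)  # 1 2, so "b" is coded 2 in v1
as.integer(v2)  # 1 2, so "b" is coded 1 in v2

# One shared, sorted keyword list gives the same code to the same keyword
all.kw <- sort(unique(c(as.character(v1), as.character(v2))))
match(v1, all.kw)  # 1 2
match(v2, all.kw)  # 2 3
```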

# Convert letters to integers
my.objects <- sort(unique(c(as.character(df$V1), as.character(df$V2))))
df$V1 <- match(df$V1, my.objects)
df$V2 <- match(df$V2, my.objects)

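Create an empty distance matrix and fill it with the values from V3 at the positions given by V1 and V2 (in both orientations, so the matrix is symmetric). Finally, we convert it into a proper dist object.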

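# Create an empty n x n distance matrix (NA = no similarity given)
n <- length(my.objects)
dist.mat <- matrix(NA, n, n)
# Two-column matrix of (row, col) indices, one row per csv line
i <- as.matrix(df[-3])
# Fill cell (V1, V2) and its mirror (V2, V1) in one vectorized step
dist.mat[i] <- dist.mat[i[,2:1]] <- df$V3

# as.dist() keeps only the lower triangle
my.dist <- as.dist(dist.mat)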
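The filling step relies on indexing a matrix with a two-column integer matrix: each row (r, c) addresses one cell, so the whole edge list is written in one assignment, and i[, 2:1] writes the mirrored cells. As written, my.dist carries no keyword labels, and pairs that never appear in the csv stay NA (the question's desired output shows 0 for such a pair). A sketch of the same steps on the question's toy data and column names (kwd1, kwd2, similarity) that keeps the keywords as labels and optionally sets missing pairs to 0; whether 0 is the right value for an unobserved pair is an assumption that depends on the data.

```r
df <- data.frame(kwd1 = c("a", "b", "c", "a"),
                 kwd2 = c("b", "a", "a", "c"),
                 similarity = c(1, 1, 2, 2),
                 stringsAsFactors = FALSE)

kw <- sort(unique(c(df$kwd1, df$kwd2)))
n <- length(kw)
# Row/column names become the dist labels
dist.mat <- matrix(NA_real_, n, n, dimnames = list(kw, kw))
idx <- cbind(match(df$kwd1, kw), match(df$kwd2, kw))
dist.mat[idx] <- dist.mat[idx[, 2:1]] <- df$similarity

# Optional: treat pairs missing from the csv as 0, as in the question's example
dist.mat[is.na(dist.mat)] <- 0

dObject <- as.dist(dist.mat)
dObject
```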
