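How can I find similar items in R? The similarity measures I care about most are cosine similarity and KNN. I think the key part is getting my data into a usable shape.
For example, using the built-in mtcars dataset, I want to find the most similar items for each car.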
library(tidyverse)
mtcars$item = rownames(mtcars)
mtcars = mtcars %>% select(item, mpg, hp, qsec) # use these 3 fields to find similar items.
#help <here>
# desired format would have the <N> most similar items in <N> columns, indicating their respective importance
# desired format would also have the weightings (scores) of each of these items
mtcars$similar_1 = #most similar item
mtcars$similar_1_score = #.8
...
mtcars$similar_5 = #5th most similar item
mtcars$similar_5_score = #score associated with them.
I would like to be able to do this with Euclidean distance, then separately with cosine scores, and then again with a KNN approach.
Answer 0 (score: 2)
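Here is one possible solution: you can use the dist() function to compute Euclidean distances. First compute the distances between all pairs of items, then get the ordering of those distances for every item. From that ordering, pick the i-th entry, look up its score and item label for each item, put them in a data frame, and bind that to the original data frame. The first entry in each ordering is the item itself (distance 0), which is why the code uses position i + 1.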
library(tidyverse)
mtcars$item = rownames(mtcars)
data <- (mtcars %>% select(item, mpg, hp, qsec))[1:10,]
euc_dist <- as.matrix(dist(data[1:10,-1]))
# Get the ith cars label name for one car
ith_item <- function(col, euc_dist, top_i) {
labels(euc_dist)[[1]][top_i[col]]
}
# Get the ith cars score from one column
ith_score <- function(col, euc_dist, top_i) {
euc_dist[top_i[col], col]
}
# Create a dataframe with the ith most similar item for all items
ith_similar <- function(euc_dist, i) {
orders <- apply(euc_dist, 2, order)
top_i <- orders[i + 1, ]
top_i_score <- sapply(1:ncol(euc_dist), ith_score, euc_dist, top_i)
top_i_items <- sapply(1:ncol(euc_dist), ith_item, euc_dist, top_i)
similarities <- data.frame(placeholder1 = top_i_score,
placeholder2 = top_i_items)
colnames <- c(paste0("similar_", i, "_score"), paste0("similar_", i))
names(similarities) <- colnames
similarities
}
# For example top 2 similarities
n <- 2
for(i in 1:n) {
tmp_similarities <- ith_similar(euc_dist, i)
data <- cbind(data, tmp_similarities)
}
data
This gives the output:
item mpg hp qsec similar_1_score similar_1 similar_2_score similar_2
Mazda RX4 Mazda RX4 21.0 110 16.46 0.560000 Mazda RX4 Wag 3.006726 Hornet 4 Drive
Mazda RX4 Wag Mazda RX4 Wag 21.0 110 17.02 0.560000 Mazda RX4 2.452835 Hornet 4 Drive
Datsun 710 Datsun 710 22.8 93 18.61 4.733297 Merc 230 12.987767 Valiant
Hornet 4 Drive Hornet 4 Drive 21.4 110 19.44 2.452835 Mazda RX4 Wag 3.006726 Mazda RX4
Hornet Sportabout Hornet Sportabout 18.7 175 17.02 52.018155 Merc 280 65.040680 Mazda RX4 Wag
Valiant Valiant 18.1 105 20.22 6.041391 Hornet 4 Drive 6.606815 Mazda RX4 Wag
Duster 360 Duster 360 14.3 245 15.84 70.148075 Hornet Sportabout 122.123141 Merc 280
Merc 240D Merc 240D 24.4 62 20.00 31.072369 Datsun 710 33.165796 Merc 230
Merc 230 Merc 230 22.8 95 22.90 4.733297 Datsun 710 11.369802 Valiant
Merc 280 Merc 280 19.2 123 18.30 13.186296 Mazda RX4 Wag 13.234032 Hornet 4 Drive
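
The answer above covers only Euclidean distance. As a rough sketch of how the same ranking idea might extend to cosine similarity, the base R code below builds a pairwise score matrix and then reads off the top n neighbours per item. The helper names `cosine_sim` and `top_n_similar` are made up for this illustration, and standardising the columns with scale() before computing cosine similarity is an assumption (hp is on a much larger scale than mpg and qsec), not something the question or the answer specifies. For the KNN part of the question, the k nearest neighbours of an item under a given distance are simply the first k entries of this kind of ranking.

```r
# Features used in the answer above, restricted to the same 10 cars.
feats <- as.matrix(mtcars[1:10, c("mpg", "hp", "qsec")])

# Assumption: standardise columns so that hp does not dominate the angle.
feats_scaled <- scale(feats)

# Pairwise cosine similarity between rows: scale every row to unit length,
# then take the cross product of the rows.
cosine_sim <- function(m) {
  unit <- m / sqrt(rowSums(m^2))
  tcrossprod(unit)
}

# For every item, return the n best-scoring other items and their scores.
# decreasing = TRUE suits similarities; use FALSE for distances.
top_n_similar <- function(score, n, decreasing = TRUE) {
  diag(score) <- NA  # an item should never be listed as its own neighbour
  out <- data.frame(item = rownames(score), stringsAsFactors = FALSE)
  for (i in seq_len(n)) {
    # Column index of the i-th best score in each row (NAs sorted last).
    idx <- apply(score, 1, function(r) {
      order(r, decreasing = decreasing, na.last = TRUE)[i]
    })
    out[[paste0("similar_", i)]] <- colnames(score)[idx]
    out[[paste0("similar_", i, "_score")]] <-
      score[cbind(seq_len(nrow(score)), idx)]
  }
  out
}

# Top 2 neighbours by cosine similarity (higher is more similar).
cos_top <- top_n_similar(cosine_sim(feats_scaled), n = 2)

# Same helper with Euclidean distance on the raw features (lower is closer).
euc_top <- top_n_similar(as.matrix(dist(feats)), n = 2, decreasing = FALSE)
```

With the unscaled Euclidean distances, `euc_top` should list the same neighbours as the table above. `merge(data, cos_top, by = "item")` would attach the cosine columns to the original data frame if you want everything in one place.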