Question

I am new to R, and I am trying to get the minimum Distance value and the corresponding "Record2_ID" for each unique value of "Record1_ID" in the following data frame:

Record1_ID  Record2_ID  Distance
6       10_Bil      0.95337476
6       11_Bla      0.852558044
6       12_Bon      1
6       13_Bra      1
684     78_Lip      0.957437173
684     79_Lip      1
684     80_Liv      0.950852681
684     81_Lun      0.914874347
3065        136_Pri     1
3065        137_Pro     0.895742793
3065        138_Rec     0.895742793
3065        139_Ren     0.934061953

I used the function tapply(x$Distance_Cosine, cosine_dist_type_data$Record1_rowID, min), but with tapply I do not get the "Record2_rowID" values. Ideally, the output should be:

Record1_ID  Record2_ID  Min_Distance
6       11_Bla      0.852558044
684     81_Lun      0.914874347
3065        137_Pro     0.895742793

Can this be done using sapply or any other function? Thanks for your help.

Answer 1

Or you can use the base function ave:

df[df$Distance == ave(df$Distance, df$Record1_ID, FUN = min), ]
#    Record1_ID Record2_ID  Distance
# 2           6     11_Bla 0.8525580
# 8         684     81_Lun 0.9148743
# 10       3065    137_Pro 0.8957428
# 11       3065    138_Rec 0.8957428
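
Note that the result above has four rows, because 137_Pro and 138_Rec tie for Record1_ID 3065, while the question's desired output has one row per ID. The following is a minimal sketch of what ave is doing and of one way to drop the extra tie. The names group_min, with_ties and first_only are only illustrative, and keeping the first row on a tie is an assumption, not something the question specifies.

```r
# Rebuild the question's data frame
df <- data.frame(
  Record1_ID = rep(c(6, 684, 3065), each = 4),
  Record2_ID = c("10_Bil", "11_Bla", "12_Bon", "13_Bra",
                 "78_Lip", "79_Lip", "80_Liv", "81_Lun",
                 "136_Pri", "137_Pro", "138_Rec", "139_Ren"),
  Distance = c(0.95337476, 0.852558044, 1, 1,
               0.957437173, 1, 0.950852681, 0.914874347,
               1, 0.895742793, 0.895742793, 0.934061953),
  stringsAsFactors = FALSE
)

# ave() returns one value per row: the minimum Distance of that row's group
group_min <- ave(df$Distance, df$Record1_ID, FUN = min)

# Rows whose Distance equals their group's minimum (all ties are kept)
with_ties <- df[df$Distance == group_min, ]

# Assumption: on a tie, keep whichever row comes first in the data
first_only <- with_ties[!duplicated(with_ties$Record1_ID), ]
```

If a different tie-breaking rule is needed, sort df accordingly before the last step.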

Answer 2

library(data.table)
df <- data.table(read.table(header = TRUE, text = "
            Record1_ID  Record2_ID  Distance
6       10_Bil      0.95337476
6       11_Bla      0.852558044
6       12_Bon      1
6       13_Bra      1
684     78_Lip      0.957437173
684     79_Lip      1
684     80_Liv      0.950852681
684     81_Lun      0.914874347
3065        136_Pri     1
3065        137_Pro     0.895742793
3065        138_Rec     0.895742793
3065        139_Ren     0.934061953
            "))

df[, Min_Distance := min(Distance), by = Record1_ID]
df[Distance == Min_Distance,]

Or, slightly more directly:

df[, .SD[Distance == min(Distance)], by=Record1_ID]

.SD contains the Subset of Data for each group. We simply select the rows we want from that subset, namely those where Distance equals min(Distance).
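
To make the role of .SD more concrete, here is a small sketch that first inspects .SD for each group and then applies the same filter. The names dt, sd_info and result are only illustrative. It assumes the data.table package is installed.

```r
library(data.table)

# Rebuild the question's data as a data.table
dt <- data.table(
  Record1_ID = rep(c(6, 684, 3065), each = 4),
  Record2_ID = c("10_Bil", "11_Bla", "12_Bon", "13_Bra",
                 "78_Lip", "79_Lip", "80_Liv", "81_Lun",
                 "136_Pri", "137_Pro", "138_Rec", "139_Ren"),
  Distance = c(0.95337476, 0.852558044, 1, 1,
               0.957437173, 1, 0.950852681, 0.914874347,
               1, 0.895742793, 0.895742793, 0.934061953)
)

# For each group, .SD is a data.table holding that group's rows
# and every column except the grouping column
sd_info <- dt[, .(rows = nrow(.SD),
                  cols = paste(names(.SD), collapse = ", ")),
              by = Record1_ID]

# Filter inside each group's .SD; the grouping column is added back in front
result <- dt[, .SD[Distance == min(Distance)], by = Record1_ID]
```

Replacing the condition with which.min(Distance) would return only the first minimum per group instead of all ties.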

Answer 3

If this is a data frame, you will want to look at plyr, specifically the ddply function. Not very elegant, but try...

library(plyr)

min_vals.df <- ddply(.data = df,
                     .variables = "Record1_ID",
                     .fun = function(x){
                         return(x[x$Distance == min(x$Distance),c("Record2_ID","Distance")])
                     })

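plyr and its successor dplyr are essentially "apply for data frames": they iterate over each unique combination of the .variables and run whatever function you like on the resulting subset of the data.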
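
The split, apply and combine steps described above can be spelled out as a plain loop. This is only a simplified base R sketch of the idea, not plyr's actual implementation, and the names results, group and best are illustrative.

```r
# Rebuild the question's data frame
df <- data.frame(
  Record1_ID = rep(c(6, 684, 3065), each = 4),
  Record2_ID = c("10_Bil", "11_Bla", "12_Bon", "13_Bra",
                 "78_Lip", "79_Lip", "80_Liv", "81_Lun",
                 "136_Pri", "137_Pro", "138_Rec", "139_Ren"),
  Distance = c(0.95337476, 0.852558044, 1, 1,
               0.957437173, 1, 0.950852681, 0.914874347,
               1, 0.895742793, 0.895742793, 0.934061953),
  stringsAsFactors = FALSE
)

results <- list()
for (id in unique(df$Record1_ID)) {
  # Split: take the rows belonging to this Record1_ID
  group <- df[df$Record1_ID == id, ]
  # Apply: keep the row(s) with the smallest Distance in the group
  best <- group[group$Distance == min(group$Distance), ]
  results[[length(results) + 1]] <- best
}

# Combine: stack the per-group results into one data frame
min_vals.df <- do.call(rbind, results)
```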

Answer 4

Or without plyr:

blah <- lapply(split(df, df["Record1_ID"]), function(x) x[which.min(x$Distance),])
min_vals.df <- do.call(rbind, blah)

(Edit) Modified to include all minimum values (in case of ties):

blah <- lapply(split(df, df["Record1_ID"]), function(x) subset(x, Distance==min(Distance)))
min_vals.df <- do.call(rbind, blah)

Answer 5

Or with dplyr:

require(dplyr)

df %>% group_by(Record1_ID) %>% filter(Distance == min(Distance))
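
The filter above keeps both tied rows for Record1_ID 3065. If exactly one row per group is wanted, as in the question's expected output, the following sketch may help. It assumes dplyr 1.0.0 or later, where slice_min is available. The name one_per_group is illustrative, and dropping ties with with_ties = FALSE is an assumption about what the asker wants.

```r
library(dplyr)

# Rebuild the question's data frame
df <- data.frame(
  Record1_ID = rep(c(6, 684, 3065), each = 4),
  Record2_ID = c("10_Bil", "11_Bla", "12_Bon", "13_Bra",
                 "78_Lip", "79_Lip", "80_Liv", "81_Lun",
                 "136_Pri", "137_Pro", "138_Rec", "139_Ren"),
  Distance = c(0.95337476, 0.852558044, 1, 1,
               0.957437173, 1, 0.950852681, 0.914874347,
               1, 0.895742793, 0.895742793, 0.934061953),
  stringsAsFactors = FALSE
)

# Keep a single row with the smallest Distance in each group;
# with_ties = FALSE drops any additional rows that share the minimum
one_per_group <- df %>%
  group_by(Record1_ID) %>%
  slice_min(Distance, n = 1, with_ties = FALSE) %>%
  ungroup()
```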
