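I want to get the minimum distance between patients across two columns, but the same name can appear in either column (Patient1 or Patient2).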
Patient1 Patient2 Distance
A B 8
A C 11
A D 19
A E 23
B F 6
C G 25
So the output I need is:
Patient Patient_closest_distance Distance
A B 8
B F 6
C A 11
I tried using the list function:
library(data.table)
DT <- data.table(Full_data)
j1 <- DT[ , list(Distance = min(Distance)), by = Patient1]
j2 <- DT[ , list(Distance = min(Distance)), by = Patient2]
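However, this only gives me the minimum distance per column, so C gets two results (one from each column) instead of a single closest patient that takes both columns into account. Also, I only get a list of distances, so I can't see which patient is paired with which: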
Patient1 SNP
1:A 8
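Both problems come from aggregating each column on its own and dropping the partner column. Below is a minimal sketch of one way around this in data.table. It assumes distances are symmetric (A to C equals C to A) and uses the question's sample rows in place of `Full_data`. The idea is to stack each pair in both directions and then keep the row with the smallest distance per patient. The column names `Patient` and `Closest` are illustrative.

```r
library(data.table)

# Question's sample data (stands in for Full_data)
DT <- data.table(
  Patient1 = c("A", "A", "A", "A", "B", "C"),
  Patient2 = c("B", "C", "D", "E", "F", "G"),
  Distance = c(8, 11, 19, 23, 6, 25)
)

# Stack every pair in both directions so each patient shows up in one column
both <- rbind(
  DT[, .(Patient = Patient1, Closest = Patient2, Distance)],
  DT[, .(Patient = Patient2, Closest = Patient1, Distance)]
)

# For each patient, take the row index of its smallest distance,
# then subset those rows so the partner column is kept
idx <- both[, .I[which.min(Distance)], by = Patient]$V1
nearest <- both[idx]
print(nearest)
```

Unlike the desired output above, this returns one row per patient, including D to G. If two partners are equally close, `which.min` keeps the first one it encounters.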
Answer 0 (score: 1)
The following code works.
# Create sample data frame
df <- data.frame(
Patient1 = c('A','B', 'A', 'A', 'C', 'B'),
Patient2 = c('B', 'A','C', 'D', 'D', 'F'),
Distance = c(10, 1, 20, 3, 60, 20)
)
# Format as character variables (instead of factors); only needed on R < 4.0,
# where data.frame() converts strings to factors by default
df$Patient1 <- as.character(df$Patient1)
df$Patient2 <- as.character(df$Patient2)
# Add mirror paths so each pair is counted from both sides.
# Ex.) A to C at a distance of 20 is equivalent to C to A at a distance of 20
# Since a patient can appear in either column, these two lines are needed here;
# skip them only if the distances are directional.
df_mirror <- data.frame(Patient1 = df$Patient2, Patient2 = df$Patient1, Distance = df$Distance)
df <- rbind(df, df_mirror); rm(df_mirror)
# Keep the minimum distance for each (Patient1, Patient2) pair
library(dplyr)
df <- summarise(group_by(df, Patient1, Patient2), Distance = min(Distance))
# Sort ascending so the smallest distance comes first
nearest <- df[order(df$Distance), ]
# Keep only the first (closest) row for each Patient1
nearest <- nearest[!duplicated(nearest$Patient1),]
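Applied to the question's data, the same approach can be sketched in base R without dplyr: mirror the pairs, sort by distance, and keep the first row per patient. This is an illustrative variant, not the answerer's code. The names `both`, `Patient` and `Closest` are hypothetical, and `stringsAsFactors = FALSE` is set explicitly so it behaves the same on R versions before 4.0.

```r
# Question's sample data
df <- data.frame(
  Patient1 = c("A", "A", "A", "A", "B", "C"),
  Patient2 = c("B", "C", "D", "E", "F", "G"),
  Distance = c(8, 11, 19, 23, 6, 25),
  stringsAsFactors = FALSE
)

# List every pair in both directions so each patient appears in the Patient column
both <- data.frame(
  Patient  = c(df$Patient1, df$Patient2),
  Closest  = c(df$Patient2, df$Patient1),
  Distance = c(df$Distance, df$Distance),
  stringsAsFactors = FALSE
)

# Sort by distance; order() is stable, so ties keep their original row order
both <- both[order(both$Distance), ]

# The first row per patient is now its nearest neighbour
nearest <- both[!duplicated(both$Patient), ]
nearest <- nearest[order(nearest$Patient), ]
rownames(nearest) <- NULL
print(nearest)
```

Because `duplicated()` keeps the first occurrence, sorting before de-duplicating is what makes the kept row the closest one.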