Searching two columns of a data frame

Time: 2018-11-08 15:52:17

Tags: r dataframe

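I have a data frame with three columns. The first and second columns contain place names, and the third contains a value. There are 50 unique places. I want to find matching combinations across columns 1 and 2 and sum their values. For example, a matching combination is VillageA in column 1 with VillageD in column 2, and the reverse (VillageD in column 1 with VillageA in column 2).

Is there a simple way to do this in R?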

Reproducible example:

value <- rnorm(6, 0.5)
from <- c("VillageA", "VillageB", "VillageC", "VillageD", "VillageB", "VillageD")
to <- c("VillageD", "VillageC", "VillageB", "VillageA", "VillageD", "VillageB")
df <- data.frame(from, to, value)
df

   from       to             value
1 VillageA VillageD   1.8903532567673
2 VillageB VillageC 0.868595180019032
3 VillageC VillageB  1.47556560739867
4 VillageD VillageA  1.09236209542305
5 VillageB VillageD  1.17212213945941
6 VillageD VillageB   1.8903532567673

The order within a combination is not fixed (it can be A-B or B-A).

2 answers:

Answer 0: (score: 0)

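Convert the factors to character, then create new columns with a consistent order by putting the alphabetically first village of each pair in one column and the alphabetically last village in the other, and finally sum by group. This is a base solution. You could also use data.table or dplyr; see the Sum by Group R-FAQ for details on other approaches.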

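# pmin/pmax do not work on unordered factors, so convert to character first
df$from = as.character(df$from)
df$to = as.character(df$to)

# a = alphabetically first village of the pair, b = alphabetically last
df$a = pmin(df$from, df$to)
df$b = pmax(df$from, df$to)
# sum the values within each unordered pair
aggregate(value ~ a + b, data = df, FUN = sum)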
#          a        b      value
# 1 VillageB VillageC  0.6702636
# 2 VillageA VillageD  1.6532692
# 3 VillageB VillageD -1.2560672
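
Because the data above come from rnorm, the sums differ from run to run. As a minimal sketch of the same base approach, the example below swaps in hypothetical fixed values (1 to 6, not from the original post) so the result can be checked by hand:

```r
# Hypothetical fixed values (an assumption, not the original random data)
df <- data.frame(
  from  = c("VillageA", "VillageB", "VillageC", "VillageD", "VillageB", "VillageD"),
  to    = c("VillageD", "VillageC", "VillageB", "VillageA", "VillageD", "VillageB"),
  value = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Order each pair alphabetically so A-D and D-A share the same key
df$a <- pmin(df$from, df$to)
df$b <- pmax(df$from, df$to)

res <- aggregate(value ~ a + b, data = df, FUN = sum)
res
#          a        b value
# 1 VillageB VillageC     5
# 2 VillageA VillageD     5
# 3 VillageB VillageD    11
```

Rows 1 and 4 (A-D and D-A) collapse to 1 + 4 = 5, rows 2 and 3 to 2 + 3 = 5, and rows 5 and 6 to 5 + 6 = 11.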

Answer 1: (score: 0)

library(tidyverse)
value<-rnorm(6,0.5)
from<-c("VillageA","VillageB","VillageC","VillageD", "VillageB","VillageD")
to<-c("VillageD","VillageC", "VillageB","VillageA","VillageD","VillageB")

I fixed the data.frame so that the strings are not converted to factors:

df<-data.frame(from,to,value,stringsAsFactors = FALSE)

After that, we can do the calculation with dplyr:

df %>% mutate(min=pmin(from,to),max=pmax(from,to)) %>% 
  group_by(min,max) %>% 
  summarise(sum_value=sum(value))
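
As a rough, checkable variant of the pipeline above, the sketch below uses hypothetical fixed values (1 to 6) instead of rnorm. The column names place_a and place_b are illustrative choices, used here only to avoid shadowing the base functions min and max, and the final ungroup() is an optional addition that drops the grouping left behind by summarise.

```r
library(dplyr)

# Hypothetical fixed values (an assumption, not the original random data)
df <- data.frame(
  from  = c("VillageA", "VillageB", "VillageC", "VillageD", "VillageB", "VillageD"),
  to    = c("VillageD", "VillageC", "VillageB", "VillageA", "VillageD", "VillageB"),
  value = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

result <- df %>%
  # place_a / place_b: alphabetically first / last village of each pair
  mutate(place_a = pmin(from, to), place_b = pmax(from, to)) %>%
  group_by(place_a, place_b) %>%
  summarise(sum_value = sum(value)) %>%
  ungroup()

result
```

With these values the three rows should be VillageA-VillageD = 5, VillageB-VillageC = 5 and VillageB-VillageD = 11.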