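I have a data frame with three columns. The first and second columns contain place names, and the third column contains values. There are 50 unique places. I want to find matching combinations across columns 1 and 2 and add up their corresponding values. For example, VillageA in column 1 with VillageD in column 2 counts as the same combination as the reverse (VillageD in column 1 with VillageA in column 2).
Is there a simple way to do this in R?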
Reproducible example:
value<-rnorm(6,0.5)
from<-c("VillageA","VillageB","VillageC","VillageD", "VillageB","VillageD")
to<-c("VillageD","VillageC", "VillageB","VillageA","VillageD","VillageB")
df<-data.frame(from,to,value)
df
from to value
1 VillageA VillageD 1.8903532567673
2 VillageB VillageC 0.868595180019032
3 VillageC VillageB 1.47556560739867
4 VillageD VillageA 1.09236209542305
5 VillageB VillageD 1.17212213945941
6 VillageD VillageB 1.8903532567673
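In the result, the order within a combination is not fixed: A–B and B–A should be summed together as one combination.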
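To make the expected result concrete, here is a small base R sketch. It replaces the random rnorm() values with hypothetical fixed values 1 to 6 so the sums can be checked by hand, and it treats A–B and B–A as the same pair by sorting the two names within each row. This is one possible approach, not necessarily what the asker had in mind.

```r
# Hypothetical fixed values (instead of rnorm()) so the sums can be checked by hand
df <- data.frame(
  from  = c("VillageA", "VillageB", "VillageC", "VillageD", "VillageB", "VillageD"),
  to    = c("VillageD", "VillageC", "VillageB", "VillageA", "VillageD", "VillageB"),
  value = c(1, 2, 3, 4, 5, 6)
)

# Sort the two names inside each row, so VillageA/VillageD and
# VillageD/VillageA both become the key "VillageA - VillageD".
# apply() turns the two columns into a character matrix first,
# so this also works if they are factors.
pair <- apply(df[, c("from", "to")], 1,
              function(r) paste(sort(r), collapse = " - "))

# One summed value per unordered pair:
# A-D = 1 + 4 = 5, B-C = 2 + 3 = 5, B-D = 5 + 6 = 11
expected <- tapply(df$value, pair, sum)
print(expected)
```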
Answer 0 (score: 0)
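Convert the factors to character, then create new columns that store each pair in a consistent order (the alphabetically first village in one column and the alphabetically last village in the other), and then sum by group. Here is a base R solution; you could also use data.table or dplyr. See the Sum by Group R-FAQ for details on other methods.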
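# factors -> character so pmin()/pmax() compare the names alphabetically
df$from = as.character(df$from)
df$to = as.character(df$to)
# a = alphabetically first village of the pair, b = alphabetically last
df$a = pmin(df$from, df$to)
df$b = pmax(df$from, df$to)
# sum value within each (a, b) pair
aggregate(value ~ a + b, data = df, FUN = sum)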
# a b value
# 1 VillageB VillageC 0.6702636
# 2 VillageA VillageD 1.6532692
# 3 VillageB VillageD -1.2560672
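If only the totals are needed, base R's rowsum() can do the grouped sum on a single key built from the same pmin()/pmax() ordering. This is a sketch with hypothetical fixed values rather than rnorm(). Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so the as.character() step is only needed on older R versions or when the columns really are factors.

```r
# Hypothetical fixed values (instead of rnorm()) so the totals are easy to check
df <- data.frame(
  from  = c("VillageA", "VillageB", "VillageC", "VillageD", "VillageB", "VillageD"),
  to    = c("VillageD", "VillageC", "VillageB", "VillageA", "VillageD", "VillageB"),
  value = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE  # keep character columns on any R version
)

# Alphabetically first village, then last, collapsed into one key string
key <- paste(pmin(df$from, df$to), pmax(df$from, df$to), sep = " - ")

# rowsum() sums 'value' within each key; the groups come back sorted,
# giving one row per unordered pair
totals <- rowsum(df$value, key)
print(totals)
```

The result is a one-column matrix whose row names are the pairs, which is handy for lookups; aggregate() is the better fit when you want the two villages back as separate data frame columns.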
Answer 1 (score: 0)
library(tidyverse)
value<-rnorm(6,0.5)
from<-c("VillageA","VillageB","VillageC","VillageD", "VillageB","VillageD")
to<-c("VillageD","VillageC", "VillageB","VillageA","VillageD","VillageB")
First, I fixed the data.frame so the place names stay character instead of becoming factors:
df<-data.frame(from,to,value,stringsAsFactors = FALSE)
After that, we can do the calculation with dplyr:
df %>% mutate(min=pmin(from,to),max=pmax(from,to)) %>%
group_by(min,max) %>%
summarise(sum_value=sum(value))