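I have a table with the columns Source.Country, Target.Country, number, sum_intensity and mean_intensity.
Further down the table, the countries in Target.Country repeat those in Source.Country, so the same combinations appear again in reverse order, but with different number, sum and mean values. Is it possible, whenever the combination is the same, to add the remaining columns together and add a column with the average?
For example: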
Source.Country  Target.Country  number  sum_intensity  mean_intensity
North Korea     South Korea     26492   10674.9        0.402
South Korea     North Korea     34912   53848.3        1.542
becomes:
Source.Country  Target.Country  number  sum_intensity  mean_intensity  Average
North Korea     South Korea     61404   64523.2        1.944           1.05
Any help would be great!
Answer 0 (score: 0)
A solution similar to the one @Axeman suggested in the comments:
library(purrr)
library(dplyr)
df <- data.frame(Source.Country = c('North Korea', 'South Korea'),
                 Target.Country = c('South Korea', 'North Korea'),
                 number = c(26492, 34912),
                 sum_intensity = c(10674.9, 53848.3),
                 mean_intensity = c(0.402, 1.542))
df %>%
  mutate(grp = purrr::map2_chr(Source.Country, Target.Country,
                               ~paste(sort(c(as.character(.x), as.character(.y))), collapse = ' '))) %>%
  group_by(grp) %>%
  summarise(number = sum(number),
            sum_intensity = sum(sum_intensity),
            mean_intensity = sum(mean_intensity),
            average = sum_intensity / number)
# # A tibble: 1 x 5
#   grp                     number sum_intensity mean_intensity average
#   <chr>                    <dbl>         <dbl>          <dbl>   <dbl>
# 1 North Korea South Korea 61404.        64523.           1.94    1.05
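Some small adjustments compared with the comment:
- the collapse argument is passed to paste, so each pair becomes a single string;
- as.character prevents the country names from being coerced to integers (which happens when they are stored as factors);
- mean_intensity cannot be used as an output in summarise and then as an input again, but the mean of the means does not make much sense anyway when number is unbalanced, so I simply recomputed the average from the sums.
Answer 1 (score: 0)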
# Build the example directly as a data frame so the numeric columns stay numeric.
# The stray trailing spaces in the country names were removed: with them,
# "North Korea " and "North Korea" would end up in different groups.
df1 <- data.frame(
  Source.Country = c("North Korea", "South Korea", "Canada", "South Korea"),
  Target.Country = c("South Korea", "North Korea", "South Korea", "Canada"),
  number = c(26492, 34912, 26492, 34912),
  sum_intensity = c(10674.9, 53848.3, 10674.9, 53848.3),
  mean_intensity = c(0.402, 1.542, 0.402, 1.542),
  stringsAsFactors = FALSE
)
# Order-independent key: sort the two country names within each row and paste them.
df1$Countries <- apply(df1[, c("Source.Country", "Target.Country")], 1,
                       function(x) paste(sort(x), collapse = " "))
# Average every numeric column within each country pair
# (use FUN = sum instead to get totals, as asked in the question).
mtab2 <- aggregate(cbind(number, sum_intensity, mean_intensity) ~ Countries,
                   data = df1, FUN = mean)
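To get the totals the question asks for, together with an average, the same grouping idea can be sketched in base R as follows. This is only an illustration under assumptions: pair_key is a hypothetical helper name (it does not come from any package), and the column names average and mean_of_means are made up for the example. It also puts the average recomputed from the totals next to the plain mean of the per-row means, since the two differ when number is unbalanced.

```r
# Hypothetical helper (not from any package): builds an order-independent
# label for each Source/Target pair by putting the smaller name first.
pair_key <- function(a, b) {
  ifelse(a < b, paste(a, b), paste(b, a))
}

df <- data.frame(
  Source.Country = c("North Korea", "South Korea"),
  Target.Country = c("South Korea", "North Korea"),
  number = c(26492, 34912),
  sum_intensity = c(10674.9, 53848.3),
  mean_intensity = c(0.402, 1.542),
  stringsAsFactors = FALSE
)
df$pair <- pair_key(df$Source.Country, df$Target.Country)

# Totals per unordered pair.
totals <- aggregate(cbind(number, sum_intensity) ~ pair, data = df, FUN = sum)

# Average recomputed from the totals, i.e. weighted by number.
totals$average <- totals$sum_intensity / totals$number

# Unweighted mean of the per-row means, for comparison.
simple <- aggregate(mean_intensity ~ pair, data = df, FUN = mean)
names(simple)[2] <- "mean_of_means"

result <- merge(totals, simple, by = "pair")
print(result)
```

For the two example rows, the recomputed average comes out at about 1.05, while the plain mean of the two means is 0.972; which one is appropriate depends on whether each row should count equally or in proportion to number.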