I have these three data frames:
Name<-c("jack","jack","bob","david","mary")
n1<-data.frame(Name)
Name<-c("jack","bill","dean","mary","steven")
n2<-data.frame(Name)
Name<-c("fred","alex","mary")
n3<-data.frame(Name)
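I want to create a new data frame with 3 columns: column 1 holds every unique name found across all 3 source data frames, column 2 the number of source data frames in which that name appears, and column 3 the total number of instances of that name across all of them.
The result should look like this: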
Name Number_of_files Number_of_instances
1 jack 2 3
2 bob 1 1
3 david 1 1
4 mary 3 3
5 bill 1 1
6 dean 1 1
7 steven 1 1
8 fred 1 1
9 alex 1 1
Is there an automated way to do all of this?
Answer 0 (score: 4)
One possible dplyr approach:
library(dplyr)
bind_rows(n1, n2, n3, .id = "ID") %>%
  group_by(Name) %>%
  summarise(Number_of_files = n_distinct(ID),
            Number_of_instances = n())
Name Number_of_files Number_of_instances
<chr> <int> <int>
1 alex 1 1
2 bill 1 1
3 bob 1 1
4 david 1 1
5 dean 1 1
6 fred 1 1
7 jack 2 3
8 mary 3 3
9 steven 1 1
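The ID column only records which input each row came from. When the data frames are passed to bind_rows() as separate unnamed arguments, those IDs are just the positions "1", "2" and "3". A small variation, assuming you would rather see the source names, passes a named list instead; the counts themselves are unchanged. The object names combined and summary_tbl are only for illustration.

```r
library(dplyr)

# Example data from the question
n1 <- data.frame(Name = c("jack", "jack", "bob", "david", "mary"))
n2 <- data.frame(Name = c("jack", "bill", "dean", "mary", "steven"))
n3 <- data.frame(Name = c("fred", "alex", "mary"))

# With a named list, .id takes the list names ("n1", "n2", "n3")
# instead of the argument positions ("1", "2", "3")
combined <- bind_rows(list(n1 = n1, n2 = n2, n3 = n3), .id = "ID")

# Same summary as above: distinct source IDs and total rows per name
summary_tbl <- combined %>%
  group_by(Name) %>%
  summarise(Number_of_files = n_distinct(ID),
            Number_of_instances = n())
summary_tbl
```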
Answer 1 (score: 1)
Conceptually this is similar to @tmfmnk's answer, but in base R.
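# Get the names of all the objects n1, n2, n3, n4, etc.
# (anchored so that only names like "n1", "n2", ... match)
name_df <- ls(pattern = "^n\\d+$")
# Combine them into one data frame, tagging each row with its source name
all_df <- do.call(rbind, Map(cbind, mget(name_df), id = name_df))
# Aggregate per name: number of distinct sources and total number of rows
aggregate(id ~ Name, all_df, function(x) c(length(unique(x)), length(x)))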
# Name id.1 id.2
#1 bob 1 1
#2 david 1 1
#3 jack 2 3
#4 mary 3 3
#5 bill 1 1
#6 dean 1 1
#7 steven 1 1
#8 alex 1 1
#9 fred 1 1
You can rename the columns if needed.
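A minimal sketch of that renaming step, assuming the example data from the question: aggregate() returns the two summaries as a two-column matrix stored inside the id column (hence the id.1 and id.2 headers), so the sketch splits that matrix into the column names the question asked for. The names agg and result are hypothetical, chosen only for illustration.

```r
# Example data from the question
n1 <- data.frame(Name = c("jack", "jack", "bob", "david", "mary"))
n2 <- data.frame(Name = c("jack", "bill", "dean", "mary", "steven"))
n3 <- data.frame(Name = c("fred", "alex", "mary"))

# Same steps as the answer above
name_df <- ls(pattern = "^n\\d+$")
all_df <- do.call(rbind, Map(cbind, mget(name_df), id = name_df))
agg <- aggregate(id ~ Name, all_df, function(x) c(length(unique(x)), length(x)))

# agg$id is a 2-column matrix: column 1 = distinct sources, column 2 = rows.
# Split it into two ordinary, descriptively named columns.
result <- data.frame(
  Name = agg$Name,
  Number_of_files = agg$id[, 1],
  Number_of_instances = agg$id[, 2]
)
result
```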
For completeness, here is a data.table version:
library(data.table)
# name_df comes from the base R code above
dt <- rbindlist(mget(name_df), idcol = "ID")
dt[, list(Number_of_files = uniqueN(ID), Number_of_instances = .N), by = .(Name)]
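Another base R option, offered as a sketch rather than as part of either answer, is to cross-tabulate names against their source data frames with table(). Each cell counts how often a name occurs in one source, so the number of non-zero cells in a row gives Number_of_files and the row sum gives Number_of_instances. The column name file and the objects long, counts and res are hypothetical names used only for illustration.

```r
# Example data from the question
n1 <- data.frame(Name = c("jack", "jack", "bob", "david", "mary"))
n2 <- data.frame(Name = c("jack", "bill", "dean", "mary", "steven"))
n3 <- data.frame(Name = c("fred", "alex", "mary"))

# Stack the names, recording which data frame each row came from
long <- rbind(
  data.frame(Name = n1$Name, file = "n1"),
  data.frame(Name = n2$Name, file = "n2"),
  data.frame(Name = n3$Name, file = "n3")
)

# Rows = names, columns = source data frames, cells = occurrence counts
counts <- table(long$Name, long$file)

res <- data.frame(
  Name = rownames(counts),
  Number_of_files = rowSums(counts > 0),  # sources in which the name occurs
  Number_of_instances = rowSums(counts),  # total occurrences across sources
  row.names = NULL
)
res
```

Keeping the intermediate counts table also shows exactly which data frames each name came from, which the summarised versions discard.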