I have 20 data frames (dat.table1 through dat.table20) that look like this:
> dat.table1
Mean SD LB UB
1 -3.251915678 0.09831336 -3.44979982 -3.0579865
2 0.529393596 0.09403571 0.34492156 0.7138352
3 0.437666296 0.09555116 0.25218768 0.6230282
4 0.386773612 0.09338021 0.20630132 0.5708987
5 0.259218892 0.10023005 0.06538325 0.4610775
6 -0.048387041 0.07875680 -0.20517662 0.1020621
7 0.086933460 0.08688864 -0.08462830 0.2565562
8 0.206235709 0.08200178 0.04710170 0.3658142
9 0.343474976 0.08204759 0.18539931 0.5062159
10 -0.354694572 0.08556581 -0.52609169 -0.1916891
11 -0.270542304 0.07349095 -0.41319234 -0.1291315
12 0.124547080 0.08323933 -0.04331230 0.2836064
13 0.005354652 0.10487004 -0.20677503 0.2061523
14 0.296131787 0.08235691 0.13605602 0.4593168
15 0.246056104 0.07536908 0.09803849 0.3959664
16 0.271052276 0.08347047 0.10437983 0.4354910
17 -0.005474416 0.09352408 -0.19415321 0.1736560
> dat.table2
Mean SD LB UB
1 -3.32373198 0.10477638 -3.53563786 -3.1241599
2 0.58316739 0.09466424 0.39814125 0.7690037
3 0.47869295 0.09768017 0.28395734 0.6701996
4 0.44479756 0.09489120 0.26172536 0.6336547
5 0.30072454 0.09964341 0.10674064 0.4980277
6 -0.05397720 0.07987092 -0.20952979 0.1038290
7 0.06624190 0.08466350 -0.10406855 0.2297836
8 0.18411601 0.07997405 0.02953943 0.3433614
9 0.35256600 0.07871029 0.20079165 0.5111548
10 -0.39566218 0.08567173 -0.56842809 -0.2281193
11 -0.29250153 0.07652253 -0.44428227 -0.1435696
12 0.07428006 0.08742497 -0.09829608 0.2419713
13 -0.03926006 0.11335154 -0.26894891 0.1716172
14 0.30625276 0.08212213 0.14760732 0.4674057
15 0.26511644 0.07824379 0.11330060 0.4216398
16 0.25476552 0.08699879 0.08646282 0.4240095
17 -0.05081449 0.10151042 -0.25162773 0.1451824
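My question is: how can I select a specific row (say, row 1) from all of the data frames and combine them as the rows of a new data frame?
Thanks.
Answer 0 (score: 4)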
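It is better to read the datasets into a list than to create/read 20 separate datasets in the global environment and then operate on them. Since the datasets have already been created, you can do this: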
lst <- mget(ls(pattern='^dat.table\\d+'))
res <- do.call(`rbind`,lapply(lst,function(x) x[1,]))
row.names(res) <- NULL
For the two datasets shown, you will get
res
# Mean SD LB UB
#1 -3.251916 0.09831336 -3.449800 -3.057987
#2 -3.323732 0.10477638 -3.535638 -3.124160
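One detail worth checking with all 20 data frames: ls() returns names in alphabetical order, so dat.table10 sorts before dat.table2 and the rows of res would not follow the numeric order. A minimal sketch of one way around this, assuming the objects are named exactly dat.table1, dat.table2, and so on (the toy values and the extra source column are illustrative only, not part of the original answer):

```r
# Toy stand-ins for the real data frames (values are made up for illustration)
dat.table1  <- data.frame(Mean = c(-3.25, 0.53), SD = c(0.098, 0.094))
dat.table2  <- data.frame(Mean = c(-3.32, 0.58), SD = c(0.105, 0.095))
dat.table10 <- data.frame(Mean = c(-3.10, 0.41), SD = c(0.101, 0.090))

# Build the names explicitly so the list follows numeric order
# (for the real data this would be paste0("dat.table", 1:20))
nms <- paste0("dat.table", c(1, 2, 10))
lst <- mget(nms)

# Take row 1 of each data frame and stack the results
res <- do.call(rbind, lapply(lst, function(x) x[1, , drop = FALSE]))
res$source <- names(lst)   # hypothetical extra column recording the origin
row.names(res) <- NULL
res
```

Keeping a source column makes it easy to trace each combined row back to its data frame after the row names have been reset.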
Another option is to use slice from dplyr:
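library(dplyr)
# bind_rows() stacks the list and stores each element's name in grp
# (the original unnest(lst, grp) call appears to rely on an older tidyr interface)
d1 <- bind_rows(lst, .id = "grp")
group_by(d1, grp) %>%
  slice(1)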
# grp Mean SD LB UB
#1 dat.table1 -3.251916 0.09831336 -3.449800 -3.057987
#2 dat.table2 -3.323732 0.10477638 -3.535638 -3.124160
Or use data.table:
library(data.table)
rbindlist(Map(cbind, grp=seq_along(lst), lst))[, head(.SD,1), by=grp]
# grp Mean SD LB UB
#1: 1 -3.251916 0.09831336 -3.449800 -3.057987
#2: 2 -3.323732 0.10477638 -3.535638 -3.124160
Regarding the error message, I suspect that the column names differ in some elements of lst. For example, if I change
colnames(lst[[1]])[1] <- "Mean1"
do.call(`rbind`,lapply(lst,function(x) x[1,]))
#Error in match.names(clabs, names(xi)) :
#names do not match previous names
If the columns are in the same order in every dataset, one option is to make the column names identical:
nm1 <- sapply(lst, function(x) colnames(x))[,2] #Because I changed the 1st element
#column name
lst1 <- lapply(lst, function(x) {colnames(x) <- nm1; x} )
res <- do.call(`rbind`,lapply(lst1,function(x) x[1,]))
row.names(res) <- NULL
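Before renaming anything, it can help to find out which elements actually have different column names. A minimal sketch, assuming the first element of the list is the reference (the toy list and the names ref and mismatched are illustrative, not from the original answer):

```r
# Toy list in which the second element has a differently named first column
lst <- list(
  a = data.frame(Mean = 1, SD = 2),
  b = data.frame(Mean1 = 3, SD = 4)
)

# Use the first element's column names as the reference
ref <- colnames(lst[[1]])

# Flag every element whose column names are not identical to the reference
same <- vapply(lst, function(x) identical(colnames(x), ref), logical(1))
mismatched <- names(lst)[!same]
mismatched
```

If mismatched is empty, rbind should not raise the "names do not match previous names" error for these data frames.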
Answer 1 (score: 3)
If you want to avoid having 20 similarly named data frames in the first place, you could do something like this:
file_names <- list.files(pattern = "\\.csv$")  # "$" anchors the match to the end of the name
read_file <- function(x) {df <- read.csv(x, stringsAsFactors = FALSE); df$file <- x; df}
file_list <- lapply(file_names, read_file)  # was lapply(files, ...), an undefined object
combined <- do.call(rbind, file_list)
which looks like this:
> head(combined)
mpg cyl disp hp drat wt qsec vs am gear carb file
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 file1.csv
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 file1.csv
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 file1.csv
4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 file1.csv
5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 file1.csv
6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 file1.csv
> tail(combined)
mpg cyl disp hp drat wt qsec vs am gear carb file
91 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2 file20.csv
92 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2 file20.csv
93 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4 file20.csv
94 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6 file20.csv
95 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8 file20.csv
96 21.4 4 121.0 109 4.11 2.780 18.6 1 1 4 2 file20.csv
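list.files searches your working directory (by default) for files whose names end in .csv.
read_file reads the file at the given path and adds a column recording which file it came from.
lapply then applies read_file to each element of file_names.
do.call combines the list of data frames returned above into a single data frame.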
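To get back to the original question, the first row from each file can then be picked out of the combined data frame using the file column. A minimal sketch, assuming each file's rows stay contiguous and in their original order after rbind (the temporary directory and the two CSV files written from mtcars are made up for the demonstration):

```r
# Create two small CSV files in a temporary directory (illustrative data only)
tmp <- tempfile("csvdemo")
dir.create(tmp)
write.csv(head(mtcars, 3), file.path(tmp, "file1.csv"), row.names = FALSE)
write.csv(tail(mtcars, 3), file.path(tmp, "file2.csv"), row.names = FALSE)

# Same approach as above, but with full paths so the working directory is untouched
file_names <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
read_file <- function(x) {
  df <- read.csv(x, stringsAsFactors = FALSE)
  df$file <- basename(x)   # record only the file name, not the full path
  df
}
combined <- do.call(rbind, lapply(file_names, read_file))

# Keep the first row seen for each file
first_rows <- combined[!duplicated(combined$file), ]
first_rows
```

duplicated() marks every repeat of a file name after its first appearance, so negating it keeps exactly one row per file. For a row other than the first, splitting on the file column and indexing each piece would be one alternative.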