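I have a table with about 8,000 observations and 65 variables, and a second table with 35 observations and 11 variables.
The larger table looks like this: [image: portion of the larger table]
and the smaller table looks like this: [image: portion of the smaller table]
As you can see, the first column of the smaller table contains some of the column names of the larger table. Rather than writing out every column I want to select, is there a more compact way to have R create a table containing only the columns of the larger table that are listed in the smaller table?
Any help is much appreciated!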
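Update: Thanks to the answerer for the data below. I would also like to know whether the columns selected from large.df can be put in the order in which their names appear in the small table (samll.df in the code below).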
large.df <- data.frame(A=rnorm(5), B=abs(rnorm(5, sd=0.08)),
C=rnorm(5), D=abs(rnorm(5, sd=0.08)))
A B C D
1 0.2367193 0.002297593 -0.1958682 0.03877595
2 -1.2419638 0.034031808 0.3253622 0.02578829
3 -0.2718915 0.188627689 0.4844783 0.04405741
4 -0.6587699 0.024045926 -1.1209473 0.09849541
5 1.7890422 0.055520325 0.1093612 0.11637796
samll.df <- data.frame(Category = c("D","B"))
samll.df
Category
1 D
2 B
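I want the output columns in the order 'D', 'B' rather than 'B', 'D'. My real data has about 35 columns, so something more compact than typing the column names in the desired order would be ideal. Thanks.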
Answer 0 (score: 1)
Use %in%:
> a <- data.frame(A=1:10,B=11:20,C=1:10) # Small data frame
> b <- data.frame(A=1:10,D=11:20,C=21:30,E=41:50) # Big data frame
# The column names the two data frames have in common are A and C
> R <- b[,names(b) %in% names(a)]
> R
A C
1 1 21
2 2 22
3 3 23
4 4 24
5 5 25
6 6 26
7 7 27
8 8 28
9 9 29
10 10 30
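One caveat with the logical index above: when only a single column name matches, the subsetting simplifies the result to a plain vector instead of a data frame. A minimal sketch, reusing the answer's b and assuming a hypothetical one-column data frame a1 (not part of the original answer), shows how drop = FALSE keeps the data frame shape:

```r
# Big data frame from the answer above
b <- data.frame(A = 1:10, D = 11:20, C = 21:30, E = 41:50)
# Hypothetical small data frame that shares only column A with b
a1 <- data.frame(A = 1:10)

keep <- names(b) %in% names(a1)        # logical vector, one entry per column of b

as_vector <- b[, keep]                 # a single match is simplified to a vector
as_frame  <- b[, keep, drop = FALSE]   # drop = FALSE keeps a one-column data frame

is.data.frame(as_vector)  # FALSE
is.data.frame(as_frame)   # TRUE
```

Adding drop = FALSE costs nothing when several columns match, so it may be a reasonable default in scripts where the number of matches is not known in advance.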
Answer 1 (score: 0)
cols.small_table <- as.character(samll.df$Category)
Solution 1: match the column order of the small table
# order columns in large.df based on cols.small_table and subset data
large.df[ , match(cols.small_table, names(large.df))]
D B
1 0.03877595 0.002297593
2 0.02578829 0.034031808
3 0.04405741 0.188627689
4 0.09849541 0.024045926
5 0.11637796 0.055520325
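If the small table lists a name that does not exist in the large table, match() returns NA for it and the subsetting step would typically fail. A hedged sketch of one way around this uses intersect(), which keeps the order of its first argument. The fixed values and the extra name "Z" below are assumptions made for illustration, not the poster's data:

```r
# Fixed illustrative values (not the random data from the post)
large.df <- data.frame(A = 1:3, B = 4:6, C = 7:9, D = 10:12)
# Hypothetical small table; "Z" does not exist in large.df
samll.df <- data.frame(Category = c("D", "Z", "B"))

wanted <- as.character(samll.df$Category)
# intersect() keeps the order of `wanted` and drops names absent from large.df
cols.present <- intersect(wanted, names(large.df))

result <- large.df[, cols.present, drop = FALSE]
names(result)  # "D" "B"
```

Indexing by the character vector directly also avoids the intermediate match() call, since a data frame can be subset by column names in any order.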
Solution 2
# Keep the columns in large table based on match in small table
large.df[ , which(names(large.df) %in% cols.small_table)]
B D
1 0.002297593 0.03877595
2 0.034031808 0.02578829
3 0.188627689 0.04405741
4 0.024045926 0.09849541
5 0.055520325 0.11637796
# Remove the columns in large table based on match in small table
large.df[ , -which(names(large.df) %in% cols.small_table)]
A C
1 0.2367193 -0.1958682
2 -1.2419638 0.3253622
3 -0.2718915 0.4844783
4 -0.6587699 -1.1209473
5 1.7890422 0.1093612
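The -which(...) form for removing columns has a known edge case: if no name matches, which() returns an empty vector, and a negated empty index selects no columns at all rather than all of them. A minimal sketch, assuming illustrative fixed values and a hypothetical name vector cols.none, compares it with logical negation:

```r
# Fixed illustrative values (not the random data from the post)
large.df <- data.frame(A = 1:3, B = 4:6, C = 7:9, D = 10:12)
# Hypothetical case: none of these names occur in large.df
cols.none <- c("X", "Y")

hit <- names(large.df) %in% cols.none   # all FALSE here

with_which <- large.df[, -which(hit), drop = FALSE]  # -integer(0) selects zero columns
with_not   <- large.df[, !hit, drop = FALSE]         # negation keeps all four columns

ncol(with_which)  # 0
ncol(with_not)    # 4
```

When at least one name matches, both forms give the same result, so !hit can be used as a drop-in replacement.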
Data
large.df <- data.frame(A=rnorm(5), B=abs(rnorm(5, sd=0.08)),
C=rnorm(5), D=abs(rnorm(5, sd=0.08)))
A B C D
1 0.2367193 0.002297593 -0.1958682 0.03877595
2 -1.2419638 0.034031808 0.3253622 0.02578829
3 -0.2718915 0.188627689 0.4844783 0.04405741
4 -0.6587699 0.024045926 -1.1209473 0.09849541
5 1.7890422 0.055520325 0.1093612 0.11637796
samll.df <- data.frame(Category = c("D","B"))
samll.df
Category
1 D
2 B
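Because the data are drawn with rnorm() without a seed, rerunning the code gives different numbers from those printed here. A small sketch, assuming an arbitrary seed of 42 (not used in the original post), makes the example reproducible and selects the columns in the small table's order:

```r
set.seed(42)  # arbitrary seed, chosen only for illustration
large.df <- data.frame(A = rnorm(5), B = abs(rnorm(5, sd = 0.08)),
                       C = rnorm(5), D = abs(rnorm(5, sd = 0.08)))
samll.df <- data.frame(Category = c("D", "B"))

# Select the columns of large.df in the order given by the small table
out <- large.df[, as.character(samll.df$Category), drop = FALSE]
names(out)
```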