Subsetting a data frame using another data frame

Time: 2018-03-20 14:21:20

Tags: r dataframe subset

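I've run into a problem that shouldn't be hard to solve. What I want to do is subset a data.frame using another data.frame, or more precisely, using one of its columns as the criterion. Here is an example: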

df1<- t(data.frame(A=c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"), B=c("0.5","3","0","0","5","0","15"), C=c("0","0","3","15","15","0","0"), D=c("0.5","0.5","0.5","0","0","0","0"), E=c("37.5","37.5","0.5","62.5","0.5","0.5","1")))
df2<- data.frame(A=c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"), B=c("vasc", "vasc","vasc","spha", "moss","moss","moss"), C=c("a", "a", "b", "a", "c","d","a"))

Now, let's say I want to keep in df1 only the objects A (here, species) that are "vasc" in df2. To do so, I tried something like:

df3 <- subset(df2, B=="vasc")
df4 <- df1[,c(df1, as.vector(df2))]

But doing this, I get an error of the type:

  

Error in df1[, c(df1, as.vector(df2))] : invalid subscript type 'list'

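So I tried to unlist my data frames, but nothing seems to work. I've been working on this for a while and I did search the forums to see whether someone had an elegant solution to my problem, but apparently not. Another way to do this subsetting would be something like the following code, but even though I feel I'm getting closer to a solution, it doesn't work either: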

 try11 <- list(df2, df1)%>% rbindlist(., fill=T)  # with df1 not transposed
 df11 <- try11[try11=="vasc",]

I hope the code is good enough and my explanation is clear. Thanks!

3 Answers:

Answer 0: (score: 0)

You could try:

library(data.table)
setDT(df1)
setDT(df2)

dtPruned <- df1[A %in% df2[B == "vasc", A]]

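Make sure you remove the t() call from the definition of df1 for this to work. Basically, the inner expression selects column A of df2 for the rows where B == "vasc". The outer expression then selects the rows of df1 whose A value is among those A values from df2.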
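
The two steps described above can also be written out separately, which makes it easier to inspect the intermediate result. This is only a minimal sketch: it assumes the data.table package is installed, it builds a shortened df1 without t(), and the name vasc_species is a hypothetical helper that does not appear in the original answer.

```r
library(data.table)

# Shortened version of the question's data; df1 is built WITHOUT t()
df1 <- data.table(
  A = c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"),
  B = c("0.5", "3", "0", "0", "5", "0", "15")
)
df2 <- data.table(
  A = c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"),
  B = c("vasc", "vasc", "vasc", "spha", "moss", "moss", "moss")
)

# Step 1: species (column A) of df2 whose B is "vasc"
# (vasc_species is a hypothetical name used for illustration)
vasc_species <- df2[B == "vasc", A]

# Step 2: keep the rows of df1 whose A is among those species
dtPruned <- df1[A %in% vasc_species]
print(dtPruned)
```

Splitting the lookup from the filter changes nothing in the result; it just lets you check vasc_species before using it.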

Answer 1: (score: 0)

You can do this with dplyr:

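library(dplyr)
species <- as.character(df2[df2$B == "vasc",1])

# filter() keeps rows by a logical condition; slice() expects row positions.
# df1 must be built without the t() call.
df1 %>% 
    filter(A %in% species)

## A tibble: 3 x 5
#  A     B     C     D     E
#  <fct> <fct> <fct> <fct> <fct>
#1 ABI   0.5   0     0.5   37.5
#2 BET   3     0     0.5   37.5
#3 ALN   0     3     0.5   0.5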

PS

Your data contain only factors. You probably want the numbers stored as numeric instead.
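
One possible way to do that conversion in base R is sketched below. It assumes a shortened df1 built without t() and with factor columns, as in the output above; num_cols is a hypothetical helper name. A factor has to go through as.character() first, because as.numeric() applied directly to a factor returns the internal level codes rather than the printed values.

```r
# Shortened example data; stringsAsFactors = TRUE reproduces the factor columns
df1 <- data.frame(
  A = c("ABI", "BET", "ALN"),
  B = c("0.5", "3", "0"),
  C = c("0", "0", "3"),
  stringsAsFactors = TRUE
)

# Every column except the species name A holds numbers
# (num_cols is a hypothetical name used for illustration)
num_cols <- setdiff(names(df1), "A")

# Convert factor -> character -> numeric so the printed values are kept
df1[num_cols] <- lapply(df1[num_cols], function(col) as.numeric(as.character(col)))

str(df1)
```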

Answer 2: (score: 0)

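This should do it. First, we create a character vector x holding all the A values of df2 where B == "vasc". Then we select the columns of df1 whose A row matches a value in x:
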
# Create a character vector of all A values when B == vasc
x <- as.character(df2[df2$B == "vasc", 1])

# Select columns where row A == x
df1[, which(df1[1, ] %in% x)]
  [,1]   [,2]   [,3] 
A "ABI"  "BET"  "ALN"
B "0.5"  "3"    "0"  
C "0"    "0"    "3"  
D "0.5"  "0.5"  "0.5"
E "37.5" "37.5" "0.5"

If we avoid the t() call, we can do this:

df1[df1$A %in% df2[df2$B == "vasc", 1], ]
    A   B C   D    E
1 ABI 0.5 0 0.5 37.5
2 BET   3 0 0.5 37.5
3 ALN   0 3 0.5  0.5

We can then transpose the result to keep the same format as above:

t(df1[df1$A %in% df2[df2$B == "vasc", 1], ])
  1      2      3    
A "ABI"  "BET"  "ALN"
B "0.5"  "3"    "0"  
C "0"    "0"    "3"  
D "0.5"  "0.5"  "0.5"
E "37.5" "37.5" "0.5"

Data:

df1 <- t(data.frame(
  A = c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"), 
  B = c("0.5","3","0","0","5","0","15"), 
  C = c("0","0","3","15","15","0","0"), 
  D = c("0.5","0.5","0.5","0","0","0","0"), 
  E = c("37.5","37.5","0.5","62.5","0.5","0.5","1")
  )
)

df2 <- data.frame(
  A = c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"), 
  B = c("vasc", "vasc","vasc","spha", "moss","moss","moss"), 
  C = c("a", "a", "b", "a", "c","d","a")
)