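I'm running into a problem that shouldn't be hard to solve. What I want to do is subset a data.frame using another data.frame, or more precisely, using the value of one of its columns.
Here is an example: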
df1<- t(data.frame(A=c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"), B=c("0.5","3","0","0","5","0","15"), C=c("0","0","3","15","15","0","0"), D=c("0.5","0.5","0.5","0","0","0","0"), E=c("37.5","37.5","0.5","62.5","0.5","0.5","1")))
df2<- data.frame(A=c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"), B=c("vasc", "vasc","vasc","spha", "moss","moss","moss"), C=c("a", "a", "b", "a", "c","d","a"))
Now, let's say I want to keep in df1 only the objects in A (here, species) that are labelled "vasc" in df2.
To do this, I tried something like:
df3 <- subset(df2, B=="vasc")
df4 <- df1[,c(df1, as.vector(df2))]
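But doing so, I get an error of this type:
Error in df1[, c(df1, as.vector(df2))] : invalid subscript type 'list'
So I tried to unlist my data frames, but nothing seems to work. I have been stuck on this for a while, and I did search the forums to see whether someone had an elegant solution to my problem, but apparently not. Another way to do this subsetting would be something like the following code, but even though I feel closer to a solution, it doesn't work properly either: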
try11 <- list(df2, df1)%>% rbindlist(., fill=T) # with df1 not transposed
df11 <- try11[try11=="vasc",]
I hope the code is good enough and my explanation clear enough. Thanks!
Answer 0: (score: 0)
You can try:
library(data.table)
setDT(df1)
setDT(df2)
dtPruned <- df1[A %in% df2[B == "vasc", A]]
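Make sure to remove the t() call from the definition of df1 for this to work. Basically, the inner expression selects the values of column A in df2 where B == "vasc". The outer expression then keeps the rows of df1 whose A is among those values.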
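For illustration, here is a minimal sketch of the same one-liner split into its two steps. It assumes the data.table package is installed and that df1 is built without t(). The names dt1, dt2 and vasc_species are made up for this example, and as.data.table() is used instead of setDT() so the original data frames are left unchanged.

```r
library(data.table)

# Example data from the question; df1 is built WITHOUT the t() call
df1 <- data.frame(
  A = c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"),
  B = c("0.5", "3", "0", "0", "5", "0", "15"),
  C = c("0", "0", "3", "15", "15", "0", "0"),
  D = c("0.5", "0.5", "0.5", "0", "0", "0", "0"),
  E = c("37.5", "37.5", "0.5", "62.5", "0.5", "0.5", "1"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  A = c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"),
  B = c("vasc", "vasc", "vasc", "spha", "moss", "moss", "moss"),
  C = c("a", "a", "b", "a", "c", "d", "a"),
  stringsAsFactors = FALSE
)

# Work on data.table copies (hypothetical names dt1 / dt2)
dt1 <- as.data.table(df1)
dt2 <- as.data.table(df2)

# Step 1: the species (column A of df2) whose group B is "vasc"
vasc_species <- dt2[B == "vasc", A]

# Step 2: the rows of df1 whose A is one of those species
dtPruned <- dt1[A %in% vasc_species]
```

Keeping the lookup vector in its own variable makes it easy to inspect which species were matched before filtering.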
Answer 1: (score: 0)
You can use dplyr (again with df1 defined without the t() call):
library(dplyr)
species <- as.character(df2[df2$B == "vasc",1])
# slice() expects row positions; filter() keeps the rows where the condition is TRUE
df1 %>%
  filter(A %in% species)
## A tibble: 3 x 5
#  A     B     C     D     E
#  <fct> <fct> <fct> <fct> <fct>
#1 ABI   0.5   0     0.5   37.5
#2 BET   3     0     0.5   37.5
#3 ALN   0     3     0.5   0.5
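Your data contains only factor columns (or character columns in R 4.0 and later, where stringsAsFactors defaults to FALSE). You probably want to store the numbers as numeric instead.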
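As a rough sketch of that conversion, the columns B to E can be turned into numeric values after the data frame is built. The variable num_cols is a name made up for this example, and stringsAsFactors = TRUE is set explicitly here only to reproduce the factor columns shown above.

```r
# df1 without t(); factors are forced to mimic the output shown above
df1 <- data.frame(
  A = c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"),
  B = c("0.5", "3", "0", "0", "5", "0", "15"),
  C = c("0", "0", "3", "15", "15", "0", "0"),
  D = c("0.5", "0.5", "0.5", "0", "0", "0", "0"),
  E = c("37.5", "37.5", "0.5", "62.5", "0.5", "0.5", "1"),
  stringsAsFactors = TRUE
)

# Columns that hold measurements rather than labels (hypothetical name)
num_cols <- c("B", "C", "D", "E")

# Go through as.character() first: as.numeric() on a factor alone
# would return the internal level codes, not the printed values
df1[num_cols] <- lapply(df1[num_cols], function(col) as.numeric(as.character(col)))

# The species column can stay a label; plain character is often easier to match on
df1$A <- as.character(df1$A)

str(df1)
```

The simpler option, when you control the data definition, is to write the numbers without quotes in the first place.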
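Answer 2: (score: 0)
This should do it. First, we create a character vector x of all the A values in df2 where B == "vasc". Then we select the columns of df1 whose value in row A is in x: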
# Create a character vector of all A values when B == vasc
x <- as.character(df2[df2$B == "vasc", 1])
# Select columns where row A == x
df1[, which(df1[1, ] %in% x)]
  [,1]   [,2]   [,3]
A "ABI"  "BET"  "ALN"
B "0.5"  "3"    "0"
C "0"    "0"    "3"
D "0.5"  "0.5"  "0.5"
E "37.5" "37.5" "0.5"
If we avoid the t() call, we can do this instead:
df1[df1$A %in% df2[df2$B == "vasc", 1], ]
    A   B C   D    E
1 ABI 0.5 0 0.5 37.5
2 BET   3 0 0.5 37.5
3 ALN   0 3 0.5  0.5
We can then transpose the result to keep the same format as above:
t(df1[df1$A %in% df2[df2$B == "vasc", 1], ])
  1      2      3
A "ABI"  "BET"  "ALN"
B "0.5"  "3"    "0"
C "0"    "0"    "3"
D "0.5"  "0.5"  "0.5"
E "37.5" "37.5" "0.5"
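One thing to watch for with the transposed version: when only a single column matches, matrix subsetting drops the result to a plain vector unless drop = FALSE is given. The sketch below wraps the column selection in a small helper; select_species is a hypothetical name introduced here, not part of any package, and it assumes the matrix has a row named "A" and the lookup table has columns A and B as in the question.

```r
# Transposed data from the question (a character matrix with rows A..E)
df1 <- t(data.frame(
  A = c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"),
  B = c("0.5", "3", "0", "0", "5", "0", "15"),
  C = c("0", "0", "3", "15", "15", "0", "0"),
  D = c("0.5", "0.5", "0.5", "0", "0", "0", "0"),
  E = c("37.5", "37.5", "0.5", "62.5", "0.5", "0.5", "1")
))
df2 <- data.frame(
  A = c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"),
  B = c("vasc", "vasc", "vasc", "spha", "moss", "moss", "moss"),
  C = c("a", "a", "b", "a", "c", "d", "a")
)

# Hypothetical helper: keep the columns of `mat` whose "A" entry
# belongs to the species labelled `group` in `lookup`
select_species <- function(mat, lookup, group) {
  keep <- as.character(lookup$A[lookup$B == group])
  # drop = FALSE keeps the result a matrix even when one column matches
  mat[, mat["A", ] %in% keep, drop = FALSE]
}

vasc <- select_species(df1, df2, "vasc")  # three matching species
spha <- select_species(df1, df2, "spha")  # a single matching species
dim(spha)
```

Without drop = FALSE, the "spha" case would come back as a named character vector rather than a one-column matrix, which tends to break later code that expects two dimensions.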
Data:
df1 <- t(data.frame(
A = c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"),
B = c("0.5","3","0","0","5","0","15"),
C = c("0","0","3","15","15","0","0"),
D = c("0.5","0.5","0.5","0","0","0","0"),
E = c("37.5","37.5","0.5","62.5","0.5","0.5","1")
)
)
df2 <- data.frame(
A = c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"),
B = c("vasc", "vasc","vasc","spha", "moss","moss","moss"),
C = c("a", "a", "b", "a", "c","d","a")
)