Filter based on multiple conditions?

Date: 2018-08-30 06:05:41

Tags: r dplyr

Is there a way to filter on multiple conditions across several levels of a column (factor)?

Individual<-c("a1.2", "a1.2","bd3.d","bd3.d", "k20.d","k20.d", "dfd.2","dfd.2", "d3.d","d3.d", "df3.1","df3.1")
Treat <- c('hot','hot','hot','hot','hot','hot','cold',"cold",'cold',"cold",'cold',"cold")
Time <- c("T1", "T9", "T1", "T9","T1", "T9","T1", "T9","T1", "T9","T1", "T9")
Area<- c("0.1", "0.5", "0", "0.645","0.1", "0","0.1", "0.587","0", "0.78","0.23", "0.78")
df.Area22 <- data.frame(Individual, Treat,Time,Area)
head(df.Area22, n=20)

    Individual Treat Time  Area
1        a1.2   hot   T1   0.1
2        a1.2   hot   T9   0.5
3       bd3.d   hot   T1     0
4       bd3.d   hot   T9 0.645
5       k20.d   hot   T1   0.1
6       k20.d   hot   T9     0
7       dfd.2  cold   T1   0.1
8       dfd.2  cold   T9 0.587
9        d3.d  cold   T1     0
10       d3.d  cold   T9  0.78
11      df3.1  cold   T1  0.23
12      df3.1  cold   T9  0.78

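For example, I want to select only the individuals in the Individual column whose Area values are greater than zero at both Time T1 and T9.

The function would therefore remove rows 3, 6 and 9.

Thanks!

3 answers:

Answer 0: (score: 1)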

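First, you have to convert the Area variable to numeric, because R has interpreted it as a factor variable (the default for character columns in data.frame before R 4.0.0).

If you call as.numeric directly on a factor, you do not get the original values back; you get the factor's internal integer level codes. You therefore have to go through the levels, as in as.numeric(levels(x))[x], to recover the actual decimal values. as.numeric(as.character(x)) gives the same result and also works when the column is already character.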
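A small sketch of the difference, using a made-up factor f that is not part of the question's data. It only illustrates that the direct conversion returns level codes while the two indirect conversions return the printed values.

```r
# f is a hypothetical factor used only for illustration
f <- factor(c("0.1", "0.5", "0", "0.645"))

# Direct conversion returns the internal integer level codes, not the values
codes <- as.numeric(f)

# Two equivalent ways to recover the actual numbers
via_levels <- as.numeric(levels(f))[f]
via_char   <- as.numeric(as.character(f))

print(codes)
print(via_levels)
print(via_char)
```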

Next, filter the Time variable on T1 and T9, and the Area variable on values greater than 0.

library(dplyr)

df.Area22$Area <- as.numeric(levels(df.Area22$Area))[df.Area22$Area]

df <- df.Area22 %>%
  filter((Time == "T1" | Time == "T9") & Area > 0)

The final result is what you need (rows 3, 6 and 9 are removed).

df

  Individual Treat Time  Area
1       a1.2   hot   T1 0.100
2       a1.2   hot   T9 0.500
3      bd3.d   hot   T9 0.645
4      k20.d   hot   T1 0.100
5      dfd.2  cold   T1 0.100
6      dfd.2  cold   T9 0.587
7       d3.d  cold   T9 0.780
8      df3.1  cold   T1 0.230
9      df3.1  cold   T9 0.780
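The question can also be read as asking for only those individuals whose Area is above zero at both T1 and T9, which would drop every row of an individual that has a zero at either time. That reading is an assumption, since the asker's stated result drops only rows 3, 6 and 9. Under it, a grouped filter is one possible sketch; the name both_positive is made up for illustration.

```r
library(dplyr)

# Rebuild the question's data; stringsAsFactors = FALSE keeps Area as character
df.Area22 <- data.frame(
  Individual = rep(c("a1.2", "bd3.d", "k20.d", "dfd.2", "d3.d", "df3.1"), each = 2),
  Treat      = rep(c("hot", "cold"), each = 6),
  Time       = rep(c("T1", "T9"), times = 6),
  Area       = c("0.1", "0.5", "0", "0.645", "0.1", "0",
                 "0.1", "0.587", "0", "0.78", "0.23", "0.78"),
  stringsAsFactors = FALSE
)

both_positive <- df.Area22 %>%
  mutate(Area = as.numeric(Area)) %>%   # compare numbers, not strings
  filter(Time %in% c("T1", "T9")) %>%
  group_by(Individual) %>%
  filter(all(Area > 0)) %>%             # keep a group only if every Area is positive
  ungroup()

print(both_positive)
```

With the question's data this keeps both rows of a1.2, dfd.2 and df3.1 and drops the other three individuals entirely.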

Answer 1: (score: 1)

I guess the trick is to set stringsAsFactors=FALSE in the data.frame function.

library(dplyr)
Individual<-c("a1.2", "a1.2","bd3.d","bd3.d", "k20.d","k20.d", "dfd.2","dfd.2", "d3.d","d3.d", "df3.1","df3.1")
Treat <- c('hot','hot','hot','hot','hot','hot','cold',"cold",'cold',"cold",'cold',"cold")
Time <- c("T1", "T9", "T1", "T9","T1", "T9","T1", "T9","T1", "T9","T1", "T9")
Area<- c("0.1", "0.5", "0", "0.645","0.1", "0","0.1", "0.587","0", "0.78","0.23", "0.78")
df.Area22 <- data.frame(Individual, Treat,Time,Area, stringsAsFactors=FALSE)
head(df.Area22, n=20)
df.Area22 %>%
  filter(Time %in% c('T1', 'T9'),
         as.numeric(Area) > 0)  # Area is character here, so convert before comparing

In filter, you can add multiple conditions simply by separating them with a comma (,), which is treated as &.
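A minimal sketch of that equivalence, using a made-up data frame toy rather than the question's data:

```r
library(dplyr)

# toy is a hypothetical data frame used only for illustration
toy <- data.frame(
  Time = c("T1", "T9", "T1", "T9"),
  Area = c(0.1, 0, 0, 0.78)
)

# Conditions separated by a comma ...
with_comma <- toy %>% filter(Time %in% c("T1", "T9"), Area > 0)

# ... and the same conditions joined with &
with_ampersand <- toy %>% filter(Time %in% c("T1", "T9") & Area > 0)

# Check whether both forms keep the same rows
print(isTRUE(all.equal(with_comma, with_ampersand)))
```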

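Answer 2: (score: 0)

Base-R solution. As @demariod suggested, you need to use stringsAsFactors=FALSE in the data.frame function.

# Select the rows where Time is T1 or T9 and Area > 0
# (as.character() first, so the conversion is safe for character or factor columns)
df.Area22[(df.Area22$Time=='T1' | df.Area22$Time=='T9') &
            as.numeric(as.character(df.Area22$Area)) > 0, ]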
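One caveat when Area is kept as character (as it is with stringsAsFactors=FALSE): comparing a character vector with a number is done on strings, because R coerces the number to text. The sketch below uses made-up values, not the question's data, to show how that can differ from a numeric comparison.

```r
# Hypothetical values, not from the question
area_chr <- c("10", "9", "0.5")

# The number 9 is coerced to the string "9", so this compares text
as_text <- area_chr > 9

# Converting first gives a numeric comparison
as_number <- as.numeric(area_chr) > 9

print(as_text)
print(as_number)
```

Converting with as.numeric before comparing avoids depending on how the strings happen to sort.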