R or SQL: selecting records with a different value cutoff for each category

Time: 2018-01-16 14:19:18

Tags: sql r select hive subset

I have a data frame (or table) like this:

RowNumber   Category    Value
    1          A          12
    2          A           3
    3          B          24
    4          B          32
    5          B          11
    6          C          30
    7          D           2
    8          D          33

...

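I would like guidance in SQL (Hive) or R, ideally both: how do I select records using a different cutoff for each category?

For category A, I want to select records with Value >= 10, but for all other categories (B, C, D) I need Value >= 20.

Expected result: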

RowNumber   Category    Value
    1          A          12
    3          B          24
    4          B          32
    6          C          30
    8          D          33

How can I do this?

Thanks!

3 answers:

Answer 0 (score: 1)

In base R, this can be done with:

df <- data.frame(RowNumber = c(1, 2, 3, 4, 5, 6, 7 ,8), Category = c("A", "A", "B", "B", "B", "C", "D", "D"), Value = c(12, 3, 24, 32, 11, 30, 2, 33))
df[df$Category == "A" & df$Value >= 10 | df$Category != "A" & df$Value >= 20, ]

You get the desired result:

    RowNumber Category  Value
1         1        A    12
3         3        B    24
4         4        B    32
6         6        C    30
8         8        D    33
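
If more categories later need their own cutoffs, the same filter can be driven by a lookup instead of a growing chain of conditions. The following is only a sketch under assumptions: the names cutoffs, default_cutoff and threshold are hypothetical and do not appear in the answer above.

```r
df <- data.frame(
  RowNumber = 1:8,
  Category  = c("A", "A", "B", "B", "B", "C", "D", "D"),
  Value     = c(12, 3, 24, 32, 11, 30, 2, 33)
)

# Hypothetical lookup: categories with a special cutoff; everything else
# falls back to default_cutoff.
cutoffs <- c(A = 10)
default_cutoff <- 20

# Indexing a named vector by a name it does not contain yields NA,
# which is then replaced by the default.
threshold <- unname(cutoffs[as.character(df$Category)])
threshold[is.na(threshold)] <- default_cutoff

# Keep the rows whose Value reaches the cutoff for their own category.
result <- df[df$Value >= threshold, ]
print(result)
```

Adding a cutoff for another category then only means adding an entry to cutoffs; the filtering line stays the same.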

Answer 1 (score: 0)

Here are some alternatives.

library(sqldf)

# 1
sqldf("select * from DF 
       where (Category = 'A' and Value >= 10) or (not Category = 'A' and Value >= 20)")

# 2
sqldf("select * from DF where Value >= (case when Category = 'A' then 10 else 20 end)")

# 3
sqldf("select * from DF where Value >= (10 * (not Category = 'A') + 10)")

# 4
subset(DF, (Category == "A" & Value >= 10) | (Category != "A" & Value >= 20))

# 5
subset(DF, Value >= ifelse(Category == "A", 10, 20))

# 6
subset(DF, Value >= 10 * (Category != "A") + 10)

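Any of the above selects these rows (the subset variants keep the original row names 1, 3, 4, 6, 8 instead of renumbering them):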

  RowNumber Category Value
1         1        A    12
2         3        B    24
3         4        B    32
4         6        C    30
5         8        D    33

Note

The input in reproducible form is:

Lines <- "RowNumber   Category    Value 
   1        A          12
   2        A           3 
   3        B          24     
   4        B          32
   5        B          11
   6        C          30   
   7        D           2
   8        D          33"

DF <- read.table(text = Lines,  header = TRUE)
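
Alternatives 3 and 6 rely on a logical value being coerced to 0 or 1 in arithmetic, so 10 * (Category != "A") + 10 evaluates to 10 for category A and 20 for everything else. A small sketch of that step on its own, using a plain character vector (the names by_arithmetic and by_ifelse are made up for illustration):

```r
Category <- c("A", "A", "B", "B", "B", "C", "D", "D")

# TRUE/FALSE become 1/0 when multiplied, so non-A rows get 10 * 1 + 10 = 20
# and A rows get 10 * 0 + 10 = 10.
by_arithmetic <- 10 * (Category != "A") + 10

# The same per-row cutoff written explicitly, as in alternative 5.
by_ifelse <- ifelse(Category == "A", 10, 20)

print(by_arithmetic)
# [1] 10 10 20 20 20 20 20 20

stopifnot(identical(by_arithmetic, by_ifelse))
```

The arithmetic form is compact, but the ifelse form is arguably easier to read and to extend to cutoffs that are not multiples of each other.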

Answer 2 (score: 0)

A simple query (judging by the output, c1 is the category column and c2 is the value column):

select c1,c2 from tbl where c2 >= 10 and c1 = 'A' 
union all
select c1,c2 from tbl where c2 >= 20 and c1 != 'A' 

+---------+---------+--+
| _u1.c1  | _u1.c2  |
+---------+---------+--+
| A       | 12      |
| B       | 24      |
| B       | 32      |
| C       | 30      |
| D       | 33      |
+---------+---------+--+
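
For comparison, the same two-part idea can be sketched in base R: filter each group with its own cutoff and stack the pieces with rbind, which plays the role of UNION ALL here. This is only an illustration under assumptions; it uses the Category and Value column names from the question rather than c1 and c2, and the names part_a and part_rest are hypothetical.

```r
DF <- data.frame(
  RowNumber = 1:8,
  Category  = c("A", "A", "B", "B", "B", "C", "D", "D"),
  Value     = c(12, 3, 24, 32, 11, 30, 2, 33)
)

# First branch of the union: category A with cutoff 10.
part_a <- DF[DF$Category == "A" & DF$Value >= 10, ]

# Second branch: every other category with cutoff 20.
part_rest <- DF[DF$Category != "A" & DF$Value >= 20, ]

# Stack both pieces; the branches are disjoint, so no row appears twice.
result <- rbind(part_a, part_rest)
print(result)
```

Because the two conditions cannot both hold for the same row, stacking without de-duplication is safe, which is also why UNION ALL rather than UNION is sufficient in the query.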