I have a data frame (or table) like this:
RowNumber  Category  Value
1          A         12
2          A         3
3          B         24
4          B         32
5          B         11
6          C         30
7          D         2
8          D         33
...
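Using SQL (Hive) or R (guidance in both languages would be appreciated), I want to select records using a different cutoff for different categories.
For category A, I want to select rows with Value >= 10, but for all other categories (B, C, D) I need to select rows with Value >= 20.
Expected result: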
RowNumber  Category  Value
1          A         12
3          B         24
4          B         32
6          C         30
8          D         33
How can I do this?
Thanks!
Answer 0 (score: 1)
In base R, this can be done with:
df <- data.frame(RowNumber = c(1, 2, 3, 4, 5, 6, 7, 8),
                 Category = c("A", "A", "B", "B", "B", "C", "D", "D"),
                 Value = c(12, 3, 24, 32, 11, 30, 2, 33))
# & binds tighter than | in R; the parentheses only make the grouping explicit
df[(df$Category == "A" & df$Value >= 10) | (df$Category != "A" & df$Value >= 20), ]
This gives the desired result:
  RowNumber Category Value
1         1        A    12
3         3        B    24
4         4        B    32
6         6        C    30
8         8        D    33
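If more categories later need their own cutoff, the same filter can be driven by a lookup instead of hard-coded comparisons. The following is only a minimal sketch in base R: the names `cutoffs`, `default_cutoff` and `row_cutoff` are hypothetical and do not appear in the question, and the data is the sample from the question.

```r
# Sample data from the question
df <- data.frame(
  RowNumber = 1:8,
  Category  = c("A", "A", "B", "B", "B", "C", "D", "D"),
  Value     = c(12, 3, 24, 32, 11, 30, 2, 33),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: categories with their own cutoff.
# Any category not listed here falls back to default_cutoff.
cutoffs        <- c(A = 10)
default_cutoff <- 20

# Look up each row's cutoff by category name.
# as.character() guards against Category being a factor;
# names missing from `cutoffs` come back as NA.
row_cutoff <- unname(cutoffs[as.character(df$Category)])
row_cutoff[is.na(row_cutoff)] <- default_cutoff

# Keep rows whose Value meets the cutoff for their own category
result <- df[df$Value >= row_cutoff, ]
print(result)
```

Adding a new special case is then a one-element change to `cutoffs`, for example `c(A = 10, C = 25)`, with no change to the filtering line.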
Answer 1 (score: 0)
Here are some alternatives.
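library(sqldf)
# DF is defined in the reproducible input at the end of this answer
# 1: SQL, one explicit condition per branch
sqldf("select * from DF
where (Category = 'A' and Value >= 10) or (not Category = 'A' and Value >= 20)")
# 2: SQL, pick the cutoff with a CASE expression
sqldf("select * from DF where Value >= (case when Category = 'A' then 10 else 20 end)")
# 3: SQL, arithmetic on the condition; relies on SQLite (sqldf's default
#    backend) treating a boolean as 0/1
sqldf("select * from DF where Value >= (10 * (not Category = 'A') + 10)")
# 4: base R, one explicit condition per branch
subset(DF, (Category == "A" & Value >= 10) | (Category != "A" & Value >= 20))
# 5: base R, pick the cutoff with ifelse()
subset(DF, Value >= ifelse(Category == "A", 10, 20))
# 6: base R, arithmetic on the logical (TRUE is 1, FALSE is 0)
subset(DF, Value >= 10 * (Category != "A") + 10)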
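Any of the above selects the following rows (the row names 1 to 5 shown here are what the sqldf versions print; subset keeps the original row names 1, 3, 4, 6, 8):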
  RowNumber Category Value
1         1        A    12
2         3        B    24
3         4        B    32
4         6        C    30
5         8        D    33
The input in reproducible form is:
Lines <- "RowNumber Category Value
1 A 12
2 A 3
3 B 24
4 B 32
5 B 11
6 C 30
7 D 2
8 D 33"
DF <- read.table(text = Lines, header = TRUE)
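When the cutoffs themselves live in a table, the selection can also be written as a join followed by a filter. This is only a sketch in base R under stated assumptions: `cutoff_table` and its `Cutoff` column are hypothetical names introduced for illustration, and the input is the sample from the question.

```r
# Sample input from the question
DF <- read.table(text = "RowNumber Category Value
1 A 12
2 A 3
3 B 24
4 B 32
5 B 11
6 C 30
7 D 2
8 D 33", header = TRUE)

# Hypothetical cutoff table: one row per category
cutoff_table <- data.frame(
  Category = c("A", "B", "C", "D"),
  Cutoff   = c(10, 20, 20, 20)
)

# Attach each row's cutoff by joining on Category (inner join by default)
joined <- merge(DF, cutoff_table, by = "Category")

# Keep rows that meet their own cutoff, then drop the helper column
result <- joined[joined$Value >= joined$Cutoff, c("RowNumber", "Category", "Value")]

# merge() sorts by the join key, so restore the original row order
result <- result[order(result$RowNumber), ]
print(result)
```

Because `merge()` performs an inner join by default, a category missing from `cutoff_table` would be dropped entirely; passing `all.x = TRUE` keeps such rows with an `NA` cutoff, which then has to be handled explicitly.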
Answer 2 (score: 0)
A simple query (here c1 is the category column and c2 the value column of table tbl):
select c1,c2 from tbl where c2 >= 10 and c1 = 'A'
union all
select c1,c2 from tbl where c2 >= 20 and c1 != 'A'
+---------+---------+--+
| _u1.c1 | _u1.c2 |
+---------+---------+--+
| A | 12 |
| B | 24 |
| B | 32 |
| C | 30 |
| D | 33 |
+---------+---------+--+