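Is there a way to group rows by a common column value (id) and then mutate a new column (new.id) that labels each group according to whether its values are above and/or below 1000? For example:
< 1000 = "low/low" (all values in the group are below 1000)
< 1000 and > 1000 = "low/high" (some values are below 1000 and some are above)
> 1000 = "high/high" (all values in the group are above 1000)
Data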
#Example
id values
1 a 200
2 a 300
3 b 100
4 b 2000
5 b 3000
6 c 4000
7 c 2000
8 c 3000
9 d 2400
10 d 2000
11 d 400
#dataframe:
structure(list(id = c("a", "a", "b", "b", "b", "c", "c", "c",
"d", "d", "d"), values = c(200, 300, 100, 2000, 3000, 4000, 2000,
3000, 2400, 2000, 400)), class = "data.frame", row.names = c(NA,
-11L))
Desired output
id values new.id
1 a 200 low/low
2 a 300 low/low
3 b 100 low/high
4 b 2000 low/high
5 b 3000 low/high
6 c 4000 high/high
7 c 2000 high/high
8 c 3000 high/high
9 d 2400 low/high
10 d 2000 low/high
11 d 400 low/high
A dplyr solution would be ideal, but I am open to any others!
Answer 0: (score: 0)
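# Bin the numeric `diff` column (the output below matches `diff`, not `start`); assumes pandas as pd and numpy as np
df['result'] = pd.cut(df['diff'], [-np.inf, 0, 250, np.inf], labels=['unacceptablelow', 'acceptable', 'unacceptablehigh'])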
group  start       end         diff  percent  date                      result
A      2019-04-01  2019-05-01  -160  -11      04-01-2019 to 05-01-2019  unacceptablelow
       2019-05-01  2019-06-01   136    8      05-01-2019 to 06-01-2019  acceptable
B      2020-06-01  2020-07-01   202    5      06-01-2020 to 07-01-2020  acceptable
       2020-07-01  2020-08-01   283    7      07-01-2020 to 08-01-2020  unacceptablehigh
Answer 1: (score: 0)
Alternatively, you can use the recode function from dplyr.
library(dplyr)

df %>%
  group_by(id) %>%
  mutate(
    # Recode the fraction of values above 1000 within each group
    new.id = dplyr::recode(
      sum(values > 1000) / length(values),
      `0` = "low/low",
      `1` = "high/high",
      .default = "low/high"
    )
  )
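To see why the recode call above works, it can help to look at the per-group fraction that it recodes. The following is a minimal sketch, assuming the example data from the question; the column name `frac_high` is a hypothetical name introduced here for illustration only.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  id = c("a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"),
  values = c(200, 300, 100, 2000, 3000, 4000, 2000, 3000, 2400, 2000, 400)
)

# Fraction of values above 1000 in each group; the mean() of a logical
# vector equals sum(values > 1000) / length(values)
fractions <- df %>%
  group_by(id) %>%
  summarise(frac_high = mean(values > 1000))

print(fractions)
```

A fraction of exactly 0 means no value in the group is above 1000, exactly 1 means all of them are, and anything in between (2/3 for groups b and d in this data) falls through to `.default`.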
If you also want to keep the group count, then:
df %>%
  group_by(id) %>%
  add_tally() %>%
  mutate(new.id = dplyr::recode(
    sum(values > 1000) / n,
    `0` = "low/low",
    `1` = "high/high",
    .default = "low/high"
  ))
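The same grouping logic can also be spelled out with explicit conditions. The sketch below swaps recode for `dplyr::case_when` combined with `all()` and `any()`; this is not part of the answer above, only one possible alternative, and it assumes the example data from the question. Like the recode version, it treats a value of exactly 1000 as not high.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  id = c("a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"),
  values = c(200, 300, 100, 2000, 3000, 4000, 2000, 3000, 2400, 2000, 400)
)

result <- df %>%
  group_by(id) %>%
  mutate(
    # Each condition is a single TRUE/FALSE per group; mutate() recycles
    # the resulting label to every row of that group
    new.id = case_when(
      all(values > 1000)  ~ "high/high",
      !any(values > 1000) ~ "low/low",
      TRUE                ~ "low/high"
    )
  ) %>%
  ungroup()

print(result)
```

Writing the conditions out makes the boundary handling visible, so switching `>` to `>=` is enough to change how a value of exactly 1000 is classified.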