I have a table.
Ideally, I would like to create a new column called n_rates that replaces the 0 values in RATES by group (ID) and by condition.
ID RATES
1 0.01
1 0
1 0
1 0
2 0.05
2 0.05
2 0.01
2 0
3 0
3 0
3 0
Answer 0: (score: 1)
You just need to find the mode, i.e. the most frequent value in each group. I am using dplyr group_by here, with the Mode function from Ken:
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
library(dplyr)
df1 <- dt[dt$RATES %in% c(0.05, 0.01), ] %>%
  group_by(ID) %>%
  summarise(Value = Mode(RATES))
dt <- merge(dt, df1, by = "ID", all.x = TRUE)  # merge the per-group mode back into the original dt
dt$RATES[dt$RATES == 0] <- dt$Value[dt$RATES == 0]  # assign the value only where RATES equals 0
dt$RATES[is.na(dt$RATES)] <- 0  # groups without a mode received NA; set them back to 0
dt$Value <- NULL  # drop the helper column
Result
dt
ID RATES
1 1 0.01
2 1 0.01
3 1 0.01
4 1 0.01
5 2 0.05
6 2 0.05
7 2 0.05
8 2 0.01
9 3 0.00
10 3 0.00
11 3 0.00
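To make the Mode helper used above less opaque, here is a small walk-through of its steps on the non-zero rates of ID 2. The intermediate names x, idx and counts are only illustrative and are not part of the answer.

```r
# Mode() as quoted in the answer above
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

x <- c(0.05, 0.05, 0.01)   # non-zero RATES of ID 2
ux <- unique(x)            # distinct values, in order of first appearance
idx <- match(x, ux)        # position of each element of x within ux
counts <- tabulate(idx)    # how many times each distinct value occurs
result <- ux[which.max(counts)]
print(result)              # the most frequent value

# On a tie, which.max() returns the first maximum,
# so the value that appears first in x wins.
print(Mode(c(0.01, 0.05)))
```

Because ties go to the first value seen, the row order within a group can affect which rate is used as the replacement.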
Answer 1: (score: 1)
For a one-line data.table answer, also using Ken's function:
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
library(data.table)
# Note: this answer names the column Rates; the question's data uses RATES
setDT(df)[, Rates := ifelse(Rates == 0 & any(Rates != 0),
                            Mode(Rates[Rates != 0]), Rates), by = ID]
df
#ID Rates
#1 0.01
#1 0.01
#1 0.01
#1 0.01
#2 0.05
#2 0.05
#2 0.01
#2 0.05
#3 0.00
#3 0.00
#3 0.00
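The question asked for a new column n_rates rather than overwriting the original one. As a rough sketch of the same idea without extra packages, base R's ave() can apply a per-group function and write the result into n_rates. The helper name fill_zeros is hypothetical, and the data frame is rebuilt here from the question's table so the block runs on its own.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Data copied from the question
df <- data.frame(
  ID    = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  RATES = c(0.01, 0, 0, 0, 0.05, 0.05, 0.01, 0, 0, 0, 0)
)

# Hypothetical helper: replace zeros with the mode of the non-zero values
fill_zeros <- function(r) {
  nz <- r[r != 0]
  if (length(nz) == 0) return(r)   # a group with only zeros stays unchanged
  ifelse(r == 0, Mode(nz), r)
}

# ave() applies fill_zeros within each ID and keeps the original row order
df$n_rates <- ave(df$RATES, df$ID, FUN = fill_zeros)
print(df)
```

Keeping RATES untouched makes it easy to compare the original and filled values side by side.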
Answer 2: (score: 0)
Here is a version using %>% do(...):
myfun <- function(df) {
  targets <- c(0.01, 0.015, 0.05)
  # only touch groups that contain at least one of the target rates
  if (any(unique(df$RATES) %in% targets)) {
    # most frequent non-zero rate in this group
    val <- as.numeric(names(head(sort(-table(df$RATES[df$RATES > 0])), 1)))
    df %>%
      mutate(RATES = ifelse(RATES == 0, val, RATES))
  } else {
    df
  }
}
library(dplyr)
df %>%
  group_by(ID) %>%
  do(myfun(.))
# A tibble: 11 x 2
# Groups: ID [3]
# ID RATES
# <int> <dbl>
# 1 1 0.0100
# 2 1 0.0100
# 3 1 0.0100
# 4 1 0.0100
# 5 2 0.0500
# 6 2 0.0500
# 7 2 0.0100
# 8 2 0.0500
# 9 3 0.
# 10 3 0.
# 11 3 0.
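The line that computes val in myfun packs several steps into one expression. A minimal walk-through on the rates of ID 2 may help; the intermediate names rates, tab and top are only illustrative.

```r
rates <- c(0.05, 0.05, 0.01, 0)   # RATES for ID 2, taken from the question

tab <- table(rates[rates > 0])    # counts per distinct non-zero value
top <- head(sort(-tab), 1)        # negating the counts puts the most frequent value first
val <- as.numeric(names(top))     # table names are character strings, so convert back

print(tab)
print(val)
```

Sorting the negated counts is equivalent in intent to sort(tab, decreasing = TRUE); the conversion through names() is needed because table() stores the values themselves as labels.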
Data
df <- read.table(text="ID RATES
1 0.01
1 0
1 0
1 0
2 0.05
2 0.05
2 0.01
2 0
3 0
3 0
3 0", header=TRUE)