I need to create new columns based on whether a user has performed an action at least once.
USER ACTION
A Attack
A Jump
B Attack
B Die
C Attack
C Die
C Jump
D Die
The desired result would be:
## If ACTION == something
## Create new column and apply '1' for that user for all rows
USER ACTION HAS_DIED HAS_JUMPED HAS_ATTACKED
A Attack 0 1 1
A Jump 0 1 1
B Attack 1 0 1
B Die 1 0 1
C Attack 1 1 1
C Die 1 1 1
C Jump 1 1 1
D Die 1 0 0
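The row-level result above (a flag set to 1 on every row of a user who did the action at least once) can be sketched with a grouped `:=` in data.table. `any()` is evaluated once per `USER` group and the single value is recycled to every row of that group. This is a minimal sketch that rebuilds the question's sample data as `df`; it is not taken from either answer below.

```r
library(data.table)

# Sample data from the question
df <- data.table(
  USER   = c("A", "A", "B", "B", "C", "C", "C", "D"),
  ACTION = c("Attack", "Jump", "Attack", "Die", "Attack", "Die", "Jump", "Die")
)

# For each USER, any() is TRUE if at least one of that user's rows matches;
# `by = USER` writes that one value back to every row of the group
df[, `:=`(
  HAS_DIED     = as.integer(any(ACTION == "Die")),
  HAS_JUMPED   = as.integer(any(ACTION == "Jump")),
  HAS_ATTACKED = as.integer(any(ACTION == "Attack"))
), by = USER]

print(df)
```

Row order is preserved, so the flags line up with the original log rows.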
So that I can eventually get a list of unique USERs:
USER HAS_DIED HAS_JUMPED HAS_ATTACKED
A 0 1 1
B 1 0 1
C 1 1 1
D 1 0 0
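I have been filtering and merging separately for each feature, as below, but this gets tedious with a large number of features. For example: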
library(data.table)
## mark logs of deaths
df[ACTION == "Die", HAS_DIED := 1]
## get unique list of users that have died
died_df <- unique(df[HAS_DIED == 1, .(USER, HAS_DIED)])
## merge back onto all rows and change non-1s (NAs) to 0s
merged_df <- died_df[df[, .(USER, ACTION)], on = "USER"]
merged_df[is.na(HAS_DIED), HAS_DIED := 0]
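One way to avoid a separate filter and merge per feature is to describe all features in a lookup vector and compute every flag in a single grouped pass. In this sketch `action_flags` and `user_flags` are hypothetical names, and adding a feature means adding one entry to `action_flags`. It produces the unique-USER table directly.

```r
library(data.table)

# Sample data from the question
df <- data.table(
  USER   = c("A", "A", "B", "B", "C", "C", "C", "D"),
  ACTION = c("Attack", "Jump", "Attack", "Die", "Attack", "Die", "Jump", "Die")
)

# Hypothetical lookup: output column name -> ACTION value it flags
action_flags <- c(HAS_DIED = "Die", HAS_JUMPED = "Jump", HAS_ATTACKED = "Attack")

# One pass over the data: for each USER, lapply() over the named vector
# returns a named list, i.e. one 0/1 column per entry of action_flags
user_flags <- df[, lapply(action_flags, function(a) as.integer(any(ACTION == a))),
                 by = USER]

print(user_flags)
```

Users appear in order of first appearance (A, B, C, D), and the flag columns follow the order of `action_flags`.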
I'm looking for a faster, more efficient way to do this!
Answer 0 (score: 2)
Since the initial object is a data.table, we can use `dcast` from data.table, which is very efficient:
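library(data.table)
## length() counts rows per USER/ACTION (a repeated action gives > 1);
## columns come out sorted (Attack, Die, Jump) and -1 renames all but USER
setnames(dcast(setDT(df), USER ~ ACTION, length), -1,
         c('HAS_ATTACKED', 'HAS_DIED', 'HAS_JUMPED'))[]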
# USER HAS_ATTACKED HAS_DIED HAS_JUMPED
#1: A 1 0 1
#2: B 1 1 0
#3: C 1 1 1
#4: D 0 1 0
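Because `length` returns a count, a user who performs the same action twice would get 2 rather than 1. A variant that keeps the output strictly 0/1 passes an aggregator that only tests for presence. This is a sketch: the extra "A Attack" row is a hypothetical addition to the question's data to show the difference, and `wide` is an illustrative name.

```r
library(data.table)

# Question's data plus a hypothetical repeated action (A attacks twice)
df <- data.table(
  USER   = c("A", "A", "A", "B", "B", "C", "C", "C", "D"),
  ACTION = c("Attack", "Attack", "Jump", "Attack", "Die", "Attack", "Die", "Jump", "Die")
)

# length(x) > 0 collapses any count to 1; missing USER/ACTION cells get fill = 0L
wide <- dcast(df, USER ~ ACTION,
              fun.aggregate = function(x) as.integer(length(x) > 0L),
              value.var = "ACTION", fill = 0L)

# Rename by old name rather than by position
setnames(wide, c("Attack", "Die", "Jump"),
         c("HAS_ATTACKED", "HAS_DIED", "HAS_JUMPED"))
print(wide)
```

Renaming by the old column names, rather than with `-1`, keeps the mapping correct even if a new ACTION value sorts between the existing ones.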
Answer 1 (score: 1)
Using dplyr and tidyr:
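library(dplyr)
library(tidyr)
df %>%
  distinct(USER, ACTION) %>%   # spread() needs one row per USER/ACTION
  mutate(n = 1) %>%
  spread(ACTION, n, fill = 0) %>%
  setNames(c('USER', 'HAS_ATTACKED', 'HAS_DIED', 'HAS_JUMPED'))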
# USER HAS_ATTACKED HAS_DIED HAS_JUMPED
# 1 A 1 0 1
# 2 B 1 1 0
# 3 C 1 1 1
# 4 D 0 1 0
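`spread()` has been superseded in tidyr by `pivot_wider()`. The sketch below is the same idea written with `pivot_wider()`, assuming tidyr 1.1.0 or later (where `values_fill` accepts a plain scalar). It renames columns by name instead of by position with `setNames()`. `user_flags` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  USER   = c("A", "A", "B", "B", "C", "C", "C", "D"),
  ACTION = c("Attack", "Jump", "Attack", "Die", "Attack", "Die", "Jump", "Die")
)

user_flags <- df %>%
  distinct(USER, ACTION) %>%                  # one row per user/action pair
  mutate(n = 1L) %>%
  pivot_wider(names_from = ACTION, values_from = n, values_fill = 0L) %>%
  rename(HAS_ATTACKED = Attack, HAS_DIED = Die, HAS_JUMPED = Jump) %>%
  select(USER, HAS_ATTACKED, HAS_DIED, HAS_JUMPED) %>%  # fix column order
  arrange(USER)

print(user_flags)
```

`pivot_wider()` orders new columns by first appearance (here Attack, Jump, Die), which is why the `select()` step reorders them explicitly.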