I have a data table:
ID FREQUENCY
"jso" 3
"and" 2
"jso" 3
"mo" 1
"jso" 3
"and" 2
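It has a frequency column. However, I want to create a table that holds the number of times each ID has appeared so far. So I would like my data table to look like this: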
ID FREQUENCY
"jso" 1
"and" 1
"jso" 2
"mo" 1
"jso" 3
"and" 2
How would you do this?
Answer 0 (score: 1)
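This can be done with a grouped operation. Using data.table, convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'ID', take the sequence of rows within each group (seq_len(.N)), and assign (:=) it to 'FREQUENCY'.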
library(data.table)
setDT(df1)[, FREQUENCY := seq_len(.N), by = ID]
Or use dplyr, where row_number() is a convenience function for the row sequence (after grouping by 'ID').
library(dplyr)
df1 %>%
group_by(ID) %>%
mutate(FREQUENCY = row_number())
Or with base R
with(df1, ave(FREQUENCY, ID, FUN = seq_along))
#[1] 1 1 2 1 3 2
Data
df1 <- structure(list(ID = c("jso", "and", "jso", "mo", "jso", "and"
), FREQUENCY = c(3L, 2L, 3L, 1L, 3L, 2L)), .Names = c("ID", "FREQUENCY"
), class = "data.frame", row.names = c(NA, -6L))
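Note that the base R call above only returns a vector and does not modify df1. As a minimal sketch, assuming you want to keep the original FREQUENCY column and store the running count next to it, the result of ave() can be assigned to a new column. The column name RUNNING_COUNT is hypothetical and chosen only for this illustration.

```r
# Rebuild the example data from the answer (base R only, no packages).
df1 <- data.frame(
  ID = c("jso", "and", "jso", "mo", "jso", "and"),
  FREQUENCY = c(3L, 2L, 3L, 1L, 3L, 2L),
  stringsAsFactors = FALSE
)

# ave() applies seq_along within each ID group and returns the values
# in the original row order, so the vector can be assigned straight back.
# RUNNING_COUNT is a hypothetical column name used for this sketch.
df1$RUNNING_COUNT <- ave(seq_along(df1$ID), df1$ID, FUN = seq_along)

print(df1)
```

Passing seq_along(df1$ID) as the first argument, instead of FREQUENCY, means the sketch does not depend on FREQUENCY being numeric; only the grouping by ID matters.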