Assign values to a new column based on the occurrence of a special pattern in a data frame

Date: 2018-07-20 15:22:07

Tags: r dataframe dplyr

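I want to create another column in my data frame that groups the rows of the first column by member, based on their order. Each member's block of rows ends with an "ID #:" row.

Here is a reproducible demo: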

df1=c("Alex","23","ID #:123", "John","26","ID #:564")
df1=data.frame(df1)
library(dplyr)
library(data.table)
df1 %>% mutate(group= ifelse(df1 %like% "ID #:",1,NA ) )

This is the output of the demo:

df1        group
1     Alex    NA
2       23    NA
3 ID #:123     1
4     John    NA
5       26    NA
6 ID #:564     1

This is what I want:

 df1         group
 1     Alex     1
 2       23     1
 3 ID #:123     1
 4     John     2
 5       26     2
 6 ID #:564     2

So I want the group column to number each member in order of appearance.

Thanks for any reply or idea!

2 Answers:

Answer 0: (score: 1)

Shift the condition down one row with lag first, then take the cumsum:

df1 %>% 
    mutate(group= cumsum(lag(df1 %like% "ID #:", default = 1)))

#       df1 group
#1     Alex     1
#2       23     1
#3 ID #:123     1
#4     John     2
#5       26     2
#6 ID #:564     2

Details:

df1 %>% 
    mutate(
        # calculate the condition
        cond = df1 %like% "ID #:", 
        # shift the condition down and fill the first value with 1
        lag_cond = lag(cond, default = 1),
        # increase the group when the condition is TRUE (ID encountered)
        group= cumsum(lag_cond))

#       df1  cond lag_cond group
#1     Alex FALSE     TRUE     1
#2       23 FALSE    FALSE     1
#3 ID #:123  TRUE    FALSE     1
#4     John FALSE     TRUE     2
#5       26 FALSE    FALSE     2
#6 ID #:564  TRUE    FALSE     2
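
For comparison, the same shift-then-accumulate idea can be sketched in base R, without dplyr or data.table. This is only an illustrative sketch: the names values, df, is_id and starts_member are made up for this example, and grepl with fixed = TRUE stands in for %like%.

```r
# Rebuild the demo data (stringsAsFactors = FALSE keeps the column as character)
values <- c("Alex", "23", "ID #:123", "John", "26", "ID #:564")
df <- data.frame(df1 = values, stringsAsFactors = FALSE)

# TRUE on the rows that close a member's block (the "ID #:" rows)
is_id <- grepl("ID #:", df$df1, fixed = TRUE)

# Shift the flags down by one row; the first row always starts group 1
starts_member <- c(TRUE, head(is_id, -1))

# The running sum increases by one on the row right after each ID row
df$group <- cumsum(starts_member)
print(df)
```

Because the counter only depends on where the "ID #:" rows are, this should also work when members have different numbers of rows.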

Answer 1: (score: 1)

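You didn't mention whether you always want 3 rows per member. This code lets you change the number of rows per member (in case it isn't always 3), but it assumes that number is the same for every member: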

# Your code:
df1=c("Alex","23","ID #:123", "John","26","ID #:564")
df1=data.frame(df1)
library(dplyr)
library(data.table)
df1 %>% mutate(group= ifelse(df1 %like% "ID #:",1,NA ) )

number_of_rows_per_member <- 3 # Change if necessary
positions <- 1:(nrow(df1)/number_of_rows_per_member)

group <- c()
for (i in seq_along(positions)) {
  # rows ((i - 1) * n + 1) through (i * n) belong to member i
  first_row <- (i - 1) * number_of_rows_per_member + 1
  last_row <- i * number_of_rows_per_member
  group[first_row:last_row] <- i
}
group # This is the group column

df1$group <- group # Now just add the group column to your original data frame
df1 # Done!
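
If the number of rows per member really is fixed, the loop can probably be replaced by a single vectorized expression. The following is a minimal sketch under that assumption; df and rows_per_member are illustrative names, not part of the code above.

```r
# Rebuild the demo data
values <- c("Alex", "23", "ID #:123", "John", "26", "ID #:564")
df <- data.frame(df1 = values, stringsAsFactors = FALSE)

# Assumption: every member spans exactly this many consecutive rows
rows_per_member <- 3

# Row index divided by the block size, rounded up, gives the member number
df$group <- ceiling(seq_len(nrow(df)) / rows_per_member)
print(df)
```

Using ceiling on the row index means a trailing, incomplete block still gets its own group number instead of causing a length mismatch.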