How can I generate a number sequence/rank in R based on another column of a data frame?

Time: 2015-10-19 21:46:38

Tags: r dataframe

So I have a data frame like the following:

ID       TEXT  ReferenceTEXT  TextID  
 1        Yo        NA         NA
 2       Cool       Yup        5
 3       Nice       NA         NA
 4       Phat       Yup        5       
 5       Yup        Phat       4       
 6       Boss       NA         NA       
 7       Yay        Phat       4     

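Using match as dataframe$TextID <- match(dataframe$ReferenceTEXT, dataframe$TEXT, incomparables=NA)

I was able to extract the TextID for each ReferenceTEXT. Now I would like to number the rows within each TextID and store that sequence/rank in a new column named SequenceID, as follows: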

 ID       TEXT  ReferenceTEXT  TextID  SequenceID
 1        Yo        NA         NA         NA
 2       Cool       Yup        5          5-1
 3       Nice       NA         NA         NA
 4       Phat       Yup        5          5-2
 5       Yup        Phat       4          4-1
 6       Boss       NA         NA         NA
 7       Yay        Phat       4          4-2

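But how should I do this? What is the most practical way to accomplish it? The solution needs to work on a data frame with more than 160,000 observations.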
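For anyone who wants to reproduce the starting point, here is a minimal sketch of the match step. The data frame is rebuilt by hand from the table above, and stringsAsFactors = FALSE is an assumption made here so that the text columns are plain character vectors; it is not taken from the original post.

```r
# Rebuild the example data by hand (values copied from the table in the question).
# stringsAsFactors = FALSE is an assumption for this sketch.
dataframe <- data.frame(
  ID            = 1:7,
  TEXT          = c("Yo", "Cool", "Nice", "Phat", "Yup", "Boss", "Yay"),
  ReferenceTEXT = c(NA, "Yup", NA, "Yup", "Phat", NA, "Phat"),
  stringsAsFactors = FALSE
)

# For each ReferenceTEXT, look up the row position of the same string in TEXT.
# incomparables = NA keeps NA references from ever being matched.
dataframe$TextID <- match(dataframe$ReferenceTEXT, dataframe$TEXT,
                          incomparables = NA)

dataframe$TextID
# NA  5 NA  5  4 NA  4
```

Note that R column names are case sensitive, so the names used in match must be spelled exactly as in the data frame (ReferenceTEXT and TEXT here).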

2 answers:

Answer 0: (score: 3)

Try this:

library(dplyr)
dataframe %>% 
  group_by(ReferenceTEXT) %>% 
  mutate(SequenceID = ifelse(is.na(TextID), NA_character_, paste(TextID, seq_len(n()), sep="-")))
# Source: local data frame [7 x 5]
# Groups: ReferenceTEXT [3]
# 
# ID   TEXT ReferenceTEXT TextID     SequenceID
# (int) (fctr)        (fctr)  (int) (chr)
# 1     1     Yo            NA     NA    NA
# 2     2   Cool           Yup      5   5-1
# 3     3   Nice            NA     NA    NA
# 4     4   Phat           Yup      5   5-2
# 5     5    Yup          Phat      4   4-1
# 6     6   Boss            NA     NA    NA
# 7     7    Yay          Phat      4   4-2

Answer 1: (score: 2)

base R

df$SequenceID <- paste(df$TextID, ave(df$TextID, df$TextID, FUN=seq_along), sep="-")
is.na(df$SequenceID) <- is.na(df$TextID)
df
#   ID TEXT ReferenceTEXT TextID SequenceID
# 1  1   Yo          <NA>     NA       <NA>
# 2  2 Cool           Yup      5        5-1
# 3  3 Nice          <NA>     NA       <NA>
# 4  4 Phat           Yup      5        5-2
# 5  5  Yup          Phat      4        4-1
# 6  6 Boss          <NA>     NA       <NA>
# 7  7  Yay          Phat      4        4-2

Using ave, create a running sequence within each id and paste it together with the id. Then set the proper NA values.
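To see what each step contributes, the intermediate values can be inspected on their own. This is only a sketch on a bare vector holding the TextID values from the example; the names counter and SequenceID are illustrative.

```r
# TextID values from the example; NA marks rows that have no reference.
TextID <- c(NA, 5L, NA, 5L, 4L, NA, 4L)

# ave() applies FUN within each group of equal TextID values and returns
# the results in the original row order. Rows whose group is NA are left as is.
counter <- ave(TextID, TextID, FUN = seq_along)
counter
# NA  1 NA  2  1 NA  2

# paste() turns missing values into the string "NA", so those rows read "NA-NA".
SequenceID <- paste(TextID, counter, sep = "-")

# Replace the "NA-NA" strings with real missing values.
is.na(SequenceID) <- is.na(TextID)
SequenceID
# NA    "5-1" NA    "5-2" "4-1" NA    "4-2"
```

The is.na<- step matters because paste never returns a missing value by itself; without it the column would hold the literal string "NA-NA" in the unreferenced rows.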

Update

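To make it a little cleaner, you can use transform to create the new column and assign the result in one line, then replace the "NA-NA" strings with real NA values as needed: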

newdf <- transform(df, SequenceID = paste(TextID, ave(TextID, TextID, FUN=seq_along), sep="-"))
is.na(newdf$SequenceID) <- is.na(df$TextID)