I have a data frame that looks something like this:
ID TEXT ReferenceTEXT TextID
1 Yo NA NA
2 Cool Yup 5
3 Nice NA NA
4 Phat Yup 5
5 Yup Phat 4
6 Boss NA NA
7 Yay Phat 4
Using match as
dataframe$TextID <- match(dataframe$ReferenceTEXT, dataframe$TEXT, incomparables=NA)
I was able to extract the TextID for each ReferenceTEXT. Now I would like to get the sequence/rank of each TextID in a new column called SequenceID, as shown below:
ID TEXT ReferenceTEXT TextID SequenceID
1 Yo NA NA NA
2 Cool Yup 5 5-1
3 Nice NA NA NA
4 Phat Yup 5 5-2
5 Yup Phat 4 4-1
6 Boss NA NA NA
7 Yay Phat 4 4-2
How do I do this? What is the most practical way to get it done? The solution needs to work on a data frame with more than 160,000 observations.
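For anyone who wants to reproduce the setup, here is a minimal sketch that rebuilds the example data and the match step. It assumes the two text columns are plain character vectors (hence stringsAsFactors = FALSE) and that ID is simply the row number, which is why the positions returned by match line up with the IDs.

```r
# Rebuild the example data; values are copied from the table in the question.
# Assumption: text columns are character, and ID equals the row position.
dataframe <- data.frame(
  ID = 1:7,
  TEXT = c("Yo", "Cool", "Nice", "Phat", "Yup", "Boss", "Yay"),
  ReferenceTEXT = c(NA, "Yup", NA, "Yup", "Phat", NA, "Phat"),
  stringsAsFactors = FALSE
)

# match() returns the position of each ReferenceTEXT within TEXT;
# incomparables = NA keeps rows with a missing reference as NA.
dataframe$TextID <- match(dataframe$ReferenceTEXT, dataframe$TEXT,
                          incomparables = NA)
dataframe$TextID
# [1] NA  5 NA  5  4 NA  4
```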
Answer 0: (score: 3)
Try this:
library(dplyr)
dataframe %>%
group_by(ReferenceTEXT) %>%
mutate(SequenceID = ifelse(is.na(TextID), NA_character_, paste(TextID, seq_len(n()), sep="-")))
# Source: local data frame [7 x 5]
# Groups: ReferenceTEXT [3]
#
# ID TEXT ReferenceTEXT TextID SequenceID
# (int) (fctr) (fctr) (int) (chr)
# 1 1 Yo NA NA NA
# 2 2 Cool Yup 5 5-1
# 3 3 Nice NA NA NA
# 4 4 Phat Yup 5 5-2
# 5 5 Yup Phat 4 4-1
# 6 6 Boss NA NA NA
# 7 7 Yay Phat 4 4-2
Answer 1: (score: 2)
In base R:
df$SequenceID <- paste(df$TextID, ave(df$TextID, df$TextID, FUN=seq_along), sep="-")
is.na(df$SequenceID) <- is.na(df$TextID)
df
# ID TEXT ReferenceTEXT TextID SequenceID
# 1 1 Yo <NA> NA <NA>
# 2 2 Cool Yup 5 5-1
# 3 3 Nice <NA> NA <NA>
# 4 4 Phat Yup 5 5-2
# 5 5 Yup Phat 4 4-1
# 6 6 Boss <NA> NA <NA>
# 7 7 Yay Phat 4 4-2
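Using ave, create a within-group counter for each TextID and paste it together with the id. Then reset the rows where TextID is missing to proper NA values, since paste would otherwise leave the string "NA-NA" there.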
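Update
To make it a little cleaner, you can use transform to create and assign the new column in one line, then replace the "NA-NA" strings with real NA values: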
newdf <- transform(df, SequenceID = paste(TextID, ave(TextID, TextID, FUN=seq_along), sep="-"))
is.na(newdf$SequenceID) <- is.na(df$TextID)