Flag duplicates in a new column in R

Asked: 2015-01-24 22:50:08

Tags: r

The following is a minimal example of my data.

dataExample <- structure(list(X = structure(c(1L, 3L, 2L, 1L, 3L, 3L, 2L, 1L
), .Label = c("aaa", "bbb", "burp"), class = "factor")), .Names = "X", row.names = c(NA, 
-8L), class = "data.frame")

dataExample
     X
1  aaa
2 burp
3  bbb
4  aaa
5 burp
6 burp
7  bbb
8  aaa

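Edit: Based on the content of one column (e.g., 'X'), I want to create a new column (e.g., 'desired') that tells me, for each character code in 'X', whether I am looking at the first, second, or nth occurrence of that code. In addition, one specific code in 'X' needs to be excluded from this counting and shown as is (e.g., "burp").

Here is an example of the expected result: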

    X     desired
1  aaa       1
2 burp      burp
3  bbb       1
4  aaa       2
5 burp      burp
6 burp      burp
7  bbb       2
8  aaa       3

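Note: the 'desired' column represents the expected result; it is not part of the data set.

2 Answers:

Answer 0: (score: 5)

Here are some possible solutions:

Using base R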

df$desired <- with(df, ave(as.character(X), X, FUN = function(x) seq_len(length(x))))
df[df$X == "burp", "desired"] <- "burp"
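
The two base R steps can also be wrapped in a small function. This is only a minimal sketch: the name `occurrence_label` and its `exclude` argument are hypothetical and not part of the original answer, and the example data is rebuilt inline so the block runs on its own.

```r
# Hypothetical helper: label each value with its occurrence number,
# leaving any value listed in `exclude` unchanged.
occurrence_label <- function(x, exclude = character(0)) {
  x <- as.character(x)
  # ave() applies FUN within each group of x and returns the
  # results in the original row order
  idx <- ave(seq_along(x), x, FUN = seq_along)
  out <- as.character(idx)
  # Excluded codes keep their own value instead of a count
  keep <- x %in% exclude
  out[keep] <- x[keep]
  out
}

# Same values as the question's data (the helper accepts factor or character input)
df <- data.frame(X = c("aaa", "burp", "bbb", "aaa", "burp", "burp", "bbb", "aaa"))
df$desired <- occurrence_label(df$X, exclude = "burp")
```

Passing the excluded code as an argument avoids hard-coding "burp" in two places.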

Or using data.table

library(data.table)
setDT(df)[, desired := as.character(seq_len(.N)), 
                                 X][X == "burp", desired := "burp"]

Or using dplyr

library(dplyr)
df %>%
  group_by(X) %>%
  mutate(desired = ifelse(X == "burp", "burp", as.character(row_number())))

Edit: Per the OP's comment, here is an illustration showing that all of the methods work

df$desiredBase <- with(df, ave(as.character(X), X, FUN = function(x) seq_len(length(x))))
df[df$X == "burp", "desiredBase"] <- "burp"

setDT(df)[, desiredDT := as.character(seq_len(.N)), 
          X][X == "burp", desiredDT := "burp"]

setDF(df) %>%
  group_by(X) %>%
  mutate(desiredplyr = ifelse(X == "burp", "burp", as.character(row_number())))

# Source: local data frame [8 x 4]
# Groups: X
# 
#      X desiredBase desiredDT desiredplyr
# 1  aaa           1         1           1
# 2 burp        burp      burp        burp
# 3  bbb           1         1           1
# 4  aaa           2         2           2
# 5 burp        burp      burp        burp
# 6 burp        burp      burp        burp
# 7  bbb           2         2           2
# 8  aaa           3         3           3

Data

df <- structure(list(X = structure(c(1L, 3L, 2L, 1L, 3L, 3L, 2L, 1L
      ), .Label = c("aaa", "bbb", "burp"), class = "factor")), .Names = "X", row.names = c(NA, -8L), class = "data.frame")

Answer 1: (score: 0)

Based on your limited input:

df <- structure(list(X = structure(c(1L, 3L, 2L, 1L, 3L, 3L, 2L, 1L), .Label = c("aaa", "bbb", "burp"), class = "factor"), desired = structure(c(1L,4L, 1L, 2L, 4L, 4L, 2L, 3L), .Label = c("1", "2", "3", "burp"), class = "factor")), .Names = c("X", "desired"), row.names = c(NA, -8L), class = "data.frame")

desired <- character(nrow(df))  # character, since the result mixes counts and "burp"
set <- df$X
for(k in seq_along(set)){
  cur_set <- set[1:k]
  cur_el <- set[k]
  tbl <- table(cur_set)
  desired[k] <- ifelse(cur_el == "burp", "burp", tbl[names(tbl) == cur_el])
}

df$desired_new <- desired
> df
     X desired desired_new
1  aaa       1           1
2 burp    burp        burp
3  bbb       1           1
4  aaa       2           2
5 burp    burp        burp
6 burp    burp        burp
7  bbb       2           2
8  aaa       3           3
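
The loop above re-tabulates the first k elements at every step, so its cost grows roughly quadratically with the number of rows. As a rough alternative, the sketch below keeps a running count per code instead. The variable names (`excluded`, `counts`) are illustrative and not part of the original answer, and the data is rebuilt inline as a plain character column so the block runs on its own.

```r
# Same values as the question's data, stored as character for simplicity
df <- data.frame(X = c("aaa", "burp", "bbb", "aaa", "burp", "burp", "bbb", "aaa"),
                 stringsAsFactors = FALSE)

excluded <- "burp"       # code to leave as is (taken from the question)
counts <- integer(0)     # named vector of running counts, filled as codes appear
desired_new <- character(nrow(df))

for (k in seq_along(df$X)) {
  code <- df$X[k]
  if (code %in% excluded) {
    # Excluded codes are copied through unchanged
    desired_new[k] <- code
    next
  }
  # Increment the count for this code, starting at 1 on first sight
  n <- if (code %in% names(counts)) counts[[code]] + 1L else 1L
  counts[code] <- n
  desired_new[k] <- as.character(n)
}

df$desired_new <- desired_new
```

Each row is visited once, and only the per-code counters are kept between iterations.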