The following example is an extremely simplified illustration of my data.
dataExample <- structure(list(X = structure(c(1L, 3L, 2L, 1L, 3L, 3L, 2L, 1L
), .Label = c("aaa", "bbb", "burp"), class = "factor")), .Names = "X", row.names = c(NA,
-8L), class = "data.frame")
dataExample
X
1 aaa
2 burp
3 bbb
4 aaa
5 burp
6 burp
7 bbb
8 aaa
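Edit: Based on the contents of one column (e.g., 'X'), I would like to create a new column (e.g., 'desired') that tells me, for each character code in 'X', whether I am looking at the first, second, or nth occurrence of that code. In addition, there is one specific code in 'X' that must be excluded from this counting and reported as is (e.g., "burp").
Here is an example of the expected result: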
X desired
1 aaa 1
2 burp burp
3 bbb 1
4 aaa 2
5 burp burp
6 burp burp
7 bbb 2
8 aaa 3
Note: the 'desired' column shows the expected result; it is not part of the dataset.
Answer 0 (score: 5)
Here are some possible solutions:
Using base R
df$desired <- with(df, ave(as.character(X), X, FUN = function(x) seq_len(length(x))))
df[df$X == "burp", "desired"] <- "burp"
Or using the data.table package
library(data.table)
setDT(df)[, desired := as.character(seq_len(.N)),
X][X == "burp", desired := "burp"]
Or using the dplyr package
library(dplyr)
df %>%
group_by(X) %>%
mutate(desired = ifelse(X == "burp", "burp", as.character(row_number())))
Edit: Per the OP's comment, here is a demonstration that all three approaches give the same result:
df$desiredBase <- with(df, ave(as.character(X), X, FUN = function(x) seq_len(length(x))))
df[df$X == "burp", "desiredBase"] <- "burp"
setDT(df)[, desiredDT := as.character(seq_len(.N)),
X][X == "burp", desiredDT := "burp"]
setDF(df) %>%
group_by(X) %>%
mutate(desiredplyr = ifelse(X == "burp", "burp", as.character(row_number())))
# Source: local data frame [8 x 4]
# Groups: X
#
# X desiredBase desiredDT desiredplyr
# 1 aaa 1 1 1
# 2 burp burp burp burp
# 3 bbb 1 1 1
# 4 aaa 2 2 2
# 5 burp burp burp burp
# 6 burp burp burp burp
# 7 bbb 2 2 2
# 8 aaa 3 3 3
Data
df <- structure(list(X = structure(c(1L, 3L, 2L, 1L, 3L, 3L, 2L, 1L
), .Label = c("aaa", "bbb", "burp"), class = "factor")), .Names = "X", row.names = c(NA, -8L), class = "data.frame")
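The base R approach above can also be wrapped in a small helper so that the excluded code becomes a parameter instead of being hard-coded. This is only a sketch: the function name `occurrence_index` and its `exclude` argument are hypothetical and are not part of the original answer.

```r
# Hypothetical helper (name and arguments are illustrative, not from the answer).
# For each element of x, return its running occurrence number within its own
# value, as character; values listed in `exclude` are returned unchanged.
occurrence_index <- function(x, exclude = character(0)) {
  x <- as.character(x)
  # ave() applies FUN within each group of identical values and writes the
  # results back in the original row order.
  out <- ave(x, x, FUN = function(g) as.character(seq_along(g)))
  skip <- x %in% exclude
  out[skip] <- x[skip]
  out
}

# Same eight values as the example data, built directly for brevity.
df <- data.frame(X = factor(c("aaa", "burp", "bbb", "aaa",
                              "burp", "burp", "bbb", "aaa")))
df$desired <- occurrence_index(df$X, exclude = "burp")
print(df)
```

Because `exclude` is a vector, more than one code could be passed through untouched, for example `exclude = c("burp", "bbb")`.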
Answer 1 (score: 0)
Based on your limited input:
df <- structure(list(X = structure(c(1L, 3L, 2L, 1L, 3L, 3L, 2L, 1L), .Label = c("aaa", "bbb", "burp"), class = "factor"), desired = structure(c(1L,4L, 1L, 2L, 4L, 4L, 2L, 3L), .Label = c("1", "2", "3", "burp"), class = "factor")), .Names = c("X", "desired"), row.names = c(NA, -8L), class = "data.frame")
# Character vector, because the result mixes counts with the literal "burp"
desired <- character(nrow(df))
set <- df$X
for(k in seq_along(set)){
cur_set <- set[1:k]
cur_el <- set[k]
tbl <- table(cur_set)
desired[k] <- ifelse(cur_el == "burp", "burp", tbl[names(tbl) == cur_el])
}
df$desired_new <- desired
> df
X desired desired_new
1 aaa 1 1
2 burp burp burp
3 bbb 1 1
4 aaa 2 2
5 burp burp burp
6 burp burp burp
7 bbb 2 2
8 aaa 3 3
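The loop above rebuilds a frequency table over `set[1:k]` on every iteration, so the work grows roughly quadratically with the number of rows. A possible single-pass variant keeps one running counter per code instead. This is a sketch under assumptions: the function name `running_count` and the `exclude` argument are hypothetical, and the input is assumed to contain no missing values or empty strings.

```r
# Hypothetical single-pass variant (names are illustrative only).
# Assumes x has no NA values and no empty strings.
running_count <- function(x, exclude = "burp") {
  x <- as.character(x)
  # One integer counter per distinct value, all starting at zero.
  counts <- setNames(integer(length(unique(x))), unique(x))
  out <- character(length(x))
  for (k in seq_along(x)) {
    code <- x[k]
    if (code %in% exclude) {
      # Excluded codes are copied through unchanged.
      out[k] <- code
    } else {
      # Increment this code's counter and record the new value.
      counts[[code]] <- counts[[code]] + 1L
      out[k] <- as.character(counts[[code]])
    }
  }
  out
}

# Same eight values as the example data, built directly for brevity.
df <- data.frame(X = factor(c("aaa", "burp", "bbb", "aaa",
                              "burp", "burp", "bbb", "aaa")))
df$desired_new <- running_count(df$X)
print(df)
```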