Question

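Newbie question. I have 2 columns in a data frame, Name and Size, shown below.

I need a third column that runs as a consecutive sequence until Name or Size changes value: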

    Name Size NewCol
    A     1   1
    A     1   2
    A     1   3
    A     2   1
    A     2   2
    B     3   1
    B     5   1
    C     7   1
    C     17  1
    C     17  2

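Basically it is a dummy field, so that each record can be referenced individually even when Name and Size are the same.

So while the same values of Name and Size keep occurring, the index goes from k to k + 1; otherwise it resets.

For example, if my dataset has 200 rows with Name A and Size 1, they would be indexed 1..200. Then when it moves on to A and 2, the index resets to 1.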

Answer 1

We can try data.table:

library(data.table)
setDT(df1)[, NewCol := match(Size, unique(Size)), by = .(Name)]
df1
#   Name Size NewCol
#1:    A    1      1
#2:    A    1      1
#3:    A    2      2
#4:    B    3      1
#5:    C    7      1
#6:    C   17      2

If there is a typo in the expected output, the following may be what is wanted:

setDT(df1)[, NewCol := seq_len(.N), .(Name, Size)]
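As a rough, self-contained sketch of the grouped counter above, here it is run on the data from the question. The object name `df1` follows this answer; `NewCol2` is just an illustrative extra column, and `rowid()` is data.table's helper for the same within-group numbering.

```r
library(data.table)

# Data from the question, built directly as a data.table
df1 <- data.table(
  Name = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "C"),
  Size = c(1, 1, 1, 2, 2, 3, 5, 7, 17, 17)
)

# Number the rows 1..N within each Name/Size combination
df1[, NewCol := seq_len(.N), by = .(Name, Size)]

# rowid() produces the same counter without an explicit `by`
df1[, NewCol2 := rowid(Name, Size)]

print(df1)
```

Both columns should agree; note that they count within each Name/Size group wherever its rows appear, not only within consecutive runs.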

Or using dplyr:

library(dplyr)
df1 %>%
   group_by(Name) %>%
   mutate(NewCol = match(Size, unique(Size)))

Or, to number the rows within each Name/Size combination:

df1 %>%
   group_by(Name, Size) %>%
   mutate(NewCol = row_number())

Or we can use the same approach in base R with ave.
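The answer does not show the `ave` call, so the following is only a minimal sketch of how it might look, assuming the data from the question stored in a data frame named `df1`:

```r
# Data from the question
df1 <- data.frame(
  Name = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "C"),
  Size = c(1, 1, 1, 2, 2, 3, 5, 7, 17, 17)
)

# ave() applies FUN to the first argument within each Name/Size combination
# and returns the results in the original row order.
# seq_along() turns each group into the counter 1, 2, ..., n.
df1$NewCol <- ave(seq_len(nrow(df1)), df1$Name, df1$Size, FUN = seq_along)

print(df1)
```

The first argument only needs to be a vector with one element per row; its values are discarded because `seq_along` looks only at the group length.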

Answer 2

I guess this may not be the most efficient solution, but it is at least a good start:

# Reproducing the example
df <- data.frame(Name=LETTERS[c(1, 1, 1, 1, 1, 2, 2, 3, 3, 3)], Size=c(1, 1, 1, 2, 2, 3, 5, 7, 17, 17))

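# Create a new column holding a combined Name/Size key
# (the separator avoids collisions such as "A1" + 1 vs "A" + 11)
df$NewCol <- paste(df$Name, df$Size, sep = "_")

# Replace the key with a running count within each key.
# Note: this assumes rows with the same key are adjacent, as in the example.
df$NewCol <- unlist(lapply(unique(df$NewCol), function(id) seq_len(sum(df$NewCol == id))), use.names = FALSE)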

df
   Name Size NewCol
1     A    1      1
2     A    1      2
3     A    1      3
4     A    2      1
5     A    2      2
6     B    3      1
7     B    5      1
8     C    7      1
9     C   17      1
10    C   17      2
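If the counter should restart whenever Name or Size changes from one row to the next, even when the same combination shows up again later, a run-length version may be closer to what the question describes. This is only a sketch in base R; the names `key`, `df2` and `key2` are made up for illustration.

```r
# Data from the question
df <- data.frame(
  Name = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "C"),
  Size = c(1, 1, 1, 2, 2, 3, 5, 7, 17, 17)
)

# One key per row; rle() then measures the length of each consecutive run
key <- paste(df$Name, df$Size, sep = "_")

# sequence() expands run lengths c(3, 2, 1, ...) into 1,2,3, 1,2, 1, ...
df$NewCol <- sequence(rle(key)$lengths)

# Hypothetical data where the A/1 combination reappears after a B row:
# the counter restarts for the second run of A/1.
df2 <- data.frame(Name = c("A", "A", "B", "A", "A"), Size = c(1, 1, 1, 1, 1))
key2 <- paste(df2$Name, df2$Size, sep = "_")
df2$NewCol <- sequence(rle(key2)$lengths)

print(df)
print(df2)
```

When rows with the same Name and Size are always adjacent, as in the question, this gives the same numbers as the grouped approaches.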

Need an index column in an R data frame to distinguish records with the same values