Question

Suppose my data frame looks like this:

>df
city  year  ceep
  1    1      1
  1    2      1
  1    3      0
  1    4      1
  1    5      0
  2    1      0
  2    2      1
  2    3      1
  2    4      0
  2    5      1
  3    1      1
  3    2      0
  3    3      1
  3    4      0
  3    5      1

Now I want to create a new variable 'veep' whose value depends on the values of 'city' and 'ceep' in other rows. For example,

veep=1 if ceep[_n-1]=1 & city=city[_n-1]
veep=1 if ceep[_n+2]=1 & ceep[_n+3]=1 & city=city[_n+3]
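Read with two assumptions, namely that the two rules are alternatives (either one sets veep to 1) and that N stands for the number of rows (nrow(df)), the conditions can be written out together with the row ranges where each one is defined:

```latex
\[
\text{veep}_n = 1 \quad \text{if} \quad
\begin{cases}
\text{ceep}_{n-1} = 1 \;\wedge\; \text{city}_n = \text{city}_{n-1}, & 2 \le n \le N,\\[4pt]
\text{ceep}_{n+2} = 1 \;\wedge\; \text{ceep}_{n+3} = 1 \;\wedge\; \text{city}_n = \text{city}_{n+3}, & 1 \le n \le N - 3.
\end{cases}
\]
```

The index ranges are what make this awkward in R: the first rule has no row n - 1 for n = 1, and the second has no row n + 3 for the last three rows.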

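where n is the observation (row) number. I'm not sure how to translate these conditions into R. I think where I'm stuck is selecting the observation in a particular row. I was thinking of code like this: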

df$veep[df$ceep(of the n-1th observation)==1 & city==city(n-1th observ.)] <- 1
df$veep[df$ceep(of the n+2th observation)==1 & df$ceep(of the n+3th observation)==1 &
city==city(n+3th observ.)] <- 1

#note: what's in parentheses is just to demonstrate where I'm having trouble

Can anyone help?

Answer 1

Here is one way to write out the logical steps. Note the use of idx to index the vectors; this is needed to avoid out-of-range indices.

idx <- seq_len(nrow(df))

# Set a default value for the new variable
df$veep <- NA
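A toy illustration of why the shifted vectors are built with head() and tail() rather than with idx - 1 (x is a stand-in vector, not a column of df):

```r
x <- c(10, 20, 30, 40, 50)

# Naive shifting with idx - 1 asks for x[0], which R silently drops,
# so the result has 4 elements instead of 5 and no longer lines up with x
naive <- x[seq_along(x) - 1]

# head()/tail() produce two equal-length vectors that pair row n with row n - 1
prev <- head(x, -1)  # x[n - 1] for n = 2..5
curr <- tail(x, -1)  # x[n]     for n = 2..5

print(naive)
print(rbind(prev, curr))
```

Each column of the printed matrix pairs a value with its predecessor, which is exactly the pairing the first rule needs.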

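Your first set of logical conditions can't be applied to the first row of df, because the index n - 1 would be 0, and 0 does not select any row. So use tail(*, -1) to pick all but the first entry of veep and city, and use head(*, -1) to pick all but the last entry of ceep and city.

df[tail(idx, -1), "veep"] <- ifelse(
  head(df$ceep, -1) == 1 &
    tail(df$city, -1) == head(df$city, -1),
  1, tail(df$veep, -1))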

Your next set of conditions can't be applied to the last three rows of df, because n + 3 would be an invalid index there. So, once again, use the head and tail functions. One tricky part is that the first ceep condition is based on n + 2, not n + 3, so it needs a combination of head and tail.
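A toy check of that head/tail combination (x is a stand-in vector whose values equal their row numbers, not part of the original data); each column of the printed matrix lines up rows n, n + 2 and n + 3:

```r
x <- 1:7  # values equal their row numbers, so the alignment is easy to read

n_rows <- head(x, -3)            # rows n     = 1..4
plus2  <- head(tail(x, -2), -1)  # rows n + 2 = 3..6 (drop 2 at the front, 1 at the end)
plus3  <- tail(x, -3)            # rows n + 3 = 4..7

print(rbind(n_rows, plus2, plus3))
```

Dropping two elements from the front and one from the end leaves plus2 with the same length as the other two vectors, so all three compare element by element.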

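The second step could look like the sketch below. It is assembled from the description above and is not necessarily the answer's exact code; it rebuilds the question's sample data and repeats the first step so that it runs on its own and the result can be checked:

```r
# Sample data from the question
df <- data.frame(
  city = rep(1:3, each = 5),
  year = rep(1:5, times = 3),
  ceep = c(1, 1, 0, 1, 0,  0, 1, 1, 0, 1,  1, 0, 1, 0, 1)
)

idx <- seq_len(nrow(df))
df$veep <- NA

# Step 1: ceep[n - 1] == 1 and city[n] == city[n - 1], for rows 2..N
df[tail(idx, -1), "veep"] <- ifelse(
  head(df$ceep, -1) == 1 &
    tail(df$city, -1) == head(df$city, -1),
  1, tail(df$veep, -1))

# Step 2: ceep[n + 2] == 1, ceep[n + 3] == 1 and city[n] == city[n + 3], for rows 1..N-3
df[head(idx, -3), "veep"] <- ifelse(
  head(tail(df$ceep, -2), -1) == 1 &
    tail(df$ceep, -3) == 1 &
    head(df$city, -3) == tail(df$city, -3),
  1, head(df$veep, -3))

print(df)
```

On this sample data only the first rule ever fires; rows where neither rule applies keep the NA default.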

Answer 2

You can use a for loop like this:

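df$veep <- 0

for (i in seq_len(nrow(df))) {
  # Row 1 has no previous row, so the comparison starts at row 2
  if (i > 1) {
    # && short-circuits and returns a single TRUE/FALSE, which is what if() expects
    if (df[i - 1, "ceep"] == 1 && df[i - 1, "city"] == df[i, "city"])
      df[i, "veep"] <- 1
  }
}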
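The loop above only handles the first condition. A sketch extending it to the second one, using the question's sample data (rebuilt here so the block runs on its own), could look like this. Because && short-circuits, the guard i <= n - 3 is checked before any row past the end of the data is read:

```r
df <- data.frame(
  city = rep(1:3, each = 5),
  year = rep(1:5, times = 3),
  ceep = c(1, 1, 0, 1, 0,  0, 1, 1, 0, 1,  1, 0, 1, 0, 1)
)

df$veep <- 0
n <- nrow(df)

for (i in seq_len(n)) {
  # Rule 1: the previous row has ceep == 1 and belongs to the same city
  if (i > 1 && df$ceep[i - 1] == 1 && df$city[i - 1] == df$city[i])
    df$veep[i] <- 1
  # Rule 2: rows i + 2 and i + 3 have ceep == 1 and row i + 3 is in the same city
  if (i <= n - 3 && df$ceep[i + 2] == 1 && df$ceep[i + 3] == 1 &&
      df$city[i + 3] == df$city[i])
    df$veep[i] <- 1
}

print(df)
```

The loop is easier to read than the head/tail version, but it does one scalar comparison per row, so the vectorized approach will usually be faster on large data frames.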

Generating a new variable in R where the nth observation depends on the (n-1)th observation of another column
