Shift a factor variable up one row relative to another

Time: 2017-02-20 20:06:22

Tags: r data-cleaning

I'm not sure how to figure this out.

Here is a sample dataset:

Bob <- sample("Bob", 6, replace = TRUE)
Jeff <- sample("Jeff", 6, replace = TRUE)
Carl <- sample("Carl", 6, replace = TRUE)
Name <- array(c(Bob, Jeff, Carl), dim = c(18,1))
Week <- c("Week 1", "Week 2", "Week 3", "Week 4", "Week 5", "Week 6",
        "Week 1", "Week 2", "Week 3", "Week 4", "Week 5", "Week 6",
        "Week 1", "Week 2", "Week 3", "Week 4", "Week 5", "Week 6")

variable.1 <- c("No", "No", "No", "Yes", "No", "No",
            "Yes", "No", "No", "No", "Yes", "No",
            "No", "Yes", "No", "No", "No", "Yes")

df <- data.frame(Name, Week, variable.1)
df

   Name   Week variable.1
1   Bob Week 1         No
2   Bob Week 2         No
3   Bob Week 3         No
4   Bob Week 4        Yes
5   Bob Week 5         No
6   Bob Week 6         No
7  Jeff Week 1        Yes
8  Jeff Week 2         No
9  Jeff Week 3         No
10 Jeff Week 4         No
11 Jeff Week 5        Yes
12 Jeff Week 6         No
13 Carl Week 1         No
14 Carl Week 2        Yes
15 Carl Week 3         No
16 Carl Week 4         No
17 Carl Week 5         No
18 Carl Week 6        Yes

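What I would like to do is move any "Yes" in the variable.1 column up one row, so that it is reflected as a factor variable on the previous week's information. I am trying to do this per individual (not across the entire dataset). I can't figure out the best way to go about this when both variables are factors. Ideally, I would like NAs to appear. I don't want everything to shift up. I just want an NA to appear where the "Yes" was, and for the "Yes" to overwrite the "No" above it.

So, ideally, the finished product would look like "New.Col" below: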

   Name   Week variable.1 New.Col
1   Bob Week 1         No      No
2   Bob Week 2         No      No
3   Bob Week 3         No     Yes
4   Bob Week 4        Yes      NA
5   Bob Week 5         No      No
6   Bob Week 6         No      No
7  Jeff Week 1        Yes      NA
8  Jeff Week 2         No      No
9  Jeff Week 3         No      No
10 Jeff Week 4         No     Yes
11 Jeff Week 5        Yes      NA
12 Jeff Week 6         No      No
13 Carl Week 1         No     Yes
14 Carl Week 2        Yes      NA
15 Carl Week 3         No      No
16 Carl Week 4         No      No
17 Carl Week 5         No     Yes
18 Carl Week 6        Yes      NA

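1 Answer:

Answer 0: (score: 1)

Give this a try.

I'm going to go ahead and sort df by Name and Week, in case some of the data is out of order. (This won't account for any missing weeks!) I'm also going to copy variable.1 into newcol as character to play with.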

df <- df[order(df$Name, df$Week),]
df$newcol <- as.character(df$variable.1)
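
One caveat about the sort, sketched under the assumption that the real data might run past nine weeks (the sample stops at "Week 6", so it is unaffected): the Week labels are compared as text, so "Week 10" would land ahead of "Week 2". The labels below are hypothetical, and ordering by the extracted number is just one way around it.

```r
# Hypothetical week labels that go past single digits
weeks <- c("Week 1", "Week 2", "Week 10")

# Sorting the labels as text compares them character by character
sort(weeks)

# Pull out the number and order by that instead
week_num <- as.integer(sub("Week ", "", weeks, fixed = TRUE))
weeks[order(week_num)]
```

With the data frame, the same idea would be to order by Name and the extracted week number rather than by the Week label itself.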

For ease of understanding I'll write a loop, but computationally there are better ways to do this. This loop will go through each unique person in df$Name.

for (person in unique(df$Name)) {

}

Inside the loop, I want to select all of the entries in newcol for each person.

oldvalues <- df[df$Name == person, ]$newcol

Then I shift every value up by one entry and make the last entry NA.

newvalues <- c(oldvalues[-1], NA)

I also want to account for every week where the old value was "Yes" by making that week NA.

newvalues[oldvalues == "Yes"] <- NA
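
To make the two steps concrete, here is a small worked sketch on Bob's six values from the sample data, run outside the loop. It drops the first element with a negative index, which gives the same shift for any person with two or more weeks.

```r
# Bob's six weeks from the sample data
oldvalues <- c("No", "No", "No", "Yes", "No", "No")

# Step 1: shift everything up one slot; the last week has nothing below it
newvalues <- c(oldvalues[-1], NA)
# newvalues is now: "No" "No" "Yes" "No" "No" NA

# Step 2: blank out the weeks that originally held "Yes"
newvalues[oldvalues == "Yes"] <- NA
# newvalues is now: "No" "No" "Yes" NA "No" NA
```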

Then I can put it back into df.

df[df$Name == person,]$newcol <- newvalues
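
Since the pieces above are shown separately, here is a sketch of how they might fit together in one runnable script. The helper name `shift_yes_up`, its `keep_last` argument, and the extra column `newcol_keep` are hypothetical additions for illustration, not part of the original answer, and `rep()` stands in for the `sample()` calls. With `keep_last = FALSE` the helper follows the steps described above, which leaves the final week of every person as NA. The question's New.Col table instead keeps a trailing "No" (Bob and Jeff, Week 6), so `keep_last = TRUE` pads with the person's own last value to match that table.

```r
# Rebuild the sample data; factors are requested explicitly, as in the question
df <- data.frame(
  Name = rep(c("Bob", "Jeff", "Carl"), each = 6),
  Week = rep(paste("Week", 1:6), times = 3),
  variable.1 = c("No", "No", "No", "Yes", "No", "No",
                 "Yes", "No", "No", "No", "Yes", "No",
                 "No", "Yes", "No", "No", "No", "Yes"),
  stringsAsFactors = TRUE
)

# Hypothetical helper wrapping the shift-and-mask steps
shift_yes_up <- function(oldvalues, keep_last = FALSE) {
  # Value used for the final week, which has no following week to borrow from
  last <- if (keep_last) oldvalues[length(oldvalues)] else NA
  newvalues <- c(oldvalues[-1], last)
  # Weeks that originally held "Yes" become NA
  newvalues[oldvalues == "Yes"] <- NA
  newvalues
}

# Sort, then work on a character copy of variable.1
df <- df[order(df$Name, df$Week), ]
df$newcol <- as.character(df$variable.1)
df$newcol_keep <- df$newcol

for (person in unique(df$Name)) {
  rows <- df$Name == person
  oldvalues <- as.character(df$variable.1[rows])
  df$newcol[rows] <- shift_yes_up(oldvalues)                        # last week NA
  df$newcol_keep[rows] <- shift_yes_up(oldvalues, keep_last = TRUE) # as in New.Col
}

df
```

Note that the sort reorders the people alphabetically (Bob, Carl, Jeff), so the rows no longer appear in the order shown in the question.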

Now that the loop is finished, you can turn df$newcol back into a factor, which by default excludes NA as a level

df$newcol <- factor(df$newcol)

or keep NA as a third factor level

df$newcol <- factor(df$newcol, exclude = NULL)
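
As a rough illustration of the difference between the two conversions, and of one possible loop-free route, the sketch below uses base R's `ave()` to apply the same shift-and-mask steps per person. This is an alternative I am assuming fits the "better ways" mentioned earlier, not something the answer itself shows. It works on plain vectors rebuilt from the sample data and relies on the rows already being in week order within each person.

```r
variable.1 <- c("No", "No", "No", "Yes", "No", "No",
                "Yes", "No", "No", "No", "Yes", "No",
                "No", "Yes", "No", "No", "No", "Yes")
Name <- rep(c("Bob", "Jeff", "Carl"), each = 6)

# ave() splits variable.1 by Name, applies FUN to each piece,
# and puts the results back in the original row positions
newcol <- ave(variable.1, Name, FUN = function(v) {
  shifted <- c(v[-1], NA)   # shift up one entry, last entry NA
  shifted[v == "Yes"] <- NA # weeks that held "Yes" become NA
  shifted
})

# Default: NA is a missing value, not a level
levels(factor(newcol))                  # "No" "Yes"

# exclude = NULL: NA becomes a third level
levels(factor(newcol, exclude = NULL))  # "No" "Yes" NA

# With NA as a level, table() counts it alongside the other values
table(factor(newcol, exclude = NULL))
```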