Question

我想为数据帧中的几个不同列创建连续行之间特定值之间的转换的指示符。

一些示例数据：

structure(list(Year = 1998:2007, Pregnant = structure(c(2L, 2L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L), .Label = c("No", "Yes"), class = "factor"), 
    Infection = structure(c(2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 
    1L), .Label = c("Negative", "Positive"), class = "factor"), 
    Keep = c(0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L)), .Names = c("Year", 
"Pregnant", "Infection", "Keep"), class = "data.frame", row.names = c(NA, 
-10L))

#    Year Pregnant Infection Keep
# 1  1998      Yes  Positive    0
# 2  1999      Yes  Positive    0
# 3  2000       No  Negative    0
# 4  2001       No  Negative    1 # Infection changes from Negative to Positive 
# 5  2002       No  Positive    1
# 6  2003       No  Positive    0
# 7  2004       No  Negative    0
# 8  2005       No  Negative    1 # Pregnant changes from No to Yes
# 9  2006      Yes  Negative    1
# 10 2007      Yes  Negative    0

我想标记按特定顺序更改的行。例如，怀孕列的值从“否”（第8行）更改为“是”（第9行），感染列的值从“负数”（第4行）更改为“正数”（第5行）。因此，我想标记这些行（“保留”列将标记的行指示为1）。

列中还发生了其他更改，例如“怀孕-是”为“否”以及“感染为阳性”为“阴性”，但是这些更改并不重要。我只想按特定顺序指示值的顺序。

Variable - Pregnant, From - 'No', To - 'Yes' 
Variable - Infection, From - 'Negative', To - 'Positive'

我有20多个列，我想在其中检测每列中的某些变化，并创建相应的指标变量。

Answer 1

首先将所有因子级别显式设置为所需的从头到尾的顺序（而不是“希望”它们与字母排序一致；））

通过创建一个有序的因子，您可以将连续的行与数据的超前和滞后版本中的<进行比较。因此，我们可以一次性计算所有转换（而不是对每个变量进行硬编码-当列数很大时很繁琐）。

# select relevant columns from original data
d <- df[ , 2:3]
# or, assuming that 'Keep' is not in original data, just remove the first column 'Year'
# d <- df[ , -1]

# set factor levels in order of from-to
d$Pregnant <- factor(d$Pregnant, levels = c("No", "Yes"), ordered = TRUE)
d$Infection <- factor(d$Infection, levels = c("Negative", "Positive"), ordered = TRUE)

# check if factor levels are 'increasing' between rows
m <- d[-nrow(d), ] < d[-1, ]

# add a FALSE row to restore dimensions
m <- rbind(rep(FALSE, ncol(m)), m)

# get indices of changes
ix <- which(m, arr.ind = TRUE)

# set also preceeding rows to TRUE
m[cbind(ix[ , 1] - 1, ix[ , 2])] <- TRUE

基本上就是这样。您可以更改名称并将其强制转换为数字：

dimnames(m) <- list(NULL, paste0(colnames(m), "_diff"))
m <- m + 0

最后，根据“转换变量”中任何1的存在情况创建一个“ keep”列，并将cbind保留到原始数据框中：

cbind(df, Keep2 = as.integer(rowSums(m) != 0), m) 

#     Year Pregnant Infection Keep Keep2 Pregnant_diff Infection_diff
# 1  1998      Yes  Positive    0     0             0              0
# 2  1999      Yes  Positive    0     0             0              0
# 3  2000       No  Negative    0     0             0              0
# 4  2001       No  Negative    1     1             0              1
# 5  2002       No  Positive    1     1             0              1
# 6  2003       No  Positive    0     0             0              0
# 7  2004       No  Negative    0     0             0              0
# 8  2005       No  Negative    1     1             1              0
# 9  2006      Yes  Negative    1     1             1              0
# 10 2007      Yes  Negative    0     0             0              0

Answer 2

这样的事情怎么样？

df %>%
    mutate(
        grp.Preg = c(diff(as.numeric(Pregnant)) > 0, 0),
        grp.Infc = c(diff(as.numeric(Infection)) > 0, 0),
        flagChangePreg = abs(grp.Preg - lag(grp.Preg, default = 0)),
        flagChangeInfc = abs(grp.Infc - lag(grp.Infc, default = 0))) %>%
    select(-grp.Preg, -grp.Infc)
#   Year Pregnant Infection Keep flagChangePreg flagChangeInfc
#1  1998      Yes  Positive    0              0              0
#2  1999      Yes  Positive    0              0              0
#3  2000       No  Negative    0              0              0
#4  2001       No  Negative    1              0              1
#5  2002       No  Positive    1              0              1
#6  2003       No  Positive    0              0              0
#7  2004       No  Negative    0              0              0
#8  2005       No  Negative    1              1              0
#9  2006      Yes  Negative    1              1              0
#10 2007      Yes  Negative    0              0              0

列flagChangePreg和flagChangeInfc中的条目标记行，其中Pregnant从"No"变为"Yes"，而Infection从{{1} }到"Negative"。

创建几列中各行之间特定变化的指标

2 个答案: