Identify rows with a uniform sequence while ignoring missing data in R

Time: 2018-11-08 10:59:50

Tags: r count sequence

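I am working with panel data in which the same variable is recorded repeatedly to build a sequence of states. I want to use only the observations whose sequence is not uniform, but I am struggling to create a flag that identifies them without also treating NA as a distinct state.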

I have created an example dataset to keep things simple:

ID <- c(1,2,3,4,5,6,7,8,9,10)
S1 <- c("Education", "Employment", "Education", "Education", "Education", "Education", "Education", "Education", "Education", "Education")
S2 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Education", "Employment", "Education", "Education", "Education")
S3 <- c("Education", "Employment", "NA", "Unemployed", "Education", "Employment", "Employment", "NA", "Education", "Education")
S4 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Employment", "Employment", "NA", "Education", "Education")
S5 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Employment", "Employment", "NA", "Education", "Education")
df <- data.frame(ID, S1, S2, S3, S4, S5)
df

   ID         S1         S2         S3         S4         S5
1   1  Education  Education  Education  Education  Education
2   2 Employment Employment Employment Employment Employment
3   3  Education  Education         NA  Education  Education
4   4  Education Unemployed Unemployed Unemployed Unemployed
5   5  Education  Education  Education  Education  Education
6   6  Education  Education Employment Employment Employment
7   7  Education Employment Employment Employment Employment
8   8  Education  Education         NA         NA         NA
9   9  Education  Education  Education  Education  Education
10 10  Education  Education  Education  Education  Education

Ideally, I would flag or keep only the observations with ID = c("4", "6", "7").

I have tried a few approaches:

I tried counting consecutive states, but this does not account for the separate IDs:

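library(data.table)

# df_long is not defined in the question. It is presumably the long-format
# version of df, with one row per ID and time point and the state in column S.
# One way to build it (an assumption, not part of the original attempt):
df_long <- melt(as.data.table(df), id.vars = "ID", variable.name = "time", value.name = "S")

setDT(df_long)
# Length of each run of "Employment"; rows in other states get 0
df_long[, employed := (S == "Employment")
        ][, e.length := with(rle(employed), rep(lengths, lengths))
          ][employed == 0, e.length := 0]

# Same for "Education". Neither step groups by ID, so runs span several IDs.
df_long[, education := (S == "Education")
        ][, edu.length := with(rle(education), rep(lengths, lengths))
          ][education == 0, edu.length := 0]
df_long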
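The run count can be made to respect individual IDs by computing it within each ID. The following is a minimal base R sketch of that idea, not the asker's code: the table `long_states` and its columns `ID` and `S` are hypothetical, and missing states are dropped before counting so that NA does not start a new run.

```r
# Hypothetical long-format data: one row per ID and time point, state in S.
long_states <- data.frame(
  ID = rep(1:2, each = 4),
  S  = c("Education", "Education", "Employment", "Employment",
         "Employment", NA, "Employment", "Employment"),
  stringsAsFactors = FALSE
)

# Number of runs of identical states within each ID, ignoring NA.
# A value of 1 means the sequence for that ID is uniform.
n_runs <- tapply(long_states$S, long_states$ID, function(s) {
  length(rle(s[!is.na(s)])$lengths)
})
n_runs
```

An ID with more than one run changed state at least once, so `n_runs > 1` could serve as the flag.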

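I also tried creating a flag variable by hand, but this does not solve the NA problem, and with the number of repeated observations in my dataset it is too tedious and time-consuming: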

df$employed[df$S1=="Education" & df$S2=="Education" & df$S3=="Education" & df$S4=="Education" & df$S5=="Education"] <- 1
df$employed

Any help would be greatly appreciated.

3 Answers:

Answer 0: (score: 0)

It's super simple:

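# Turn the "NA" strings into real missing values
df[df == "NA"] <- NA

# table() drops NA, so each row's table lists only its observed states;
# a row with more than one observed state is not uniform
df$keep <- lengths(apply(df[, -1], 1, table)) > 1

which(df$keep)
#> [1] 4 6 7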
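One caveat with `apply()` plus `table()`: when every row happens to yield a table of the same length, `apply()` simplifies the result to a vector or matrix instead of a list, and `lengths()` then returns 1 for every element. A variant that counts distinct non-missing states directly avoids this. The sketch below uses a small hypothetical data frame, not the asker's data.

```r
# Hypothetical example; "NA" strings stand in for missing values, as in the question.
df <- data.frame(
  ID = 1:4,
  S1 = c("Education", "Education", "Education", "Education"),
  S2 = c("Education", "Employment", "NA", "NA"),
  S3 = c("Education", "Employment", "Education", "Employment"),
  stringsAsFactors = FALSE
)
df[df == "NA"] <- NA

# Count the distinct non-missing states in each row.
# The function returns a single number, so apply() always gives a plain vector.
states <- as.matrix(df[, -1])
n_states <- apply(states, 1, function(x) length(unique(x[!is.na(x)])))

df$keep <- n_states > 1
which(df$keep)
```

Like the `table()` approach, this ignores NA rather than excluding rows that contain it, so a row such as Education, NA, Employment is still flagged.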

Answer 1: (score: 0)

I have a similar solution, but without table:

df[df == "NA"] <- NA
# Keep a row only if it has no missing values and more than one distinct state
df$to.keep <- apply(df[, -1], 1, function(x) {
  !any(is.na(x)) && length(unique(x)) > 1
})

> which(df$to.keep)
[1] 4 6 7
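For a large panel, the same rule (no missing values and at least one change of state) can be written without a per-row function call by comparing every column with the first one. This is a sketch on hypothetical data, offered as an alternative rather than as the answerer's method.

```r
# Hypothetical example data; real NA values mark missing observations.
df <- data.frame(
  ID = 1:5,
  S1 = c("Education", "Education", "Education", "Education", "Education"),
  S2 = c("Education", "Unemployed", NA, "Education", "Employment"),
  S3 = c("Education", "Unemployed", "Education", NA, "Employment"),
  stringsAsFactors = FALSE
)

states <- as.matrix(df[, -1])

# Rows without any missing value
complete <- complete.cases(states)

# Rows where at least one state differs from the first state.
# states[, 1] is recycled down the columns, so each cell is compared
# with the first cell of its own row.
differs <- rowSums(states != states[, 1], na.rm = TRUE) > 0

df$to.keep <- complete & differs
which(df$to.keep)
```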

Answer 2: (score: 0)

ID <- c(1,2,3,4,5,6,7,8,9,10)
S1 <- c("Education", "Employment", "Education", "Education", "Education", "Education", "Education", "Education", "Education", "Education")
S2 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Education", "Employment", "Education", "Education", "Education")
S3 <- c("Education", "Employment", "NA", "Unemployed", "Education", "Employment", "Employment", "NA", "Education", "Education")
S4 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Employment", "Employment", "NA", "Education", "Education")
S5 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Employment", "Employment", "NA", "Education", "Education")
S6 <- c("Education", "Employment", "Education", "Unemployed", "Education", "Employment", "Employment", "EMP", "Education", "Education")
df <- data.frame(ID, S1, S2, S3, S4, S5,S6)

I also added S6 from your comment, the case that Andre's answer fails to flag correctly.

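library(dplyr)
df[df == "NA"] <- NA

# Flag_NA is "Yes" when the row has no missing values
df$Flag_NA = ifelse(apply(df %>% select(-ID), 1, function(x) any(is.na(x))), 'No', 'Yes')
# Flag_Uniform is "Yes" when the row holds more than one distinct value (NA counts as a value here)
df$Flag_Uniform = ifelse(apply(df %>% select(-ID, -Flag_NA), 1, function(x) length(unique(x))) == 1, 'No', 'Yes')
# Keep a row when the two flags agree: both are "Yes" for a complete, non-uniform row.
# Note that an all-NA row would give "No"/"No" and would also be kept.
df = df %>% mutate(Flag_keep = ifelse(Flag_NA == Flag_Uniform, "Yes", "No"))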

df
   ID         S1         S2         S3         S4         S5         S6 Flag_NA Flag_Uniform Flag_keep
1   1  Education  Education  Education  Education  Education  Education     Yes           No        No
2   2 Employment Employment Employment Employment Employment Employment     Yes           No        No
3   3  Education  Education       <NA>  Education  Education  Education      No          Yes        No
4   4  Education Unemployed Unemployed Unemployed Unemployed Unemployed     Yes          Yes       Yes
5   5  Education  Education  Education  Education  Education  Education     Yes           No        No
6   6  Education  Education Employment Employment Employment Employment     Yes          Yes       Yes
7   7  Education Employment Employment Employment Employment Employment     Yes          Yes       Yes
8   8  Education  Education       <NA>       <NA>       <NA>        EMP      No          Yes        No
9   9  Education  Education  Education  Education  Education  Education     Yes           No        No
10 10  Education  Education  Education  Education  Education  Education     Yes           No        No
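The answers differ in how they treat rows that contain NA: the `table()` approach skips missing values and compares what remains, while the other two never flag a row with any NA. A small helper can make that choice explicit. The function `flag_non_uniform` and its `na_rule` argument are hypothetical names introduced here for illustration, and the data is a made-up example.

```r
# Hypothetical helper: flag rows whose states vary.
# na_rule = "exclude": a row with any NA is never flagged.
# na_rule = "ignore":  NAs are skipped and the remaining states are compared.
flag_non_uniform <- function(data, id_col = "ID", na_rule = c("exclude", "ignore")) {
  na_rule <- match.arg(na_rule)
  states <- as.matrix(data[, setdiff(names(data), id_col), drop = FALSE])
  # Distinct non-missing states per row
  n_states <- apply(states, 1, function(x) length(unique(x[!is.na(x)])))
  has_na <- rowSums(is.na(states)) > 0
  if (na_rule == "exclude") n_states > 1 & !has_na else n_states > 1
}

df <- data.frame(
  ID = 1:4,
  S1 = c("Education", "Education", "Education", NA),
  S2 = c("Education", "Employment", NA, NA),
  S3 = c("Education", "Employment", "EMP", NA),
  stringsAsFactors = FALSE
)

keep_exclude <- flag_non_uniform(df, na_rule = "exclude")
keep_ignore  <- flag_non_uniform(df, na_rule = "ignore")
data.frame(ID = df$ID, keep_exclude, keep_ignore)
```

Row 3 (Education, NA, EMP) is the case where the two rules disagree, and an all-NA row is not flagged under either rule.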