na.string in read.csv and dplyr::mutate

Date: 2016-07-19 07:09:55

Tags: r csv dplyr data-manipulation data-cleaning

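I imported the data (CSV format) into R in two ways. The first call does not use the na.string argument; the second does. I went with the second because some strings showed up as "" instead of NA after import, and I want all missing values standardized to NA.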

data1<-read.csv("file.csv",stringsAsFactors=FALSE)
data2<-read.csv("file",stringsAsFactors=FALSE,na.string="")
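To see what the argument changes, here is a minimal sketch that reads a small inline CSV rather than a file. The sample text and the names `csv_text`, `raw`, and `cleaned` are made up for illustration. The documented argument name is `na.strings`; the shorter `na.string` works only because R partially matches argument names.

```r
# Hypothetical two-row CSV with blank cells (not the asker's file)
csv_text <- "Indicator_A,Indicator_B\nX,\n,X\n"

# Default import: blank cells in a character column stay as ""
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# With na.strings = "": blank cells are read as NA
cleaned <- read.csv(text = csv_text, stringsAsFactors = FALSE, na.strings = "")

print(raw$Indicator_A)
print(cleaned$Indicator_A)
print(is.na(cleaned$Indicator_A))
```

The two data frames hold the same information, but any later comparison with `==` behaves differently on `""` than on `NA`.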

I have 3 variables that serve as indicators. They contain "X" for yes and "" / NA for no. I tried applying the following to data1 and data2 above:

df1<-data1%>%
     mutate(Indicator_Institution=ifelse(Indicator_A=="X",1,
                                  ifelse(Indicator_B=="X",2,
                                  ifelse(Indicator_C=="X",3,NA))))
df2<-data2%>%
     mutate(Indicator_Institution=ifelse(Indicator_A=="X",1,
                                  ifelse(Indicator_B=="X",2,
                                  ifelse(Indicator_C=="X",3,NA))))

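For df1, ifelse works through all the conditions, while for df2 only the first condition seems to be applied. Any idea why? What difference does the argument na.string="" make?

Reproducible example:

> dput(droplevels(head(data1)))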
structure(list(Indicator_A = c("X", "X", "X", "X", "", ""), 
    Indicator_B = c("", "", "", "", "X", "X"), Indicator_C = c("", 
    "", "", "", "", "")), .Names = c("Indicator_A", "Indicator_B", 
"Indicator_C"), row.names = c(NA, 6L), class = "data.frame")

> dput(droplevels(head(data2)))
structure(list(Indicator_A = c("X", "X", "X", "X", NA, NA), 
    Indicator_B = c(NA, NA, NA, NA, "X", "X"), Indicator_C = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_)), .Names = c("Indicator_A", "Indicator_B", 
"Indicator_C"), row.names = c(NA, 6L), class = "data.frame")

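1 answer:

Answer 0 (score: 1)

The reason is that in the second case we have NA instead of blanks. Comparing NA with == returns NA rather than FALSE, so ifelse returns NA for those rows instead of falling through to the nested conditions. To turn those NA comparisons into FALSE, add & !is.na(...) to each condition: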

data2 %>% 
    mutate(Indicator_Institution = ifelse(Indicator_A == "X" & !is.na(Indicator_A), 1, 
                                   ifelse(Indicator_B=="X" & !is.na(Indicator_B), 2,
                                   ifelse(Indicator_C == "X" & !is.na(Indicator_C), 3, 
                  NA))))
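The NA behaviour described above can be checked on a small vector. The sketch below is only an illustration: the vector `x` is made up, and `data2` is rebuilt by hand from the dput output in the question. It also shows `%in%` as one possible alternative to `& !is.na(...)`, since `%in%` returns FALSE rather than NA when the left-hand value is NA. Base R is used here so the block runs without dplyr.

```r
x <- c("X", NA, "")

print(x == "X")                # NA propagates through ==
print(ifelse(x == "X", 1, 0))  # so ifelse yields NA for that element
print(x == "X" & !is.na(x))    # NA & FALSE is FALSE
print(x %in% "X")              # %in% never returns NA

# data2 rebuilt from the dput output in the question
data2 <- data.frame(
  Indicator_A = c("X", "X", "X", "X", NA, NA),
  Indicator_B = c(NA, NA, NA, NA, "X", "X"),
  Indicator_C = NA_character_,
  stringsAsFactors = FALSE
)

# Same nested logic as the answer, written with %in%
data2$Indicator_Institution <- with(data2,
  ifelse(Indicator_A %in% "X", 1,
  ifelse(Indicator_B %in% "X", 2,
  ifelse(Indicator_C %in% "X", 3, NA))))

print(data2$Indicator_Institution)
```

Inside `mutate()` the same expressions should work unchanged, because `mutate()` evaluates them on the columns in the same vectorized way.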

For the example provided, this can also be done easily with which:

which(!is.na(data2), arr.ind=TRUE)[,2]
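One caveat with the `which` approach, sketched below under the assumption that every row has exactly one non-NA indicator: `which(..., arr.ind = TRUE)` lists its hits column by column, so the column indices come out in row order only when the data happen to be arranged that way, as in the example. The reordered data frame `shuffled` and the names `hits` and `by_row` are hypothetical, introduced just to show the effect.

```r
# data2 rebuilt from the dput output in the question
data2 <- data.frame(
  Indicator_A = c("X", "X", "X", "X", NA, NA),
  Indicator_B = c(NA, NA, NA, NA, "X", "X"),
  Indicator_C = NA_character_,
  stringsAsFactors = FALSE
)

# On the original data the column indices line up with the rows
hits <- which(!is.na(data2), arr.ind = TRUE)
print(unname(hits[, "col"]))

# Hypothetical reordering: rows now alternate between B and A
shuffled <- data2[c(5, 1, 6, 2), ]
hits2 <- which(!is.na(shuffled), arr.ind = TRUE)
print(unname(hits2[, "col"]))   # column-major order, not row order

# Sorting the hits by row restores one value per row, in row order
by_row <- unname(hits2[order(hits2[, "row"]), "col"])
print(by_row)
```

If a row could have no "X" at all, or more than one, the result would no longer have one entry per row, so the nested `ifelse` version is the safer choice in that case.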