Question

I want to split a dataset in R based on the NA values in one variable. For example, take this data:

  var1 var2
   1    21
   2    NA
   3    NA
   4    10

and make it like this:

  var1 var2
   1    21
   4    10

  var1 var2
   2    NA
   3    NA

Answer 1

See More Details:

Most statistical functions (e.g., lm()) have an na.action argument, which applies to the model as a whole, not to individual variables.

na.fail() returns the object (the dataset) if there are no NA values; otherwise it signals an error, stopping the analysis.

na.pass() returns the data object whether or not it has NA values, which is useful if the function deals with NA values internally.

na.omit() returns the object with entire observations (rows) omitted if any of the variables used in the model are NA for that observation.

na.exclude() drops the same rows as na.omit(), but functions that use naresid() or napredict(), such as residuals() and predict(), pad their results with NA so that they line up with the original rows.

You can think of na.action as a function applied to your data object, the result being the data that lm() actually fits. lm() takes it as an argument:

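lm(y ~ a + b + c, data = na.omit(dataset))              # drops rows with an NA in any column of dataset
lm(y ~ a + b + c, data = dataset, na.action = na.omit)  # drops rows with an NA in a model variable; the more common usage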

You can set your default handling of missing values with

options(na.action = "na.omit")
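
To see how na.omit and na.exclude differ in practice, here is a small sketch. The data frame d and its columns y and a are made up for illustration; they are not from the question.

```r
# Hypothetical data: the third response value is missing
d <- data.frame(y = c(1.0, 2.1, NA, 3.9, 5.2),
                a = c(1, 2, 3, 4, 5))

fit_omit    <- lm(y ~ a, data = d, na.action = na.omit)
fit_exclude <- lm(y ~ a, data = d, na.action = na.exclude)

# Both fits use the same four complete rows, so the coefficients agree
same_coef <- isTRUE(all.equal(coef(fit_omit), coef(fit_exclude)))

# na.omit leaves the incomplete row out of the residuals;
# na.exclude pads the residuals back to the original length with NA
n_omit    <- length(residuals(fit_omit))
n_exclude <- length(residuals(fit_exclude))
```

Here n_omit is 4 and n_exclude is 5, which matters when you want to attach residuals back onto the original data frame.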

Answer 2

Hi, try this to get every row that has an NA in any column:

new_DF <- DF[rowSums(is.na(DF)) > 0,]

or in case you want to check a particular column, you can also use

new_DF <- DF[is.na(DF$Var),]

If your missing values are stored as the character string "NA", first run

DF[DF == "NA"] <- NA

to replace them with real missing values.
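
Putting those steps together, a minimal sketch might look like the following. It assumes a made-up data frame in which var2 was read in as text; note that the column stays character after the replacement, so it is converted explicitly.

```r
# Hypothetical data frame where missing values arrived as the text "NA"
DF <- data.frame(var1 = c(1, 2, 3, 4),
                 var2 = c("21", "NA", "NA", "10"),
                 stringsAsFactors = FALSE)

DF[DF == "NA"] <- NA             # turn the "NA" strings into real NAs
DF$var2 <- as.numeric(DF$var2)   # the column is still character until converted

with_na    <- DF[is.na(DF$var2), ]    # rows 2 and 3
without_na <- DF[!is.na(DF$var2), ]   # rows 1 and 4
```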

Answer 3

You could just subset the data frame using is.na():

df1 <- df[!is.na(df$var2), ]
df2 <- df[is.na(df$var2), ]
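
For a version that runs on its own, here is the same idea applied to the data from the question. Storing the logical vector in a variable (called missing here, a name chosen just for this sketch) avoids computing is.na() twice.

```r
df <- data.frame(var1 = c(1, 2, 3, 4),
                 var2 = c(21, NA, NA, 10))

missing <- is.na(df$var2)   # FALSE TRUE TRUE FALSE

df1 <- df[!missing, ]       # rows where var2 is present
df2 <- df[missing, ]        # rows where var2 is NA

# Every row lands in exactly one of the two pieces
covers_all <- nrow(df1) + nrow(df2) == nrow(df)
```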

Demo here:

Rextester

Answer 4

The split() function comes in handy in this case. It returns a list with one data frame per group:

data <- read.table(text="var1 var2
   1    21
   2    NA
   3    NA
   4    10", header=TRUE)

split(data, is.na(data$var2))
# $`FALSE`
#   var1 var2
# 1    1   21
# 4    4   10
#
# $`TRUE`
#   var1 var2
# 2    2   NA
# 3    3   NA
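
The list elements are named "FALSE" and "TRUE", which is awkward to work with. One possible tidy-up is to rename them with setNames(); the names observed and missing below are just illustrative choices.

```r
data <- read.table(text = "var1 var2
   1    21
   2    NA
   3    NA
   4    10", header = TRUE)

parts <- split(data, is.na(data$var2))
original_names <- names(parts)   # "FALSE" "TRUE"

# Hypothetical, more readable names for the two pieces
parts <- setNames(parts, c("observed", "missing"))

observed_ids <- parts$observed$var1   # rows where var2 is present
missing_ids  <- parts$missing$var1    # rows where var2 is NA
```

This renaming assumes both groups exist. If var2 had no NA at all, split() would return a single element and the two names would not match.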

Answer 5

An alternative and more general approach is the complete.cases() function. It returns a logical vector that is TRUE for rows with no missing values (no NAs) in any column and FALSE otherwise, so it splits on NAs anywhere in the data frame and not just in var2.

dt = data.frame(var1 = c(1,2,3,4),
                var2 = c(21,NA,NA,10))

dt1 = dt[complete.cases(dt),]
dt2 = dt[!complete.cases(dt),]

dt1

#   var1 var2
# 1    1   21
# 4    4   10

dt2

#   var1 var2
# 2    2   NA
# 3    3   NA
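
If only some columns should count, complete.cases() can be given a subset of the data frame. The sketch below uses a made-up third column, var3, to show the difference.

```r
# Hypothetical data with NAs spread over three columns
dt <- data.frame(var1 = c(1, 2, NA, 4),
                 var2 = c(21, NA, 5, 10),
                 var3 = c(NA, "b", "c", "d"))

# All columns: only row 4 has no NA anywhere
all_cols <- complete.cases(dt)

# Only var1 and var2: var3 is ignored, so row 1 now counts as complete
some_cols <- complete.cases(dt[, c("var1", "var2")])

dt_complete   <- dt[some_cols, ]
dt_incomplete <- dt[!some_cols, ]
```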

R splitting a data frame based on NA values

5 answers:
