Question

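I am trying to remove NA values from my data frame, but only in two cases: 1. if all the values in a row are NA; 2. if more than 70% of the values in that row are NA.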

So far I have tried drop_na and a few other things, but I can't seem to get it to work the way I want (that is, only under these conditions).

Here is my code so far:

#1 Load CSV files into a data frame
# install tidyverse only if it is not already installed
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# load the tidyverse library
library("tidyverse")
setwd("C:/Users/ibrahim.cetinkaya/OneDrive - NTT/Desktop/data")
##################### Part A #####################
# data files (you need to specify the paths of the CSV files, e.g. relative or absolute)
files <- c("data/201808.csv",
       "data/201809.csv",
       "data/201810.csv",
       "data/201811.csv",
       "data/201812.csv",
       "data/201901.csv",
       "data/201902.csv",
       "data/201903.csv",
       "data/201904.csv",
       "data/201905.csv",
       "data/201906.csv",
       "data/201908.csv"
)

#Concatenate into one data frame. 
data <- data.frame()
for (i in seq_along(files)){
  temp <- read_csv(files[i], skip = 7)
  data <- rbind(data, temp)
}
#View to verify
view(data)

#Part 2
#Remove variables which have no data at all (all the values are NAs)
#Remove variables that don't have adequate data (more than 70% of the records are NAs)
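To make the goal of Part 2 concrete, here is a minimal sketch of the kind of conditional removal I am after, written in base R rather than with drop_na. The toy data frame `df`, the `threshold` variable, and the names `df_cols` and `df_clean` are all made up for illustration and are not part of my real data. It handles both readings of the task: columns (variables) as in the comments above, and rows as in my description. A row or column that is entirely NA has an NA share of 1, which is above 0.7, so the single threshold check covers both conditions.

```r
# Toy data frame standing in for the real `data` (hypothetical values)
df <- data.frame(
  a = c(1, NA, NA, 4),
  b = c(NA, NA, NA, NA),
  c = c(1, NA, 3, 4),
  d = c(1, NA, NA, 4)
)

# Maximum share of NA values that is still acceptable (assumed: 70%)
threshold <- 0.7

# Share of NA values in each column; keep columns at or below the threshold
col_na <- colMeans(is.na(df))
df_cols <- df[, col_na <= threshold, drop = FALSE]

# Share of NA values in each remaining row; keep rows at or below the threshold
row_na <- rowMeans(is.na(df_cols))
df_clean <- df_cols[row_na <= threshold, , drop = FALSE]

print(df_clean)
```

With the real data, `df` would be replaced by `data`. Whether rows or columns should be filtered first is a judgment call, since dropping columns changes the NA share of each row.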

Here is a picture of the data so you can visualize it better:

