I want my bar plot to include a bar for the missing data in a column.
adult <- read.csv(
"http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
header = FALSE,
na.strings = "?",
strip.white = TRUE
)
colnames(adult) <- c("age", "workClass", "fnlwgt", "education", "educationNum", "maritalStatus", "occupation", "relationship", "race", "sex", "capitalGain", "capitalLoss", "hoursPerWeek", "nativeCountry", "prediction")
barplot(table(adult$workClass), main="Job Distribution", xlab="Job", ylab="Count",las=2)
I know that workClass has 1836 missing values in this dataset, as counted by:
length(which(is.na(adult$workClass)))
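As a side note, `sum(is.na(x))` gives the same count more directly, and `colSums(is.na(df))` reports the missing-value count for every column at once, which helps when looking for the columns that have missing data. A minimal sketch, using a small hypothetical data frame in place of the downloaded `adult` data:

```r
# Hypothetical mini data frame; the real data is read from the UCI URL above
df <- data.frame(
  age       = c(39, 50, 38, 53),
  workClass = c("State-gov", NA, "Private", NA),
  stringsAsFactors = FALSE
)

# Two equivalent ways to count the missing values in one column
n1 <- length(which(is.na(df$workClass)))
n2 <- sum(is.na(df$workClass))

# Missing-value counts for every column at once (named numeric vector)
per_column <- colSums(is.na(df))
print(per_column)
```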
Answer (score: 3)
Pass the argument useNA = "ifany" to table so that missing values get their own count:
tab <- table(adult$workClass, useNA = "ifany")
# Federal-gov Local-gov Never-worked Private
# 960 2093 7 22696
# Self-emp-inc Self-emp-not-inc State-gov Without-pay
# 1116 2541 1298 14
# <NA>
# 1836
By default, the name of the NA count is NA (a missing value, not a string). You can change it to the string "NA" with the following command:
names(tab)[is.na(names(tab))] <- "NA"
Now the plot also shows the label "NA" on the x-axis:
barplot(tab, main = "Job Distribution", xlab = "Job", ylab = "Count", las = 2)
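The same steps can be tried without downloading the dataset. The sketch below uses a short hypothetical vector standing in for `adult$workClass` and shows that `table()` drops NA values by default, while `useNA = "ifany"` adds an NA cell that can then be renamed:

```r
# Hypothetical vector standing in for adult$workClass
work <- c("Private", "Private", NA, "State-gov", NA, "Private")

# Default: NA values are silently dropped from the counts
tab_default <- table(work)

# useNA = "ifany" adds an NA cell only when missing values are present
tab <- table(work, useNA = "ifany")

# The NA cell's name is a real NA, not the string "NA"; replace it so that
# barplot() prints a readable label under that bar
names(tab)[is.na(names(tab))] <- "NA"

print(tab_default)
print(tab)
```

Calling `barplot(tab, las = 2)` on the result then draws a third bar labeled "NA". If you want an NA cell even when nothing is missing, `useNA = "always"` adds one with a count of 0.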