Why does R say rows contain missing values?

Time: 2015-10-09 19:06:52

Tags: r

I am running the following R script on this dataset: http://pastebin.com/HA42b8QV

require(ggplot2)
data <- read.table("funcExp.txt", sep = "\t", header = TRUE)
data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$insTime <- strtoi(data$insTime)
ggplot(data, aes(n, insTime, color = alg)) + 
  geom_point() +
  stat_summary(fun.y=median, geom="line")

data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$decTime <- strtoi(data$decTime)
ggplot(data, aes(n, decTime, color = alg)) + 
  geom_point() +
  stat_summary(fun.y=median, geom="line")

data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$delTime <- strtoi(data$delTime)
ggplot(data, aes(n, delTime, color = alg)) + 
  geom_point() +
  stat_summary(fun.y=median, geom="line")

data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$insComp <- strtoi(data$insComp)
ggplot(data, aes(n, insComp, color = alg)) + 
  geom_point() +
  stat_summary(fun.y=median, geom="line")


data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$decComp <- strtoi(data$decComp)
ggplot(data, aes(n, decComp, color = alg)) + 
  geom_point() +
  stat_summary(fun.y=median, geom="line")

data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$delComp <- strtoi(data$delComp)
ggplot(data, aes(n, delComp, color = alg)) + 
  geom_point() +
  stat_summary(fun.y=median, geom="line")

I get the following warnings:

Loading required package: ggplot2
Loading required package: methods
Warning messages:
1: Removed 26 rows containing missing values (stat_summary). 
2: Removed 26 rows containing missing values (geom_point). 
Warning messages:
1: Removed 30 rows containing missing values (stat_summary). 
2: Removed 30 rows containing missing values (geom_point). 
Warning messages:
1: Removed 22 rows containing missing values (stat_summary). 
2: Removed 22 rows containing missing values (geom_point). 
Warning messages:
1: Removed 36 rows containing missing values (stat_summary). 
2: Removed 36 rows containing missing values (geom_point). 
Warning messages:
1: Removed 36 rows containing missing values (stat_summary). 
2: Removed 36 rows containing missing values (geom_point). 
Warning messages:
1: Removed 25 rows containing missing values (stat_summary). 
2: Removed 25 rows containing missing values (geom_point). 

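I searched online to find the cause but could not make sense of it. Most posts suggest that my dataset contains null values. Nothing is missing from my dataset, so I do not understand why R assumes that some values are missing.

Thanks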

1 answer:

Answer 0 (score: 3)

It looks like you are corrupting the data when you modify the initial data frame.

If you leave out

data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$insTime <- strtoi(data$insTime)
then the plots work fine.
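
As a rough sketch of what the script might look like with those conversions dropped, the loop below draws one plot per measurement column. The data frame is a small made-up stand-in for funcExp.txt, and `fun = median` assumes ggplot2 3.3.0 or later (older versions use `fun.y`, as in the question).

```r
library(ggplot2)

# Hypothetical stand-in for read.table("funcExp.txt", sep = "\t", header = TRUE);
# the values are invented for illustration only.
data <- data.frame(
  alg     = factor(rep(c("aheap", "fibheap", "pheap"), times = 3)),
  n       = rep(c(2L, 4L, 8L), each = 3),
  insTime = c(408.2, 867.5, 1332.1, 400.7, 1031.3, 1500.9, 450.4, 1100.6, 1700.8),
  decTime = c(359.1, 738.4, 1079.6, 411.3, 856.2, 1200.5, 430.9, 900.7, 1300.2)
)

# No strtoi() calls: the columns are already numeric, so plot them directly.
measures <- c("insTime", "decTime")
plots <- lapply(measures, function(col) {
  ggplot(data, aes(x = n, y = .data[[col]], color = alg)) +
    geom_point() +
    stat_summary(fun = median, geom = "line") +
    ylab(col)
})
names(plots) <- measures
```

Printing `plots$insTime` then draws the first plot. The loop also avoids repeating the same three lines for every column.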

Look, the structure of the data already shows that everything is fine:

 > str(data)
 'data.frame':   60 obs. of  8 variables:
 $ alg    : Factor w/ 3 levels "aheap","fibheap",..: 1 3 2 1 3 2 1 3 2 1 ...
 $ n      : int  2 2 2 4 4 4 8 8 8 16 ...
 $ insTime: num  408 867 1332 400 1031 ...
 $ decTime: num  359 738 1079 411 856 ...
 $ delTime: num  325 750 1242 416 931 ...
 $ insComp: num  0.9 1.5 2.5 1.9 3.5 6.5 5.8 11.6 18.6 12 ...
 $ decComp: num  0.5 1.1 5.1 1.7 3.6 11.6 3 7 23 11.6 ...
 $ delComp: num  0 0 1 3.6 7.6 14.8 16.8 38 67.6 57 ...
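
The same check can be reproduced on a tiny example. This is a minimal sketch, assuming a made-up three-row extract in the same tab-separated layout. It suggests that read.table already parses whole-number columns as integer and decimal columns as numeric, so no further conversion should be needed.

```r
# Hypothetical three-row extract in the same tab-separated layout; values are made up.
txt <- "alg\tn\tinsTime\naheap\t2\t408.2\npheap\t2\t867.5\nfibheap\t2\t1332.1\n"
df <- read.table(text = txt, sep = "\t", header = TRUE)

# Inspect the class that read.table chose for each column.
col_classes <- sapply(df, class)
print(col_classes)
```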

And your summary shows no NAs:

 > summary(data)
      alg           n              insTime             decTime            delTime             insComp       
 aheap  :20   Min.   :      2   Min.   :      400   Min.   :     359   Min.   :3.250e+02   Min.   :      1  
 fibheap:20   1st Qu.:     56   1st Qu.:     4518   1st Qu.:    3262   1st Qu.:8.420e+03   1st Qu.:     87  
 pheap  :20   Median :   1536   Median :   110041   Median :   67643   Median :2.743e+05   Median :   3095  
              Mean   : 104858   Mean   :  8304522   Mean   : 5866098   Mean   :9.325e+07   Mean   : 258807  
              3rd Qu.:  40960   3rd Qu.:  2416198   3rd Qu.: 1556492   3rd Qu.:1.132e+07   3rd Qu.:  92170  
              Max.   :1048576   Max.   :142359000   Max.   :88428500   Max.   :2.088e+09   Max.   :3735370  
    decComp           delComp         
 Min.   :      0   Min.   :        0  
 1st Qu.:     89   1st Qu.:      608  
 Median :   2790   Median :    46142  
 Mean   : 226980   Mean   :  7884811  
 3rd Qu.:  75944   3rd Qu.:  2085385  
 Max.   :3983010   Max.   :138010000  

Once you call strtoi, you create the NAs yourself:

> data$decTime <- strtoi(data$decTime)
> summary(data)
      alg           n              insTime             decTime            delTime             insComp       
 aheap  :20   Min.   :      2   Min.   :     2175   Min.   :     498   Min.   :3.250e+02   Min.   :      1  
 fibheap:20   1st Qu.:     56   1st Qu.:   222651   1st Qu.:  264344   1st Qu.:8.420e+03   1st Qu.:     87  
 pheap  :20   Median :   1536   Median :  1545575   Median : 1596015   Median :2.743e+05   Median :   3095  
              Mean   : 104858   Mean   : 14642987   Mean   :11713536   Mean   :9.325e+07   Mean   : 258807  
              3rd Qu.:  40960   3rd Qu.: 10317432   3rd Qu.: 9105678   3rd Qu.:1.132e+07   3rd Qu.:  92170  
              Max.   :1048576   Max.   :142359000   Max.   :88428500   Max.   :2.088e+09   Max.   :3735370  
                                NA's   :26          NA's   :30                                              
    decComp           delComp         
 Min.   :      0   Min.   :        0  
 1st Qu.:     89   1st Qu.:      608  
 Median :   2790   Median :    46142  
 Mean   : 226980   Mean   :  7884811  
 3rd Qu.:  75944   3rd Qu.:  2085385  
 Max.   :3983010   Max.   :138010000 
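
A likely explanation, sketched here with made-up values: strtoi() coerces its argument with as.character() and returns NA for any string that is not a whole number in the given base. The timing columns are numeric and presumably contain fractional values, so those entries would turn into NA. If integers are really wanted, as.integer() is one alternative, with the caveat that it truncates.

```r
# Made-up timings; only the first is a whole number.
times <- c(408, 867.5, 1332.25)

as_text   <- as.character(times)  # what strtoi() actually receives
converted <- strtoi(times)        # strings that are not whole numbers become NA
n_missing <- sum(is.na(converted))

# as.integer() truncates toward zero instead of returning NA.
truncated <- as.integer(times)

print(data.frame(times, as_text, converted, truncated))
```

Running `sum(is.na(strtoi(data$insTime)))` on the original column should give the same count as the "Removed ... rows" warning for that plot.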

Hope that helps.