Question

Setup

I have a dataset consisting of 3.5e6 1's, 7.5e6 0's, and 4.4e6 NA's. When I call summary() on it, I get a mean and a maximum that are wrong (they disagree with mean() and max()).

> summary(data, digits = 10)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
      0       0       1       1       1       1 4365239

When mean() is called on its own, it returns a reasonable value:

> mean(data, na.rm = T)
[1] 0.6804823

Characterizing the problem

The problem appears to be general to any vector with more than 3162277 NA values.

Just under the cutoff:

> thingie <- as.numeric(c(rep(0,1e6), rep(1,1e6), rep(NA,3162277)))
> summary(thingie)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
    0.0     0.0     0.5     0.5     1.0     1.0 3162277

Just over the cutoff:

> thingie <- as.numeric(c(rep(0,1e6), rep(1,1e6), rep(NA,3162278)))
> summary(thingie)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
      0       0       0       0       1       1 3162278

It also does not seem to matter how many non-missing values there are.

> thingie <- as.numeric(c(rep(0,1), rep(1,1), rep(NA,3162277)))
> summary(thingie)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
    0.0     0.2     0.5     0.5     0.8     1.0 3162277 
> thingie <- as.numeric(c(rep(0,1), rep(1,1), rep(NA,3162278)))
> summary(thingie)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
      0       0       0       0       1       1 3162278

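Research

While searching for an answer, I came across the well-known rounding issue with summary(), but that does not explain this behavior (see the first code block, where digits = 10 is set).
I thought this might be some strange quirk of my environment/machine/planetary alignment, so I had my sister run the same code. She got the same results on her machine.

Closing remarks

Obviously this is not a critical problem, since mean() and max() can be used in place of summary(), but I am curious whether anyone knows what causes this behavior. Also, neither my sister nor I could find any mention of it, so I figured I would document it for posterity.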
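
For anyone who wants the workaround spelled out, here is a minimal sketch of computing the statistics directly instead of relying on the printed summary. The small vector `x` and the name `stats` are stand-ins chosen for illustration, not my actual data.

```r
# Small stand-in vector with the same make-up as the data: 1s, 0s and NAs.
x <- rep(c(1, 0, NA), c(35, 75, 44))

# Compute each statistic directly on the non-missing values.
stats <- c(
  min    = min(x, na.rm = TRUE),
  median = median(x, na.rm = TRUE),
  mean   = mean(x, na.rm = TRUE),
  max    = max(x, na.rm = TRUE),
  n_na   = sum(is.na(x))   # count of missing values, kept alongside
)
print(stats)
```

Because each value comes straight from its own function, none of them depends on how summary() formats its output.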

Answer 1

Here is some example data:

x <- rep(c(1,0,NA), c(3.5e6,7.5e6,4.4e6))
out <- summary(x)
out
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
#      0       0       0       0       1       1 4400000
mean(x, na.rm=TRUE)
#[1] 0.3181818

The values stored in the summary object are correct. The issue traces back to how the summary is printed, where one line essentially does some rounding via zapsmall():

c(out)
#     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
#0.000e+00 0.000e+00 0.000e+00 3.182e-01 1.000e+00 1.000e+00 4.400e+06
round(c(out), max(0L, getOption("digits")-log10(4400000)))
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
#      0       0       0       0       1       1 4400000

The key tipping point here is going from 3162277 to 3162278 NA values. The NA count is the largest entry in the summary, so it determines the digits value passed to round(). With the default digits option of 7, crossing 10^6.5 (about 3162277.66) moves that value from just above 0.5 to just below 0.5, which drops the rounding from 1 decimal place to 0.
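
The threshold arithmetic can be checked without building a multi-million element vector. This is a minimal sketch that assumes the default digits option of 7 and mirrors the round() call used in this answer; `digits_arg` is a hypothetical helper name introduced here, not a function in R.

```r
# Hypothetical helper: the digits value handed to round() when the largest
# entry of the summary is the NA count (assumes digits = 7, the R default).
digits_arg <- function(n_na, digits = 7) max(0, digits - log10(n_na))

# The tipping point sits at 10^6.5, between the two NA counts from the question.
cutoff <- 10^6.5
print(cutoff)

below <- digits_arg(3162277)  # slightly more than 0.5
above <- digits_arg(3162278)  # slightly less than 0.5
print(c(below = below, above = above))

# Apply the same rounding to a mean of 0.5, as in the question's examples.
print(round(0.5, below))
print(round(0.5, above))
```

On the R version discussed here, the first round() call keeps one decimal place and the second keeps none, which matches the printed summaries in the question. Other R versions may print differently.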

R summary() gives incorrect values for excessive NA's
