Standard deviation function in ddply not returning values on a melted data frame

Time: 2017-07-14 16:25:39

Tags: r melt summarize

I have a dataset consisting of 3 position values (X, Y, Z) and 3 rotation values (Omega, Phi, Kappa).

head(pos.df) looks like this:

  Batch  PhotoID         X          Y        Z       Omega         Phi      Kappa
1     1 DSC_7120 -269.6995 -359.33126 2390.522 -2.78643779  0.03288689   49.42041
2     1 DSC_7121 -323.5350 -311.80727 2388.374 -1.43015984 -0.61313717   49.08223
3     1 DSC_7122 -381.0833 -259.52629 2386.173 -0.08466679 -2.05867638   48.67501
4     1 DSC_7123 -434.4999 -212.15629 2384.075 -0.23728698 -1.97925763   49.09743
5     1 DSC_7707 -297.2458  -12.70537 2352.626 -1.17187585  0.70767493 -130.93919
6     1 DSC_7708 -238.0820  -61.07186 2353.831 -1.12715649  0.55772261 -131.25967

Then I melt the data:

dfl <- melt(pos.df, id.vars = c("Batch", "PhotoID"))

so that head(dfl) gives

  Batch  PhotoID variable     value
1     1 DSC_7120        X -269.6995
2     1 DSC_7121        X -323.5350
3     1 DSC_7122        X -381.0833
4     1 DSC_7123        X -434.4999
5     1 DSC_7707        X -297.2458
6     1 DSC_7708        X -238.0820

and tail(dfl) gives

    Batch  PhotoID variable      value
385     5 DSC_7710    Kappa -131.57589
386     5 DSC_7711    Kappa -131.54491
387     5 DSC_7794    Kappa   51.35246
388     5 DSC_7795    Kappa   51.58456
389     5 DSC_7796    Kappa   51.82275
390     5 DSC_7797    Kappa   51.48262

Now I want to look at some summary statistics:

smry <- ddply(dfl, c("Batch", "PhotoID", "variable"), 
              summarise, 
              mean = mean(value), 
              sd = sd(value),
              se = sd(value)/sqrt(length(value)))

But for some reason the sd and se values come back as NA.

head(smry)

   Batch  PhotoID variable          mean sd se
1      1 DSC_7120        X -269.69945440 NA NA
2      1 DSC_7120        Y -359.33125720 NA NA
3      1 DSC_7120        Z 2390.52165300 NA NA
4      1 DSC_7120    Omega   -2.78643779 NA NA
5      1 DSC_7120      Phi    0.03288689 NA NA
6      1 DSC_7120    Kappa   49.42040741 NA NA
7      1 DSC_7121        X -323.53499700 NA NA
8      1 DSC_7121        Y -311.80726930 NA NA
9      1 DSC_7121        Z 2388.37389700 NA NA
10     1 DSC_7121    Omega   -1.43015984 NA NA

I checked the data types:

str(pos.df)

'data.frame':   65 obs. of  8 variables:
 $ Batch  : int  1 1 1 1 1 1 1 1 1 1 ...
 $ PhotoID: Factor w/ 13 levels "DSC_7120","DSC_7121",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ X      : num  -270 -324 -381 -434 -297 ...
 $ Y      : num  -359.3 -311.8 -259.5 -212.2 -12.7 ...
 $ Z      : num  2391 2388 2386 2384 2353 ...
 $ Omega  : num  -2.7864 -1.4302 -0.0847 -0.2373 -1.1719 ...
 $ Phi    : num  0.0329 -0.6131 -2.0587 -1.9793 0.7077 ...
 $ Kappa  : num  49.4 49.1 48.7 49.1 -130.9 ...

Can anyone tell me why my sd() and se calculations are not returning values?

As an example, I calculated these numbers for one photo in Excel:

 stat, X, Y, Z, Omega, Phi, Kappa
Variance, 0.02273259300, 0.13331103000, 0.00000342846, 0.00000214810, 0.00000364895, 0.00000310653
SD, 0.13485575300, 0.32657131600, 0.00165613000, 0.00131090800, 0.00170855500, 0.00157646000

So technically they do exist...
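
A note for anyone comparing Excel figures like these against R: sd() in R uses the sample formula with an n - 1 denominator. The sketch below takes the X column from the table above and assumes n = 5 values per photo (one per batch, which is an assumption based on the 65 rows and 13 photo IDs shown by str()). Under that assumption, the Variance row looks like a sample variance while the SD row looks like a population standard deviation, so R's sd() would be expected to come out somewhat larger than the Excel SD.

```r
# Figures for X copied from the Excel table above.
excel_var <- 0.02273259300
excel_sd  <- 0.13485575300

# Assumption: each photo has one value per batch and there are five batches.
n <- 5

# Square root of the variance as given: the value R's sd() would report
# if excel_var is a sample (n - 1) variance.
sample_sd <- sqrt(excel_var)

# Rescale the variance to the population (n) denominator before the root.
population_sd <- sqrt(excel_var * (n - 1) / n)

# Compare the three side by side.
print(c(sample_sd = sample_sd, population_sd = population_sd, excel_sd = excel_sd))
```

If the assumption about n is wrong, the comparison does not hold; it is only a consistency check on the numbers in the table.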

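Thanks for your time.

1 Answer:

Answer 0: (score: 1)

Thanks to @ChiPak and @Wen

I was over-constraining my summarise call. Grouping by Batch, PhotoID and variable leaves only one value in each group, and sd() of a single value is NA.

'Batch' needs to be removed from the call, like this: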

smry <- ddply(dfl, c("PhotoID", "variable"), 
              summarise, 
              mean = mean(value), 
              sd = sd(value),
              se = sd(value)/sqrt(length(value))) 

Now,

head(smry)

    PhotoID variable          mean           sd           se
1  DSC_7120        X -269.69730716 0.1507733086 0.0674278735
2  DSC_7120        Y -359.60802888 0.3651178278 0.1632856566
3  DSC_7120        Z 2390.51990620 0.0018517456 0.0008281258
4  DSC_7120    Omega   -2.78508610 0.0014656399 0.0006554541
5  DSC_7120      Phi    0.03468442 0.0019102228 0.0008542776
6  DSC_7120    Kappa   49.42263779 0.0017625356 0.0007882299
7  DSC_7121        X -323.53707466 0.1508844825 0.0674775919
8  DSC_7121        Y -312.08052414 0.3633875558 0.1625118554
9  DSC_7121        Z 2388.37413460 0.0005815413 0.0002600732
10 DSC_7121    Omega   -1.42917428 0.0016912203 0.0007563367
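
To see the effect in isolation, here is a minimal sketch on toy data. The values and the name toy are made up for illustration, and it uses base R's tapply() in place of ddply() so that it runs without plyr installed.

```r
# Toy data (hypothetical values): two photos, each appearing once in two batches.
toy <- data.frame(
  Batch   = c(1, 1, 2, 2),
  PhotoID = c("A", "B", "A", "B"),
  value   = c(10.0, 20.0, 10.2, 19.6)
)

# sd() of a single value is NA because the n - 1 denominator is zero.
single <- sd(10.0)

# Grouping by Batch and PhotoID: every group holds exactly one value,
# so every cell of the result is NA.
over <- tapply(toy$value, list(toy$Batch, toy$PhotoID), sd)

# Grouping by PhotoID only: each group holds two values, so sd is defined.
ok <- tapply(toy$value, toy$PhotoID, sd)

print(single)
print(over)
print(ok)
```

The same logic applies to the ddply() call: the grouping variables have to leave more than one row per group for sd() to return a number.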

Thank you both.