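When I read a data frame into R with read.csv() and compute group means with tapply(), the result is a sequence of factor-like integer codes instead of the means.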
> cfpr<-read.csv("E:/temp/vars.csv",sep=";")
> cfpr
ano mes usd_brl x y
1 2014 5 2.221 181.83 403.8444
2 2014 6 2.236 172.37 385.4193
3 2014 7 2.225 169.27 376.6257
4 2014 8 2.268 175.89 398.9185
5 2015 5 3.064 144.79 443.6366
6 2015 6 3.111 151.12 470.1343
7 2015 7 3.224 135.75 437.6580
8 2015 8 3.515 135.27 475.4740
9 2016 5 3.549 135.26 480.0377
10 2016 6 3.418 145.22 496.3620
11 2016 7 3.278 155.80 510.7124
12 2016 8 3.208 156.61 502.4049
> class(cfpr$ano)
[1] "integer"
> class(cfpr$y)
[1] "numeric"
> tapply(cfpr$y,cfpr$ano,fun=mean)
[1] 1 1 1 1 2 2 2 2 3 3 3 3
If I rename the data frame columns, tapply() works again:
> cfpr<-read.csv("E:/temp/vars.csv",sep=";")
> cfpr
ano mes usd_brl x y
1 2014 5 2.221 181.83 403.8444
2 2014 6 2.236 172.37 385.4193
3 2014 7 2.225 169.27 376.6257
4 2014 8 2.268 175.89 398.9185
5 2015 5 3.064 144.79 443.6366
6 2015 6 3.111 151.12 470.1343
7 2015 7 3.224 135.75 437.6580
8 2015 8 3.515 135.27 475.4740
9 2016 5 3.549 135.26 480.0377
10 2016 6 3.418 145.22 496.3620
11 2016 7 3.278 155.80 510.7124
12 2016 8 3.208 156.61 502.4049
> colnames(cfpr)[4:5]<-c("X","Y")
> class(cfpr$ano)
[1] "integer"
> class(cfpr$Y)
[1] "numeric"
> tapply(cfpr$Y,cfpr$ano,mean)
2014 2015 2016
391.2020 456.7257 497.3792
How can I avoid this error without having to rename the columns every time?
Link to the data as I used it: https://drive.google.com/open?id=1PhiuQIptVNylPFohDpl5AIDr94xxlf5p
Additional information:
> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=Portuguese_Brazil.1252 LC_CTYPE=Portuguese_Brazil.1252 LC_MONETARY=Portuguese_Brazil.1252
[4] LC_NUMERIC=C LC_TIME=Portuguese_Brazil.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.2 tools_3.4.2
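Answer 0 (score: 1)
The problem here is not the column names but the argument you pass to tapply(). Your first call uses fun=mean (lowercase), while the call after renaming passes mean positionally. Argument names in R are case-sensitive, so fun does not match the formal argument FUN; it is swallowed by ... instead, FUN keeps its default of NULL, and tapply() then returns the group index of each observation rather than applying a function.
The following snippet illustrates this.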
> cfpr = read.csv("vars.csv",sep = ';')
> head(cfpr)
ano mes usd_brl x y
1 2014 5 2.221 181.83 403.8444
2 2014 6 2.236 172.37 385.4193
3 2014 7 2.225 169.27 376.6257
4 2014 8 2.268 175.89 398.9185
5 2015 5 3.064 144.79 443.6366
6 2015 6 3.111 151.12 470.1343
> class(cfpr$ano)
[1] "integer"
> class(cfpr$y)
[1] "numeric"
> ## Method 1
> tapply(cfpr$y, cfpr$ano, mean)
2014 2015 2016
391.2020 456.7257 497.3792
> ## Method 2
> tapply(cfpr$y, cfpr$ano, FUN = function(x){mean(x)})
2014 2015 2016
391.2020 456.7257 497.3792
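Note that if you pass the function to apply by name, the argument must be spelled FUN, in uppercase. For more details, check the documentation of tapply by typing ?tapply.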
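To reproduce the difference without the CSV file, here is a minimal sketch that rebuilds only the ano and y columns from the question as an inline data frame (the variable names wrong and right are just illustrative) and compares the misspelled call with the correct one:

```r
# Inline copy of the `ano` and `y` columns shown in the question,
# so the example runs without vars.csv.
cfpr <- data.frame(
  ano = rep(c(2014L, 2015L, 2016L), each = 4),
  y   = c(403.8444, 385.4193, 376.6257, 398.9185,
          443.6366, 470.1343, 437.6580, 475.4740,
          480.0377, 496.3620, 510.7124, 502.4049)
)

# Lowercase `fun` does not match the formal argument `FUN`. It is passed
# through `...`, FUN stays NULL, and tapply() returns group indices.
wrong <- tapply(cfpr$y, cfpr$ano, fun = mean)
print(wrong)

# With the correct spelling (or a positional argument) the function
# is applied to each group.
right <- tapply(cfpr$y, cfpr$ano, FUN = mean)
print(right)
```

On this data the first call should print the same 1 1 1 1 2 2 2 2 3 3 3 3 vector as in the question, and the second the three yearly means. No renaming of columns is needed; only the spelling of FUN matters.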
Hope this clears up your doubt.