I am learning R. I downloaded a dataset and imported it into R with the following command:
data <- read.csv(file = "observations.csv", quote = "\"")
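Some of the variables are: id, responsetime, errormsg, ...
In the Global Environment pane, id shows up as int, errormsg as a factor with 4 levels, and responsetime as a factor with 7040 levels. Maybe the problem is here: responsetime is not a categorical variable, so why is it treated as a factor?
There is one more thing I do not understand. For example, I compute the mean of responsetime like this: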
mean(data$responsetime)
But I get NA and this warning:
> mean(data$responsetime)
[1] NA
Warning message:
In mean.default(data$responsetime) :
argument is not numeric or logical: returning NA
Do you know why this happens? responsetime is a numeric variable, so it should be possible to compute its mean.
str(data) shows:
'data.frame': 43000 obs. of 6 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
$ callNum : int 1 7 7 80 20 22 30 34 30 42 ...
$ serverCode : Factor w/ 10 levels "A","B","E",..: 1 1 1 1 1 1 1 1 1 1 ...
$ processtime : Factor w/ 240 levels "0","1","10","100",..: 111 1 1 1 1 1 1 1 1 1 ...
$ responsetime: Factor w/ 10304 levels "100","1000","10001",..: 1621 4126 8982 2743 7425 2796 3486 6212 2048 2512 ...
$ errormsg : Factor w/ 4 levels "Failure",..: 2 2 2 2 2 2 2 2 2 2 ...
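One check that might narrow this down is to convert the factor to character and then to numeric, and list the values that fail to parse. The sketch below uses a small made-up data frame named toy rather than the real file, and the "NULL" entry is only an assumption about what a non-numeric value could look like. Note that calling as.numeric() directly on a factor returns the internal level codes, not the printed values, which is why the conversion goes through as.character() first.

```r
# Hypothetical toy data standing in for the real column (values are made up).
# stringsAsFactors = TRUE mimics the factor columns seen in the str() output.
toy <- data.frame(responsetime = c("1204", "4236", "NULL", "100"),
                  stringsAsFactors = TRUE)

# Convert via character so we get the printed values, not the level codes.
# Entries that are not numbers become NA (the coercion warning is suppressed).
values <- suppressWarnings(as.numeric(as.character(toy$responsetime)))

# Distinct entries that could not be parsed as numbers.
bad <- unique(as.character(toy$responsetime)[is.na(values)])
print(bad)

# Mean over the entries that did parse.
m <- mean(values, na.rm = TRUE)
print(m)
```

If bad comes back empty on the real column, every value parsed and the factor was caused by something else, such as how the file was read.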
The CSV file looks like this (after exporting from SQL):
"id","callNum","serverCode","processtime","responsetime","errormsg"
"1","1","XNN","2","1204","OK"
"2","1","YNN","1","4236","OK"