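I have a CSV file containing the times for each competitor in each leg of a triathlon. I'm having trouble reading the data in so that R can use it. Here is an example of what the data looks like (I've removed some columns for clarity):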
"Place","Division","Gender","Swim","T1","Bike","T2","Run","Finish"
1, "40-49","M","7:45","0:55","27:07","0:29","18:53","55:07"
2, "UNDER 18","M","5:41","0:28","30:41","0:28","18:38","55:55"
3, "40-49","M","6:27","0:26","29:24","0:40","20:16","57:11"
4, "40-49","M","7:57","0:35","29:19","0:23","19:20","57:32"
5, "40-49","M","6:28","0:32","31:00","0:34","19:19","57:51"
6, "40-49","M","7:42","0:30","30:02","0:37","19:11","58:02"
....
250 ,"18-29","F","13:20","3:23","1:06:40","1:19","38:00","2:02:40"
251 ,"30-39","F","13:01","2:42","1:02:12","1:20","43:45","2:02:58"
252 ,50 ,"F","20:45","1:33","58:09","3:17","40:14","2:03:56"
253 ,"30-39","M","13:14","1:14","DNF","1:11","25:10","DNF bike"
254 ,"40-49","M","10:04","1:41","56:36","2:32",,"D.N.F"
My first attempt at plotting the data looked like this.
> tri <- read.csv(file.choose(), header=TRUE, as.is=TRUE)
> pairs(~ Bike + Run + Swim, data=tri)
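The times are not imported in a sensible way, so the plot doesn't make sense.
I found the difftime type and tried to use it to parse the times in the data file.
Some rows have DNF or something similar in place of a time; I'm happy for rows with times that can't be parsed to be discarded. The times come in two formats, "%M:%S" and "%H:%M:%S".
I think I need to create a new data frame from the data, but I'm having trouble parsing the times. This is what I have so far.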
> tri <- read.csv(file.choose(), header=TRUE, as.is=TRUE)
> str(tri)
'data.frame': 254 obs. of 12 variables:
$ Place : num 1 2 3 4 5 6 7 8 9 10 ...
$ Race.. : num 237 274 268 226 267 247 264 257 273 272 ...
$ First.Name: chr ** removed names ** ...
$ Last.Name : chr ** removed names ** ...
$ Division : chr "40-49" "UNDER 18" "40-49" "40-49" ...
$ Gender : chr "M" "M" "M" "M" ...
$ Swim : chr "7:45" "5:41" "6:27" "7:57" ...
$ T1 : chr "0:55" "0:28" "0:26" "0:35" ...
$ Bike : chr "27:07" "30:41" "29:24" "29:19" ...
$ T2 : chr "0:29" "0:28" "0:40" "0:23" ...
$ Run : chr "18:53" "18:38" "20:16" "19:20" ...
$ Finish : chr "55:07" "55:55" "57:11" "57:32" ...
> as.numeric(as.difftime(tri$Bike, format="%M:%S"), units="secs")
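This converts all the times under one hour correctly, but for any time over an hour the hours are interpreted as minutes. Substituting "%H:%M:%S" for "%M:%S" parses the times over an hour but produces NA for everything else. What is the best way to convert both kinds of time?
EDIT: Adding a simple example as requested.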
> times <- c("27:07", "1:02:12", "DNF")
> as.numeric(as.difftime(times, format="%M:%S"), units="secs")
[1] 1627 62 NA
> as.numeric(as.difftime(times, format="%H:%M:%S"), units="secs")
[1] NA 3732 NA
The output I want is 1627 3732 NA
Answer 0 (score: 4)
Here's a quick hack at a solution, although there may be a better one:
cdifftime <- function(x) {
  x2 <- gsub("^([0-9]+:[0-9]+)$", "00:\\1", x)  ## prepend 00: to %M:%S elements
  res <- as.difftime(x2, format="%H:%M:%S")
  units(res) <- "secs"
  as.numeric(res)
}
times <- c("27:07", "1:02:12", "DNF")
cdifftime(times)
## [1] 1627 3732 NA
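You can apply this to the relevant columns (columns 4 to 9 in the sample as shown; adjust the indices if your data frame still has the removed columns):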
tri[4:9] <- lapply(tri[4:9],cdifftime)
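As a rough end-to-end sketch of that step, the block below repeats cdifftime so it runs on its own, applies it to a tiny hand-typed data frame built from three rows of the question's sample, and then drops the rows where any leg failed to parse (which the question says is acceptable). Selecting columns by name through time_cols, and using complete.cases for the filtering, are my own choices for illustration rather than part of the original answer.

```r
# cdifftime() from the answer above, repeated so this block is self-contained
cdifftime <- function(x) {
  x2 <- gsub("^([0-9]+:[0-9]+)$", "00:\\1", x)  # prepend 00: to %M:%S elements
  res <- as.difftime(x2, format = "%H:%M:%S")
  units(res) <- "secs"
  as.numeric(res)
}

# Three rows taken from the question's sample (places 1, 250 and 253),
# typed in by hand with only the three race legs kept
tri <- data.frame(
  Place = c(1, 250, 253),
  Swim  = c("7:45", "13:20", "13:14"),
  Bike  = c("27:07", "1:06:40", "DNF"),
  Run   = c("18:53", "38:00", "25:10"),
  stringsAsFactors = FALSE
)

# Convert the time columns by name instead of by position (illustrative choice)
time_cols <- c("Swim", "Bike", "Run")
tri[time_cols] <- lapply(tri[time_cols], cdifftime)

# Keep only the rows where every leg parsed to a number of seconds
tri_ok <- tri[complete.cases(tri[time_cols]), ]
```

With the times in seconds, pairs(~ Bike + Run + Swim, data=tri_ok) from the question should then plot numeric values rather than strings.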
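Some notes from trying to replicate your example:
You may want to use na.strings="DNF" so that "did not finish" values are set to NA automatically.
You need to make sure the time columns are read in as character strings rather than factors, e.g. (1) set options(stringsAsFactors=FALSE) globally; (2) use stringsAsFactors=FALSE when calling read.csv; (3) use as.is=TRUE, ditto.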
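A minimal sketch of those read.csv settings, assuming a small inline copy of the data in place of the real file: csv_text is a hypothetical stand-in adapted from three rows of the question's sample (stray spaces around the place numbers removed), and read.csv's text argument is used only so the example needs no file on disk.

```r
# Hypothetical inline stand-in for the real CSV file, adapted from the question
csv_text <- '"Place","Division","Gender","Swim","T1","Bike","T2","Run","Finish"
1,"40-49","M","7:45","0:55","27:07","0:29","18:53","55:07"
253,"30-39","M","13:14","1:14","DNF","1:11","25:10","DNF bike"
254,"40-49","M","10:04","1:41","56:36","2:32",,"D.N.F"'

tri <- read.csv(text = csv_text, header = TRUE,
                stringsAsFactors = FALSE,     # keep the time columns as character
                na.strings = c("NA", "DNF"))  # exact "DNF" fields become NA

str(tri)
```

Note that na.strings only matches whole fields, so variants such as "DNF bike" or "D.N.F" stay as text here; cdifftime still returns NA for them because they do not parse as times.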