I have a large data frame. A small subset is shown below:
structure(list(Date = c("2017-08-12", "2017-08-12", "2017-08-12"
), `Time (sec)` = c("19:01:04", "07:30:18", "04:29:38"), `4+DURATION` = c("26",
"58,000", "27"), `4+'000 (AVG)` = c("0.0000", "0.0000", "0.0000"),
`15+DURATION` = c("26", "57,000", "27"), `15+'000 (AVG)` = c("0.0000",
"0.0000", "0.0000")), .Names = c("Date", "Time (sec)", "4+DURATION",
"4+'000 (AVG)", "15+DURATION", "15+'000 (AVG)"), row.names = 3:5, class = "data.frame")
The actual data frame looks like this:
Date Time (sec) 4+DURATION 4+'000 (AVG) 15+DURATION 15+'000 (AVG)
3 2017-08-12 19:01:04 26 0.0000 26 0.0000
4 2017-08-12 07:30:18 58,000 0.0000 57,000 0.0000
5 2017-08-12 04:29:38 27 0.0000 27 0.0000
From column 3 onward, the columns are stored as character vectors. I am trying to convert them to numeric. Here is the code I used:
cols.num <- names(dat[,-c(1:2)])
dat[cols.num] <- sapply(dat[cols.num],as.numeric)
dat is my data frame. This coerces to NA every value in the duration columns that contains a comma.
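A minimal sketch of why the NAs appear, using a standalone vector (the names x, bad and good are illustrative and not from the original post): as.numeric() does not understand a thousands separator, so any string containing a comma is coerced to NA with a warning.

```r
# Three values copied from the `4+DURATION` column of the sample data
x <- c("26", "58,000", "27")

# as.numeric() cannot parse "58,000", so that element becomes NA
bad <- suppressWarnings(as.numeric(x))

# Removing the commas first lets every element parse
good <- as.numeric(gsub(",", "", x, fixed = TRUE))

print(bad)
print(good)
```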
I tried to remove the commas with
df[,unique(grep("DUR", names(df), value=T))] <- gsub(",","",df[,unique(grep("DUR", names(df), value=T))])
but this produces a df like this:
Date Time (sec) 4+DURATION 4+'000 (AVG) 15+DURATION 15+'000 (AVG)
3 2017-08-12 19:01:04 c("26" "58000" "27") 0.0000 c("26" "57000" "27") 0.0000
4 2017-08-12 07:30:18 c("26" "57000" "27") 0.0000 c("26" "58000" "27") 0.0000
5 2017-08-12 04:29:38 c("26" "58000" "27") 0.0000 c("26" "57000" "27") 0.0000
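The output above can be reproduced in isolation. The sketch below assumes a two-column data frame built only from the DURATION values (df, collapsed and res are illustrative names). gsub() converts a non-character input with as.character(), and for a data frame, which is a list of columns, that turns each whole column into one deparsed string. The result therefore holds one string per column rather than one per cell.

```r
# Only the two DURATION columns from the sample data
df <- data.frame(`4+DURATION`  = c("26", "58,000", "27"),
                 `15+DURATION` = c("26", "57,000", "27"),
                 check.names = FALSE, stringsAsFactors = FALSE)

# as.character() on a data frame deparses each column into a single string
collapsed <- as.character(df)
print(collapsed)

# gsub() then strips every comma, including the ones that separated the elements
res <- gsub(",", "", df)
print(res)
print(length(res))  # one element per column, not one per cell
```

Assigning that length-2 result back into a 3 x 2 block recycles the two strings down the columns, which is why the cells alternate between the two values.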
But the desired output is:
Date Time (sec) 4+DURATION 4+'000 (AVG) 15+DURATION 15+'000 (AVG)
3 2017-08-12 19:01:04 26 0.0000 26 0.0000
4 2017-08-12 07:30:18 58000 0.0000 57000 0.0000
5 2017-08-12 04:29:38 27 0.0000 27 0.0000
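The problem with this data frame is that I don't know in advance which columns will hold duration values, and the names of those columns keep changing, from 4+DURATION to 45+DURATION and so on. I want to remove the commas from every column whose name contains DURATION before converting those columns to numeric.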
Answer 0: (score: 2)
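You need to *apply over the columns of interest, because gsub (FYI, sub also works here, since each value contains at most one comma) is NOT vectorized over the columns of a data frame, only over the elements of a single character vector, i.e.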
df[,unique(grep("DUR", names(df), value=T))] <-
lapply(df[,unique(grep("DUR", names(df), value=T))], function(i)
as.numeric(sub(',', '', i)))
which gives,
Date Time (sec) 4+DURATION 4+'000 (AVG) 15+DURATION 15+'000 (AVG)
3 2017-08-12 19:01:04 26 0.0000 26 0.0000
4 2017-08-12 07:30:18 58000 0.0000 57000 0.0000
5 2017-08-12 04:29:38 27 0.0000 27 0.0000
#str(df)
#'data.frame': 3 obs. of 6 variables:
# $ Date : chr "2017-08-12" "2017-08-12" "2017-08-12"
# $ Time (sec) : chr "19:01:04" "07:30:18" "04:29:38"
# $ 4+DURATION : num 26 58000 27
# $ 4+'000 (AVG) : chr "0.0000" "0.0000" "0.0000"
# $ 15+DURATION : num 26 57000 27
# $ 15+'000 (AVG): chr "0.0000" "0.0000" "0.0000"
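Since the DURATION column names change between files, the same idea can be wrapped in a small helper. This is only a sketch: strip_commas_to_numeric and its pattern argument are hypothetical names, and the value "1,234,567" is made up to show a string with more than one comma, where gsub is needed because sub would remove only the first comma.

```r
# Hypothetical helper: convert every column whose name contains `pattern`
# from comma-formatted strings to numeric, leaving other columns untouched
strip_commas_to_numeric <- function(df, pattern = "DURATION") {
  cols <- grep(pattern, names(df), fixed = TRUE)
  if (length(cols) == 0L) return(df)  # nothing to convert
  df[cols] <- lapply(df[cols], function(x) {
    as.numeric(gsub(",", "", x, fixed = TRUE))
  })
  df
}

# Small sample; "1,234,567" is an invented value with two commas
dat <- data.frame(Date          = c("2017-08-12", "2017-08-12", "2017-08-12"),
                  `4+DURATION`  = c("26", "58,000", "1,234,567"),
                  `45+DURATION` = c("26", "57,000", "27"),
                  check.names = FALSE, stringsAsFactors = FALSE)

out <- strip_commas_to_numeric(dat)
str(out)
```

Matching on the column name rather than on a fixed position means the helper keeps working when the prefix changes from 4+ to 45+ or anything else.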
Answer 1: (score: 1)
A dplyr solution:
d <- structure(list(Date = c("2017-08-12", "2017-08-12", "2017-08-12"
), `Time (sec)` = c("19:01:04", "07:30:18", "04:29:38"), `4+DURATION` = c("26",
"58,000", "27"), `4+'000 (AVG)` = c("0.0000", "0.0000", "0.0000"),
`15+DURATION` = c("26", "57,000", "27"), `15+'000 (AVG)` = c("0.0000",
"0.0000", "0.0000")), .Names = c("Date", "Time (sec)", "4+DURATION",
"4+'000 (AVG)", "15+DURATION", "15+'000 (AVG)"), row.names = 3:5, class = "data.frame")
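library(dplyr)
# Strip commas from every column whose name contains DURATION, then convert to numeric.
# Note: funs() has been deprecated since dplyr 0.8.0, so newer versions will warn here.
d2 <- d %>% mutate_at(vars(contains('DURATION')), funs(as.numeric(gsub(',', '', .))))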
str(d2)
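On recent dplyr releases the same transformation can be written with across(). The sketch below assumes dplyr 1.0.0 or later is installed, uses a cut-down copy of the sample data, and is one possible modern spelling rather than the answer's original code.

```r
library(dplyr)

# Cut-down copy of the sample data (Date plus the two DURATION columns)
d <- data.frame(Date          = c("2017-08-12", "2017-08-12", "2017-08-12"),
                `4+DURATION`  = c("26", "58,000", "27"),
                `15+DURATION` = c("26", "57,000", "27"),
                check.names = FALSE, stringsAsFactors = FALSE)

# across() selects columns by name and applies the lambda to each of them
d2 <- d %>%
  mutate(across(contains("DURATION"),
                ~ as.numeric(gsub(",", "", .x, fixed = TRUE))))

str(d2)
```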