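My dataset has a column named Duration. I want to split its hours and minutes into two separate columns. If the hours or the minutes are missing, 0h or 0m should be added accordingly.
The existing column details and the expected new columns are shown in the image below: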
train <- read.csv("sampledata.csv", stringsAsFactors = FALSE)
train$Duration
Edit:
sampledata <- data.frame(
    emp_id = 1:5,
    Duration = c("10h 50m", "5h 34m", "9h", "4h 15m", "23m"),
    stringsAsFactors = FALSE
)
sampledata$Duration
Answer 0 (score: 1)
A solution using sub() and gsub() is as follows:
# First identify the strings that contain "h"
h_in_str <- grepl("h", sampledata$Duration)
# If a string has "h", keep everything before it; otherwise use 0.
# as.integer() turns the mixed character result into numbers.
sampledata$Hours <- as.integer(ifelse(h_in_str, sub("h.*", "", sampledata$Duration), 0))
# Identify the strings that contain "m"
m_in_str <- grepl("m", sampledata$Duration)
# If a string has "m", drop everything up to "h" and keep the digits that remain;
# otherwise use 0. as.integer() also discards the leading space left behind by sub().
sampledata$Minutes <- as.integer(ifelse(m_in_str,
    gsub("([0-9]+).*$", "\\1", sub(".*h", "", sampledata$Duration)), 0))
This gives you the desired data:
sampledata
  emp_id Duration Hours Minutes
1      1  10h 50m    10      50
2      2   5h 34m     5      34
3      3       9h     9       0
4      4   4h 15m     4      15
5      5      23m     0      23
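The same idea can be sketched as a small reusable helper. This is only an illustration, not part of the original answer: `extract_unit` is a hypothetical name, and it assumes each value in Duration contains at most one number directly followed by `h` and one directly followed by `m`.

```r
# Hypothetical helper: return the number in front of a unit letter,
# or 0 where the unit does not occur in the string.
extract_unit <- function(x, unit) {
  # Lookahead: digits that are immediately followed by the unit letter
  pattern <- paste0("[0-9]+(?=", unit, ")")
  m <- regexpr(pattern, x, perl = TRUE)
  out <- integer(length(x))                      # starts as all zeros
  out[m != -1] <- as.integer(regmatches(x, m))   # fill in only the matches
  out
}

durations <- c("10h 50m", "5h 34m", "9h", "4h 15m", "23m")
hours   <- extract_unit(durations, "h")
minutes <- extract_unit(durations, "m")
```

Because the helper returns integers directly, the results can be assigned to new columns without a further conversion step.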
Answer 1 (score: 0)
I wouldn't call this the best answer, but one approach is:
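# Rewrite "Xh Ym" as "X-Y"; strings with only hours or only minutes are left unchanged
hour_minute <- sub("(\\d+)h (\\d+)m", "\\1-\\2", sampledata$Duration)
# Split on "-" and fill in 0 for whichever part is missing
sampledata[c("hour", "minutes")] <- t(sapply(strsplit(hour_minute, "-"),
    function(x) {
        if (length(x) == 2) x                              # both parts present
        else if (endsWith(x, "h")) c(sub("h", "", x), 0)   # hours only
        else c(0, sub("m", "", x))                         # minutes only
    }))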
sampledata
  emp_id Duration hour minutes
1      1  10h 50m   10      50
2      2   5h 34m    5      34
3      3       9h    9       0
4      4   4h 15m    4      15
5      5      23m    0      23
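Note that the hour and minutes columns produced this way are character, because they pass through a character matrix. A minimal sketch of a follow-up step, assuming numeric columns are wanted, and optionally labels in the "0h" / "0m" form mentioned in the question; the `result` data frame and the `hour_label` / `minute_label` column names are hypothetical stand-ins:

```r
# Stand-in for the output of the answer above: both new columns are character
result <- data.frame(
  emp_id  = 1:5,
  hour    = c("10", "5", "9", "4", "0"),
  minutes = c("50", "34", "0", "15", "23"),
  stringsAsFactors = FALSE
)

# Convert both columns to integer in one step
result[c("hour", "minutes")] <- lapply(result[c("hour", "minutes")], as.integer)

# Optional: rebuild labelled strings such as "0h" and "0m"
result$hour_label   <- paste0(result$hour, "h")
result$minute_label <- paste0(result$minutes, "m")
```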