I have a dataset like the following:
[1] "21/12/16, 14:25:10: abcd
[2] "21/12/16, 14:25:14: 1234
[3] "21/12/16, 14:25:22: XXX
[4] "21/12/16, 14:25:30: YYY
[5] "21/12/16, 14:25:47: ZZZ
The date variable contains all the dates from the dataset above:
> head(date)
[1] "21/12/16" "21/12/16" "21/12/16" "21/12/16" "21/12/16"
The time variable likewise comes from the dataset:
> head(time)
[1] "14:25" "14:25" "14:25" "14:25" "14:25"
Now I want to transform the dataset into:
[1] abcd
[2] 1234
[3] XXX
[4] YYY
[5] ZZZ
How can I do this? I tried gsub, but it didn't work. Can anyone help?
Answer 0: (score: 2)
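You haven't fully specified the expected behavior, but for the dataset you provided, splitting on ":" and taking the fourth element of the resulting vector gives the desired result (with a leading space that you may want to trim). However, you should think about your use case and whether you can rely on that. For example, will there always be three colons before the string you want? Will the string you want never contain a colon itself? And so on.
Also, I think the lines in your question are missing a closing quote.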
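The split-on-":" idea can be sketched roughly as follows. This is only an illustration under the assumptions stated above: the vector name `x` and the result names `msg` and `msg2` are made up here, the sample lines are copied from the question with the closing quotes added, and every line is assumed to have exactly three colons before the text of interest.

```r
# Sample lines copied from the question (closing quotes added).
# `x`, `parts`, `msg` and `msg2` are hypothetical names used for illustration.
x <- c("21/12/16, 14:25:10: abcd",
       "21/12/16, 14:25:14: 1234",
       "21/12/16, 14:25:22: XXX",
       "21/12/16, 14:25:30: YYY",
       "21/12/16, 14:25:47: ZZZ")

# Split every line on a literal ":"; `parts` is a list of character vectors.
parts <- strsplit(x, ":", fixed = TRUE)

# Take the fourth piece of each line and trim the leading space.
msg <- trimws(vapply(parts, `[`, character(1), 4))
msg
## [1] "abcd" "1234" "XXX" "YYY" "ZZZ"

# Variant that keeps colons inside the wanted string: remove only the
# first three colon-terminated fields plus any following whitespace.
msg2 <- sub("^([^:]*:){3}[[:space:]]*", "", x)
msg2
## [1] "abcd" "1234" "XXX" "YYY" "ZZZ"
```

The first form silently drops anything after a fourth colon, which is exactly the caveat raised above; the `sub()` variant avoids that but still assumes the three-colon prefix is always present.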
Answer 1: (score: 1)
readLines(con = textConnection("21/12/16, 14:25:10: abcd
21/12/16, 14:25:14: 1234
21/12/16, 14:25:22: XXX
21/12/16, 14:25:30: YYY
21/12/16, 14:25:47: ZZZ")) -> text_file_lines
text_file_lines
## [1] "21/12/16, 14:25:10: abcd" "21/12/16, 14:25:14: 1234"
## [3] "21/12/16, 14:25:22: XXX" "21/12/16, 14:25:30: YYY"
## [5] "21/12/16, 14:25:47: ZZZ"
# built-in
# somewhat forgiving regex replace
sub("^[[:digit:]]+/[[:digit:]]+/[[:digit:]]+,[[:space:]]+[[:digit:]]+:[[:digit:]]+:[[:digit:]]+:[[:space:]]", "", text_file_lines)
## [1] "abcd" "1234" "XXX" "YYY" "ZZZ"
# external pkg
# this matches from last : onward and extracts the bits you want
stringi::stri_match_last_regex(text_file_lines, ": ([[:print:]]+)$")[,2]
## [1] "abcd" "1234" "XXX" "YYY" "ZZZ"