Removing characters from a dataset

Time: 2017-09-27 19:49:05

Tags: r

I have a dataset that looks like this:

[1] "21/12/16, 14:25:10: abcd                     
[2] "21/12/16, 14:25:14: 1234            
[3] "21/12/16, 14:25:22: XXX           
[4] "21/12/16, 14:25:30: YYY          
[5] "21/12/16, 14:25:47: ZZZ

The date variable holds all of the dates from the dataset above:

> head(date)
[1] "21/12/16" "21/12/16" "21/12/16" "21/12/16" "21/12/16"

The time variable likewise holds the times from the dataset:

> head(time)
[1] "14:25" "14:25" "14:25" "14:25" "14:25"

Now I want to reduce the dataset to:

[1] abcd                     
[2] 1234            
[3] XXX           
[4] YYY          
[5] ZZZ

How can I do this? I tried gsub but it didn't work. Can anyone help?

2 Answers:

Answer 0: (score: 2)

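You haven't been entirely precise about the expected behavior, but for the dataset you've provided, splitting on ":" and taking the fourth element of the resulting vector will give you the desired result. You should think about your use case, though, and whether you can rely on that working. For example, will there always be three colons before the string you want? Will the string you want never contain a colon? And so on.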

Also, I think your lines are missing a closing quote.
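The split-on-colon idea described above can be sketched roughly as follows. This is a minimal illustration, not necessarily what the answerer had in mind: the vector name `x` and the use of `vapply` and `trimws` are assumptions, and `trimws` is there because the fourth piece still carries the space that followed the last colon.

```r
# Sample lines copied from the question (closing quotes added)
x <- c("21/12/16, 14:25:10: abcd",
       "21/12/16, 14:25:14: 1234",
       "21/12/16, 14:25:22: XXX",
       "21/12/16, 14:25:30: YYY",
       "21/12/16, 14:25:47: ZZZ")

# Split each line on a literal ":"; every sample line yields four pieces
parts <- strsplit(x, ":", fixed = TRUE)

# Keep the fourth piece of each line and trim the surrounding whitespace
result <- trimws(vapply(parts, function(p) p[4], character(1)))
result
```

As the answer warns, this only holds while every line has exactly three colons before the value and the value itself contains none; a line with fewer pieces would give `NA` here.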

Answer 1: (score: 1)

readLines(con = textConnection("21/12/16, 14:25:10: abcd
21/12/16, 14:25:14: 1234
21/12/16, 14:25:22: XXX
21/12/16, 14:25:30: YYY
21/12/16, 14:25:47: ZZZ")) -> text_file_lines

text_file_lines
## [1] "21/12/16, 14:25:10: abcd" "21/12/16, 14:25:14: 1234"
## [3] "21/12/16, 14:25:22: XXX"  "21/12/16, 14:25:30: YYY" 
## [5] "21/12/16, 14:25:47: ZZZ" 

# built-in
# somewhat forgiving regex replace
sub("^[[:digit:]]+/[[:digit:]]+/[[:digit:]]+,[[:space:]]+[[:digit:]]+:[[:digit:]]+:[[:digit:]]+:[[:space:]]", "", text_file_lines)
## [1] "abcd" "1234" "XXX"  "YYY"  "ZZZ" 

# external pkg
# this matches from last : onward and extracts the bits you want
stringi::stri_match_last_regex(text_file_lines, ": ([[:print:]]+)$")[,2]
## [1] "abcd" "1234" "XXX"  "YYY"  "ZZZ"
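For comparison, the "match from the last colon" idea can also be approximated in base R without an external package. The sketch below is an assumption-laden illustration rather than part of the original answer: the sample vector `x`, the made-up line containing "a: b", and both patterns are hypothetical. It also shows how the two strategies differ once the wanted text itself contains a colon.

```r
# Two sample lines from the question plus one made-up line whose value contains ": "
x <- c("21/12/16, 14:25:10: abcd",
       "21/12/16, 14:25:47: ZZZ",
       "21/12/16, 14:25:50: a: b")

# Greedy ".*" runs to the LAST ": " in each line, so only the final field survives
after_last_colon <- sub("^.*: ", "", x)

# Strip the prefix through the THIRD colon instead, keeping any later colons
after_third_colon <- sub("^[^:]*:[^:]*:[^:]*:[[:space:]]*", "", x)

after_last_colon
after_third_colon
```

On the first two lines both calls return the same text; on the made-up third line the first returns "b" and the second returns "a: b", so which one is appropriate depends on whether the values may contain colons.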