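I am trying to split a character vector of chat messages in front of each datetime indicator.
I was thinking of using strsplit() with a regular expression and perl = TRUE.
Here is some sample data: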
TEST <- c("05.10.17, 09:26 - Person One: How about we chill on sunday\n05.10.17, 09:27 - Person One: I could bring some beer\n05.10.17, 09:27 - Person Two: Sounds good, we could go to Lindas Party afterwards\n05.10.17, 09:27 - Person One: shit man, not LiNDA -.-\n05.10.17, 09:27 - Person Two: ???\n05.10.17, 09:28 - Person Two: You guys have history?\n05.10.17, 09:28 - Person One: She killed my family and sold their ears as souvenirs\n")
This is what I have tried so far:
Cut <- unlist(strsplit(TEST,"(?=[0-3][0-9][.][0-9]{2}[.][0-9]{2}[,][ ][0-9]{2}:[0-9]{2})", perl = TRUE))
Cut
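According to this website, the regular expression should cut the string in front of the datetime indicator. However, the result I get looks like this, with the first character cut off: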
[1] "0"
[2] "5.10.17, 09:26 - Person One: How about we chill on sunday\n"
[3] "0"
[4] "5.10.17, 09:27 - Person One: I could bring some beer\n"
[5] "0"
[6] "5.10.17, 09:27 - Person Two: Sounds good, we could go to Lindas Party afterwards\n"
[7] "0"
[8] "5.10.17, 09:27 - Person One: shit man, not LiNDA -.-\n"
[9] "0"
[10] "5.10.17, 09:27 - Person Two: ???\n"
[11] "0"
[12] "5.10.17, 09:28 - Person Two: You guys have history?\n"
[13] "0"
[14] "5.10.17, 09:28 - Person One: She killed my family and sold their ears as souvenirs\n"
The result should be:
[1] "05.10.17, 09:26 - Person One: How about we chill on sunday\n"
[2] "05.10.17, 09:27 - Person One: I could bring some beer\n"
[3] "05.10.17, 09:27 - Person Two: Sounds good, we could go to Lindas Party afterwards\n"
[4] "05.10.17, 09:27 - Person One: shit man, not LiNDA -.-\n"
[5] "05.10.17, 09:27 - Person Two: ???\n"
[6] "05.10.17, 09:28 - Person Two: You guys have history?\n"
[7] "05.10.17, 09:28 - Person One: She killed my family and sold their ears as souvenirs\n"
Note: I cannot split the data at the newline indicator, because some messages contain one or more newlines in the middle of the message.
Answer 0: (score: 2)
You just need a split pattern that matches where \n is followed by a date.
strsplit(gsub("(.*?\\n)(\\d+[.]\\d+[.]\\d+)","\\1SPLITHERE\\2",TEST),"SPLITHERE")
[[1]]
[1] "05.10.17, 09:26 - Person One: How about we chill on sunday\n"
[2] "05.10.17, 09:27 - Person One: I could bring some beer\n"
[3] "05.10.17, 09:27 - Person Two: Sounds good, we could go to Lindas Party afterwards\n"
[4] "05.10.17, 09:27 - Person One: shit man, not LiNDA -.-\n"
[5] "05.10.17, 09:27 - Person Two: ???\n"
[6] "05.10.17, 09:28 - Person Two: You guys have history?\n"
[7] "05.10.17, 09:28 - Person One: She killed my family and sold their ears as souvenirs\n"
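The pattern above keys on any digits.digits.digits run after a newline. As a sketch only, the same marker idea can be anchored to the full timestamp, which may be safer when a message contains embedded newlines followed by numbers. The function name split_on_marker, the variable chat, and the shortened sample text are illustrative assumptions, and the marker string is assumed never to occur inside a message.

```r
# Sketch: put a marker after each newline that precedes a timestamp,
# then split on the marker. All names here are illustrative.
split_on_marker <- function(x, marker = "SPLITHERE") {
  # dd.mm.yy, hh:mm (same shape as the timestamps in the question)
  stamp <- "\\d{2}[.]\\d{2}[.]\\d{2}, \\d{2}:\\d{2}"
  # The lookahead checks for a timestamp without consuming it
  marked <- gsub(paste0("\\n(?=", stamp, ")"), paste0("\n", marker), x, perl = TRUE)
  # fixed = TRUE treats the marker as a literal string, not a regex
  strsplit(marked, marker, fixed = TRUE)
}

# Shortened sample with a newline inside the first message
chat <- "05.10.17, 09:26 - Person One: How about\n we chill\n05.10.17, 09:27 - Person Two: ???\n"
res <- split_on_marker(chat)[[1]]
print(res)
```

The embedded newline in the first message is left alone because no timestamp follows it.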
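You can also use regmatches from base R, provided the messages contain no embedded newlines, since this pattern cuts at every \n: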
regmatches(TEST,gregexpr(".*?\\n",TEST))
[[1]]
[1] "05.10.17, 09:26 - Person One: How about we chill on sunday\n"
[2] "05.10.17, 09:27 - Person One: I could bring some beer\n"
[3] "05.10.17, 09:27 - Person Two: Sounds good, we could go to Lindas Party afterwards\n"
[4] "05.10.17, 09:27 - Person One: shit man, not LiNDA -.-\n"
[5] "05.10.17, 09:27 - Person Two: ???\n"
[6] "05.10.17, 09:28 - Person Two: You guys have history?\n"
[7] "05.10.17, 09:28 - Person One: She killed my family and sold their ears as souvenirs\n"
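The `.*?\\n` pattern matches one line at a time, so a message with a newline in the middle would come back in pieces. A hedged alternative, assuming every message starts with a dd.mm.yy, hh:mm timestamp: match from one timestamp up to the newline that precedes the next one, or up to the end of the string. The names stamp, msg_pattern and chat, and the shortened sample text, are made up for this sketch.

```r
# dd.mm.yy, hh:mm (same shape as the timestamps in the question)
stamp <- "\\d{2}[.]\\d{2}[.]\\d{2}, \\d{2}:\\d{2}"

# (?s) lets . match newlines. The lazy .*? stops at a newline that is
# followed by another timestamp, or runs to the end of the string (\z).
msg_pattern <- paste0("(?s)", stamp, ".*?(?:\\n(?=", stamp, ")|\\z)")

# Shortened sample with a newline inside the first message
chat <- "05.10.17, 09:26 - Person One: How about\n we chill\n05.10.17, 09:27 - Person Two: ???\n"

# Extract every whole message instead of splitting between them
msgs <- regmatches(chat, gregexpr(msg_pattern, chat, perl = TRUE))[[1]]
print(msgs)
```

Extracting matches rather than splitting sidesteps the zero-width split behaviour of strsplit() altogether.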
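Answer 1: (score: 1)
You can add the whitespace character class \\s before the positive lookahead.
I slightly changed your example so that it matches your question more closely (i.e., added a \n in the middle of the first message):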
> TEST <- c("05.10.17, 09:26 - Person One: How about\n we chill on sunday\n05.10.17, 09:27 - Person One: I could bring some beer\n05.10.17, 09:27 - Person Two: Sounds good, we could go to Lindas Party afterwards\n05.10.17, 09:27 - Person One: shit man, not LiNDA -.-\n05.10.17, 09:27 - Person Two: ???\n05.10.17, 09:28 - Person Two: You guys have history?\n05.10.17, 09:28 - Person One: She killed my family and sold their ears as souvenirs\n")
> unlist(strsplit(TEST,"\\s(?=[0-3][0-9][.][0-9]{2}[.][0-9]{2}[,][ ][0-9]{2}:[0-9]{2})", perl = TRUE))
## [1] "05.10.17, 09:26 - Person One: How about\n we chill on sunday"
## [2] "05.10.17, 09:27 - Person One: I could bring some beer"
## [3] "05.10.17, 09:27 - Person Two: Sounds good, we could go to Lindas Party afterwards"
## [4] "05.10.17, 09:27 - Person One: shit man, not LiNDA -.-"
## [5] "05.10.17, 09:27 - Person Two: ???"
## [6] "05.10.17, 09:28 - Person Two: You guys have history?"
## [7] "05.10.17, 09:28 - Person One: She killed my family and sold their ears as souvenirs\n"
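Because \\s is part of the match, the newline between messages is consumed. If it should be kept, as in the desired output in the question, one option is to pair the lookahead with a lookbehind for the newline, so the split point has zero width. In base R, a zero-width match at the very start of the remaining text appears to be returned as a one-character token, which would explain the stray "0" elements in the question; the lookbehind cannot match at that position. This is a sketch, with chat, stamp and split_pattern as illustrative names and a shortened sample text.

```r
# Timestamp pattern taken from the question: dd.mm.yy, hh:mm
stamp <- "[0-3][0-9][.][0-9]{2}[.][0-9]{2}, [0-9]{2}:[0-9]{2}"

# Zero-width split point: a newline behind, a timestamp ahead
split_pattern <- paste0("(?<=\\n)(?=", stamp, ")")

# Shortened sample with a newline inside the first message
chat <- "05.10.17, 09:26 - Person One: How about\n we chill\n05.10.17, 09:27 - Person Two: ???\n"

parts <- strsplit(chat, split_pattern, perl = TRUE)[[1]]
print(parts)
```

Each element keeps its trailing newline, and the newline inside the first message does not trigger a split.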
Answer 2: (score: 1)
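# Split on a 0 at the start of a line (assumes every timestamp starts with 0, as in the sample),
# drop the empty first element, then put back the 0 that the split consumed
paste0("0", strsplit(TEST, '(?<=\\n|^)0', perl = TRUE)[[1]][-1])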