Question

I am trying to split a character vector containing messages at the point right before each date-time indicator.

I was thinking of using strsplit() with a regular expression and perl = TRUE.

Here is some sample data:

TEST <- c("05.10.17, 09:26 - Person One: How about we chill on sunday\n05.10.17, 09:27 - Person One: I could bring some beer\n05.10.17, 09:27 - Person Two: Sounds good, we could go to Lindas Party afterwards\n05.10.17, 09:27 - Person One: shit man, not LiNDA -.-\n05.10.17, 09:27 - Person Two: ???\n05.10.17, 09:28 - Person Two: You guys have history?\n05.10.17, 09:28 - Person One: She killed my family and sold their ears as souvenirs\n")

This is what I have tried so far:

Cut <- unlist(strsplit(TEST,"(?=[0-3][0-9][.][0-9]{2}[.][0-9]{2}[,][ ][0-9]{2}:[0-9]{2})", perl = TRUE))
Cut

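According to this website, the regular expression should cut the string right before the date-time indicator. However, the result I get looks like this, with the first character split off: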

 [1] "0"
 [2] "5.10.17, 09:26 - Person One: How about we chill on sunday\n"
 [3] "0"
 [4] "5.10.17, 09:27 - Person One: I could bring some beer\n"
 [5] "0"
 [6] "5.10.17, 09:27 - Person Two: Sounds good, we could go to Lindas Party afterwards\n"
 [7] "0"
 [8] "5.10.17, 09:27 - Person One: shit man, not LiNDA -.-\n"
 [9] "0"
[10] "5.10.17, 09:27 - Person Two: ???\n"
[11] "0"
[12] "5.10.17, 09:28 - Person Two: You guys have history?\n"
[13] "0"
[14] "5.10.17, 09:28 - Person One: She killed my family and sold their ears as souvenirs\n"

The result should be:

[1] "05.10.17, 09:26 - Person One: How about we chill on sunday\n"
[2] "05.10.17, 09:27 - Person One: I could bring some beer\n"
[3] "05.10.17, 09:27 - Person Two: Sounds good, we could go to Lindas Party afterwards\n"
[4] "05.10.17, 09:27 - Person One: shit man, not LiNDA -.-\n"
[5] "05.10.17, 09:27 - Person Two: ???\n"
[6] "05.10.17, 09:28 - Person Two: You guys have history?\n"
[7] "05.10.17, 09:28 - Person One: She killed my family and sold their ears as souvenirs\n"

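Note: I cannot simply split the data at the newline indicator, because some messages contain one or more newlines in the middle of the message.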

Answer 1

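You just need to create a split pattern for the places where \n is followed by a date. Here gsub() inserts a marker at those places and strsplit() then splits on the marker: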

 strsplit(gsub("(.*?\\n)(\\d+[.]\\d+[.]\\d+)","\\1SPLITHERE\\2",TEST),"SPLITHERE")
[[1]]
[1] "05.10.17, 09:26 - Person One: How about we chill on sunday\n"                         
[2] "05.10.17, 09:27 - Person One: I could bring some beer\n"                              
[3] "05.10.17, 09:27 - Person Two: Sounds good, we could go to Lindas Party afterwards\n"  
[4] "05.10.17, 09:27 - Person One: shit man, not LiNDA -.-\n"                              
[5] "05.10.17, 09:27 - Person Two: ???\n"                                                  
[6] "05.10.17, 09:28 - Person Two: You guys have history?\n"                               
[7] "05.10.17, 09:28 - Person One: She killed my family and sold their ears as souvenirs\n"

You can also use regmatches from base R:

 regmatches(TEST,gregexpr(".*?\\n",TEST))
[[1]]
[1] "05.10.17, 09:26 - Person One: How about we chill on sunday\n"                         
[2] "05.10.17, 09:27 - Person One: I could bring some beer\n"                              
[3] "05.10.17, 09:27 - Person Two: Sounds good, we could go to Lindas Party afterwards\n"  
[4] "05.10.17, 09:27 - Person One: shit man, not LiNDA -.-\n"                              
[5] "05.10.17, 09:27 - Person Two: ???\n"                                                  
[6] "05.10.17, 09:28 - Person Two: You guys have history?\n"                               
[7] "05.10.17, 09:28 - Person One: She killed my family and sold their ears as souvenirs\n"
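
Note that the pattern ".*?\\n" used with regmatches above ends a piece at every newline, so a message that itself contains a newline would be broken up. Below is a minimal sketch of a variant that only ends a piece at a newline followed by a date-time stamp, or at the end of the string. The shortened input x and the names stamp, pattern and msgs are made up for illustration; the stamp pattern is the one from the question.

```r
# Hypothetical shortened sample; the first message contains a newline.
x <- "05.10.17, 09:26 - Person One: How about\n we chill\n05.10.17, 09:27 - Person Two: ???\n"

# Date-time stamp, taken from the pattern in the question.
stamp <- "[0-3][0-9][.][0-9]{2}[.][0-9]{2}, [0-9]{2}:[0-9]{2}"

# (?s) lets "." match newlines. Each match runs lazily up to a newline that is
# followed by a stamp, or up to the very end of the string (\z).
pattern <- paste0("(?s).+?(?:\\n(?=", stamp, ")|\\z)")

msgs <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
msgs
```

With this input, msgs should hold two elements, and the first one keeps its embedded newline.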

Answer 2

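You can add the whitespace character class \\s before the positive lookahead. The split then consumes the newline in front of each date, so the resulting pieces (except the last) no longer end in \n.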

I changed your example slightly so that it matches your question more accurately (i.e. I added a \n in the middle of a message):

> TEST <- c("05.10.17, 09:26 - Person One: How about\n we chill on sunday\n05.10.17, 09:27 - Person One: I could bring some beer\n05.10.17, 09:27 - Person Two: Sounds good, we could go to Lindas Party afterwards\n05.10.17, 09:27 - Person One: shit man, not LiNDA -.-\n05.10.17, 09:27 - Person Two: ???\n05.10.17, 09:28 - Person Two: You guys have history?\n05.10.17, 09:28 - Person One: She killed my family and sold their ears as souvenirs\n")
> unlist(strsplit(TEST,"\\s(?=[0-3][0-9][.][0-9]{2}[.][0-9]{2}[,][ ][0-9]{2}:[0-9]{2})", perl = TRUE))

## [1] "05.10.17, 09:26 - Person One: How about\n we chill on sunday"                         
## [2] "05.10.17, 09:27 - Person One: I could bring some beer"                                
## [3] "05.10.17, 09:27 - Person Two: Sounds good, we could go to Lindas Party afterwards"    
## [4] "05.10.17, 09:27 - Person One: shit man, not LiNDA -.-"                                
## [5] "05.10.17, 09:27 - Person Two: ???"                                                    
## [6] "05.10.17, 09:28 - Person Two: You guys have history?"                                 
## [7] "05.10.17, 09:28 - Person One: She killed my family and sold their ears as souvenirs\n"
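
If the trailing \n of each message should be kept, as in the desired output of the question, one option is to look behind for the newline instead of consuming it. As far as I can tell, strsplit() reacts to a zero-length match at the very start of the remaining string by splitting off a single character, which would explain the stray "0" pieces in the question; a lookbehind cannot match at that position, so the problem does not arise. This is only a sketch: the shortened input x and the names stamp, bare and kept are made up for illustration.

```r
# Hypothetical shortened sample; the first message contains a newline.
x <- "05.10.17, 09:26 - Person One: How about\n we chill\n05.10.17, 09:27 - Person Two: ???\n"

# Date-time stamp, taken from the pattern in the question.
stamp <- "[0-3][0-9][.][0-9]{2}[.][0-9]{2}, [0-9]{2}:[0-9]{2}"

# Bare lookahead, as in the question: the zero-length match at the start of
# the remaining string makes strsplit() split off one character ("0").
bare <- strsplit(x, paste0("(?=", stamp, ")"), perl = TRUE)[[1]]

# Lookbehind for the newline plus lookahead for the stamp: the match is still
# zero-length (nothing is consumed), but it can no longer sit at the start.
kept <- strsplit(x, paste0("(?<=\\n)(?=", stamp, ")"), perl = TRUE)[[1]]

bare
kept
```

Since nothing is consumed, paste(kept, collapse = "") gives back the original string.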

Answer 3

strsplit(TEST, '(?<=\\\n|^)(0)',perl=T)[[1]][2:7]
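
A caveat about the call above: the split pattern consumes the leading "0" of each date, and the hard-coded index 2:7 depends on the number of messages. The following is a rough sketch of how the same idea could be completed, under the loud assumption that every message starts with "0" and that no other line does (so it would not work for days 10 to 31). The shortened input x and the names pieces and msgs are made up for illustration.

```r
# Hypothetical shortened sample with two messages.
x <- "05.10.17, 09:26 - Person One: Hi\n05.10.17, 09:27 - Person Two: ???\n"

# Split on a "0" at the start of the string or right after a newline.
# The "0" itself is consumed, and the first piece is an empty string.
pieces <- strsplit(x, "(?<=\\n|^)0", perl = TRUE)[[1]]

# Drop the empty first piece and put the consumed "0" back
# (assumption: every message really starts with "0").
msgs <- paste0("0", pieces[-1])
msgs
```

Using pieces[-1] instead of a fixed range keeps all messages regardless of how many there are.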

Regex with positive lookahead still splits the string at the wrong position using strsplit()
