Question

My unstructured text contains many dates, and I want to extract the date that appears before "Message". The output should be a new data frame with a single column of dates. The data looks like this:

21 March 2017 23:10:45 text1
21 March 2017 23:10:45  More text…..
21 March 2017 23:10:45 And more text …..
21 March 2017 23:10:45 some more text **Message:** more text 
22 March 2017 23:10:45 text1
22 March 2017 23:10:45  More text…..
22 March 2017 23:10:45 And more text …..
22 March 2017 23:10:45 some more text **Message:** more text 
23 March 2017 23:10:45 text1
23 March 2017 23:10:45  More text…..
23 March 2017 23:10:45 And more text …..
23 March 2017 23:10:45 some more text **Message:** more text 
24 March 2017 23:10:45 text1
24 March 2017 23:10:45  More text…..
24 March 2017 23:10:45 And more text …..
24 March 2017 23:10:45 some more text **Message:** more text

Answer 1

How about:

sub("(?<=\\d{4}).*", "", grep("Message", txt, value=TRUE), perl=TRUE)
# [1] "21 March 2017" "22 March 2017" "23 March 2017" "24 March 2017"

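First, we use grep() to reduce txt to only the values that contain "Message", then use sub() to remove all the text after the first occurrence of four digits (the year).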
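Since the question asks for a data frame, the same two steps can be wrapped up as in the rough sketch below. It assumes every line starts with a date in "day Month year" form, as in the sample; the shortened `txt` and the names `msg_lines`, `date_pattern`, `date_strings` and `dates_df` are made up for illustration. It also matches the leading date directly with regmatches() rather than deleting the remainder with sub(), which is a swapped-in variant and not the approach shown above.

```r
# Shortened sample in the same shape as the question's data (assumed layout:
# "<day> <Month> <year> <HH:MM:SS> <free text>").
txt <- c(
  "21 March 2017 23:10:45 text1",
  "21 March 2017 23:10:45 some more text **Message:** more text",
  "22 March 2017 23:10:45 text1",
  "22 March 2017 23:10:45 some more text **Message:** more text"
)

# Step 1: keep only the lines that mention "Message".
msg_lines <- grep("Message", txt, value = TRUE, fixed = TRUE)

# Step 2: extract the leading "day Month year" part of each kept line.
# Note: regmatches() silently drops lines where the pattern does not match.
date_pattern <- "^\\d{1,2} [A-Za-z]+ \\d{4}"
date_strings <- regmatches(msg_lines,
                           regexpr(date_pattern, msg_lines, perl = TRUE))

# Step 3: collect the result in a one-column data frame.
dates_df <- data.frame(date = date_strings, stringsAsFactors = FALSE)
print(dates_df)
```

If real Date values are needed instead of strings, as.Date(date_strings, format = "%d %B %Y") should work, but only when the session locale uses English month names.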

Data:

txt <- readLines(textConnection("21 March 2017 23:10:45 text1
21 March 2017 23:10:45  More text…..
21 March 2017 23:10:45 And more text …..
21 March 2017 23:10:45 some more text **Message:** more text 
22 March 2017 23:10:45 text1
22 March 2017 23:10:45  More text…..
22 March 2017 23:10:45 And more text …..
22 March 2017 23:10:45 some more text **Message:** more text 
23 March 2017 23:10:45 text1
23 March 2017 23:10:45  More text…..
23 March 2017 23:10:45 And more text …..
23 March 2017 23:10:45 some more text **Message:** more text 
24 March 2017 23:10:45 text1
24 March 2017 23:10:45  More text…..
24 March 2017 23:10:45 And more text …..
24 March 2017 23:10:45 some more text **Message:** more text 
"))

How can I extract the date that precedes a specific string in unstructured data?