My unstructured text contains many dates, and I want to extract the date that appears before "Message". The output should be a new data frame with a single column of dates. The data looks like this:
21 March 2017 23:10:45 text1
21 March 2017 23:10:45 More text…..
21 March 2017 23:10:45 And more text …..
21 March 2017 23:10:45 some more text **Message:** more text
22 March 2017 23:10:45 text1
22 March 2017 23:10:45 More text…..
22 March 2017 23:10:45 And more text …..
22 March 2017 23:10:45 some more text **Message:** more text
23 March 2017 23:10:45 text1
23 March 2017 23:10:45 More text…..
23 March 2017 23:10:45 And more text …..
23 March 2017 23:10:45 some more text **Message:** more text
24 March 2017 23:10:45 text1
24 March 2017 23:10:45 More text…..
24 March 2017 23:10:45 And more text …..
24 March 2017 23:10:45 some more text **Message:** more text
Answer 0 (score: 3)
How about:
sub("(?<=\\d{4}).*", "", grep("Message", txt, value=TRUE), perl=TRUE)
# [1] "21 March 2017" "22 March 2017" "23 March 2017" "24 March 2017"
First, we use grep() to reduce txt to only the elements that contain "Message", then use sub() to remove everything after the first run of four digits (the year).
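Since the question asks for a data frame, the two steps above can be wrapped up as in the following minimal sketch. The names msg_lines, date_str and dates are made up for illustration, the sample vector is a shortened stand-in for the real data, and fixed = TRUE is an optional addition that simply treats "Message" as a literal string.

```r
# Shortened sample in the same shape as the question's data (illustrative only)
txt <- c(
  "21 March 2017 23:10:45 text1",
  "21 March 2017 23:10:45 some more text **Message:** more text",
  "22 March 2017 23:10:45 text1",
  "22 March 2017 23:10:45 some more text **Message:** more text"
)

# Step 1: keep only the lines that mention "Message"
msg_lines <- grep("Message", txt, value = TRUE, fixed = TRUE)

# Step 2: drop everything after the first run of four digits (the year)
date_str <- sub("(?<=\\d{4}).*", "", msg_lines, perl = TRUE)

# Step 3: put the result in a one-column data frame (column name is an assumption)
dates <- data.frame(date = date_str, stringsAsFactors = FALSE)
print(dates)
```

If real Date values are needed rather than strings, as.Date(date_str, format = "%d %B %Y") is one option, though matching the month name with %B depends on the current locale.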
Data:
txt <- readLines(textConnection("21 March 2017 23:10:45 text1
21 March 2017 23:10:45 More text…..
21 March 2017 23:10:45 And more text …..
21 March 2017 23:10:45 some more text **Message:** more text
22 March 2017 23:10:45 text1
22 March 2017 23:10:45 More text…..
22 March 2017 23:10:45 And more text …..
22 March 2017 23:10:45 some more text **Message:** more text
23 March 2017 23:10:45 text1
23 March 2017 23:10:45 More text…..
23 March 2017 23:10:45 And more text …..
23 March 2017 23:10:45 some more text **Message:** more text
24 March 2017 23:10:45 text1
24 March 2017 23:10:45 More text…..
24 March 2017 23:10:45 And more text …..
24 March 2017 23:10:45 some more text **Message:** more text
"))