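In R, I have a series of strings, for example:
"New:\r\nRemote_UI: Apple CarPlay application cannot be started (P3_DA18018395_012) (91735)\r\nMedia: After an iPhone is authorised as BTA device for the first time, Entertainment volume is abruptly set to zero when the user picks a song from \"Current tracklist\" (DA18018395_015)\r\n\r\nKnown:\r\nHWR in navigation entry is not read out (89412)"
I would like to get something like: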
New:
[1] Remote_UI: Apple CarPlay application cannot be started (P3_DA18018395_012) (91735)
[2] Media: After an iPhone is authorised as BTA device for the first time, Entertainment volume is abruptly set to zero when the user picks a song from "Current tracklist" (DA18018395_015)
Known:
[1] HWR in navigation entry is not read out (89412)
Note that a string may contain only "New", only "Known", neither of them, or both in a different order. Any ideas? Thanks!
Answer 0: (score: 1)
You can use
x <- "New:\r\nRemote_UI: Apple CarPlay application cannot be started (P3_DA18018395_012) (91735)\r\nMedia: After an iPhone is authorised as BTA device for the first time, Entertainment volume is abruptly set to zero when the user picks a song from \"Current tracklist\" (DA18018395_015)\r\n\r\nKnown:\r\nHWR in navigation entry is not read out (89412)"
New <- regmatches(x, gregexpr("(?:\\G(?!\\A)\\R+|New:\\R+)\\K.+(?!\\R+\\w+:\\R)", x, perl=TRUE))
Known <- regmatches(x, gregexpr("(?:\\G(?!\\A)\\R+|Known:\\R+)\\K.+(?!\\R+\\w+:\\R)", x, perl=TRUE))
See the R demo online.
Output:
[[1]]
[1] "Remote_UI: Apple CarPlay application cannot be started (P3_DA18018395_012) (91735)\r"
[2] "Media: After an iPhone is authorised as BTA device for the first time, Entertainment volume is abruptly set to zero when the user picks a song from \"Current tracklist\" (DA18018395_015"
[[1]]
[1] "HWR in navigation entry is not read out (89412)"
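The question notes that either section may be missing. As a small check, assuming a made-up input y that has a Known block but no New block, the New pattern appears to simply yield an empty character vector rather than an error. The names pat_new, y and new_items are illustrative only and not part of the original answer.

```r
# Pattern copied from the answer; only the input below is made up.
pat_new <- "(?:\\G(?!\\A)\\R+|New:\\R+)\\K.+(?!\\R+\\w+:\\R)"

# Hypothetical input that contains a Known block but no New block.
y <- "Known:\r\nHWR in navigation entry is not read out (89412)"

new_items <- regmatches(y, gregexpr(pat_new, y, perl = TRUE))[[1]]

# A missing section is not an error: the result has length zero.
if (length(new_items) == 0) {
  message("No 'New' section found")
} else {
  print(new_items)
}
```

Checking length() before using the result is one simple way to cover the "only New", "only Known" and "neither" cases.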
The regex used is
(?:\G(?!\A)\R+|New:\R+)\K.+(?!\R+\w+:\R)
See the regex demo online. The second regex differs from this one only in the literal Known.
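Since the two patterns differ only in the header literal, they could be generated from one template. The sketch below is a hypothetical helper (extract_section is not from the original answer) and assumes the header label contains no regex metacharacters. It reuses the answer's pattern unchanged, so results keep whatever that pattern matches, including the trailing "\r" seen in the output above.

```r
# Hypothetical helper: builds the answer's pattern for a given header label.
# Assumes `header` has no regex metacharacters (e.g. "New", "Known").
extract_section <- function(x, header) {
  pattern <- paste0("(?:\\G(?!\\A)\\R+|", header, ":\\R+)\\K.+(?!\\R+\\w+:\\R)")
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

# Shortened, made-up input with the same layout as the question's string.
s <- "New:\r\nitem one (1)\r\nitem two (2)\r\n\r\nKnown:\r\nitem three (3)"

new_items   <- extract_section(s, "New")    # two items
known_items <- extract_section(s, "Known")  # one item
other_items <- extract_section(s, "Other")  # header absent: character(0)
```

Taking [[1]] returns a plain character vector per section, which is convenient when x is a single string. For a vector of strings, the helper could be applied with lapply().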
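Details
(?:\G(?!\A)\R+|New:\R+) - either the end of the previous match followed by 1 or more line breaks (\G(?!\A)\R+), or (|) New: followed by 1 or more line breaks (\R+)
\K - a match reset operator that discards all the text matched so far
.+ - 1 or more chars other than line break chars, as many as possible (here . still matches \r, hence the trailing \r in the first item of the output)
(?!\R+\w+:\R) - a negative lookahead that fails the match if, immediately to the right of the current location, there is:
  \R+ - 1 or more line breaks,
  \w+ - 1 or more word chars,
  : - a colon,
  \R - a line break.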
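Because the lookahead sits after a greedy .+, the engine can backtrack inside the last item of a section until the lookahead no longer applies, which is why the second item in the output above lacks its closing parenthesis. As an alternative, and not the answerer's method, the string could be split into lines and grouped under the nearest preceding header. The sketch below assumes that a header is any line consisting only of word characters followed by a colon; parse_sections is a hypothetical name.

```r
# Alternative sketch: split into lines, then group items by header.
# Assumption: a header line looks like "New:" or "Known:" (word chars + colon).
parse_sections <- function(x) {
  lines <- strsplit(x, "\r?\n", perl = TRUE)[[1]]
  lines <- lines[nzchar(lines)]              # drop blank lines
  is_header <- grepl("^\\w+:$", lines)
  if (!any(is_header)) return(list())        # no sections at all
  headers <- sub(":$", "", lines[is_header]) # "New:" -> "New"
  group <- cumsum(is_header)                 # index of the current header
  keep <- !is_header & group > 0             # items that follow some header
  split(lines[keep],
        factor(headers[group[keep]], levels = unique(headers)))
}

# Shortened, made-up input with the same layout as the question's string.
s <- "New:\r\nitem one (1)\r\nitem two (2)\r\n\r\nKnown:\r\nitem three (3)"
res <- parse_sections(s)

# Sections may appear in any order; the result follows the input order.
res_rev <- parse_sections("Known:\r\na\r\nNew:\r\nb")
```

The result is a named list with one character vector per header, so res$New and res$Known can be printed directly, and a missing section just means the name is absent from the list.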