Separate a string using start and end strings

Time: 2018-11-15 15:54:19

Tags: r regex string separator

In R, I have a series of strings such as:

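"New:\r\nRemote_UI: Apple CarPlay application cannot be started (P3_DA18018395_012) (91735)\r\nMedia: After an iPhone is authorised as BTA device for the first time, Entertainment volume is abruptly set to zero when the user picks a song from "Current tracklist" (DA18018395_015)\r\n\r\nKnown:\r\nHWR in navigation entry is not read out (89412)"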

I would like to get something like:

New:
[1] Remote_UI: Apple CarPlay application cannot be started (P3_DA18018395_012) (91735)
[2] Media: After an iPhone is authorised as BTA device for the first time, Entertainment volume is abruptly set to zero when the user picks a song from "Current tracklist" (DA18018395_015)

Known:
[1] HWR in navigation entry is not read out (89412)

Note that a string may contain only "New", only "Known", neither of them, or both in a different order. Any ideas? Thanks!

1 Answer:

Answer 0: (score: 1)

You can use

x <- "New:\r\nRemote_UI: Apple CarPlay application cannot be started (P3_DA18018395_012) (91735)\r\nMedia: After an iPhone is authorised as BTA device for the first time, Entertainment volume is abruptly set to zero when the user picks a song from \"Current tracklist\" (DA18018395_015)\r\n\r\nKnown:\r\nHWR in navigation entry is not read out (89412)"
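# Items listed under "New:"; perl = TRUE is needed for \G, \K and \R
New <- regmatches(x, gregexpr("(?:\\G(?!\\A)\\R+|New:\\R+)\\K.+(?!\\R+\\w+:\\R)", x, perl = TRUE))
# Same pattern, with the literal header changed to "Known:"
Known <- regmatches(x, gregexpr("(?:\\G(?!\\A)\\R+|Known:\\R+)\\K.+(?!\\R+\\w+:\\R)", x, perl = TRUE))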

See the R demo online.

Output:

[[1]]
[1] "Remote_UI: Apple CarPlay application cannot be started (P3_DA18018395_012) (91735)\r"                                                                                                     
[2] "Media: After an iPhone is authorised as BTA device for the first time, Entertainment volume is abruptly set to zero when the user picks a song from \"Current tracklist\" (DA18018395_015"

[[1]]
[1] "HWR in navigation entry is not read out (89412)"
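
Note that in the output above the first element still ends in "\r" and the second one has lost its closing parenthesis. This appears to happen because `.` can match a carriage return and because the lookahead forces `.+` to give back characters. Below is a minimal sketch of a variant that avoids both, assuming the items sit on consecutive lines directly under the header. `extract_section` is a hypothetical helper, not part of the original answer.

```r
x <- "New:\r\nRemote_UI: Apple CarPlay application cannot be started (P3_DA18018395_012) (91735)\r\nMedia: After an iPhone is authorised as BTA device for the first time, Entertainment volume is abruptly set to zero when the user picks a song from \"Current tracklist\" (DA18018395_015)\r\n\r\nKnown:\r\nHWR in navigation entry is not read out (89412)"

# Hypothetical helper: returns the lines that directly follow "<header>:".
extract_section <- function(x, header) {
  pattern <- sprintf(
    # \G(?!\A)\R   : continue from the previous match across ONE line break
    # %s:\R        : or start right after the header line
    # (?!\w+:\R)   : stop if the next line is itself a header such as "Known:"
    # [^\r\n]+     : the item itself, excluding both CR and LF
    "(?:\\G(?!\\A)\\R|%s:\\R)\\K(?!\\w+:\\R)[^\\r\\n]+",
    header
  )
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

New   <- extract_section(x, "New")
Known <- extract_section(x, "Known")
```

Because only a single `\R` is consumed between items, a blank line also ends the section, and a header that is absent from the string simply yields `character(0)`.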

The regex used is

(?:\G(?!\A)\R+|New:\R+)\K.+(?!\R+\w+:\R)

See the regex demo online. The second regex differs from this one only in the literal Known.

Details

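  • (?:\G(?!\A)\R+|New:\R+) - the end of the previous match followed by 1 or more line breaks (\G(?!\A)\R+), or (|) New: followed by 1 or more line breaks (\R+)
  • \K - the match reset operator, which discards all the text matched so far
  • .+ - 1 or more characters other than line break characters, as many as possible
  • (?!\R+\w+:\R) - a negative lookahead that fails the match if, immediately to the right of the current position, there is:
    • \R+ - 1 or more line breaks,
    • \w+ - 1 or more word characters
    • : - a colon
    • \R - a line break.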
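
Since the question says the headers may be missing or appear in any order, a line-based approach may be easier to maintain than one regex per header. The following is only a sketch in base R. It assumes every header is a line consisting of word characters followed by a colon, and `split_sections` is a hypothetical name, not something from the original answer.

```r
# Hypothetical helper: split a report into a named list, one entry per header.
split_sections <- function(x) {
  lines <- strsplit(x, "\r?\n")[[1]]       # split on CRLF or LF
  lines <- lines[nzchar(lines)]            # drop blank separator lines
  is_header <- grepl("^\\w+:$", lines)     # "New:", "Known:", ...
  if (!any(is_header)) return(list())      # no sections at all
  group <- cumsum(is_header)               # section index for every line
  keep <- group > 0 & !is_header           # items only, after the first header
  headers <- sub(":$", "", lines[is_header])
  out <- split(lines[keep],
               factor(group[keep], levels = seq_along(headers)))
  names(out) <- headers
  out
}

x <- "Known:\r\nHWR in navigation entry is not read out (89412)\r\n\r\nNew:\r\nRemote_UI: Apple CarPlay application cannot be started (P3_DA18018395_012) (91735)"
res <- split_sections(x)

# Print each section with numbered items
for (h in names(res)) {
  cat(h, ":\n", sep = "")
  cat(sprintf("[%d] %s\n", seq_along(res[[h]]), res[[h]]), sep = "")
}
```

The sections come back in the order they occur in the input, and a string without any header gives an empty list.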