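I need to write a regular expression in R (with perl=TRUE) that splits a string on commas but skips every comma that sits between parentheses. The challenge is keeping the parentheses balanced, i.e. each closing parenthesis must map back to its own opening parenthesis.
The regex below works fine except when the parentheses are nested: it then treats the closing parenthesis of the inner group as if it closed the outer group.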
text <- "PEANUTS (PEANUTS, PEANUT OIL AND/OR COTTONSEED OIL AND/OR CANOLA OIL AND/OR SOYBEAN OIL, SALT), GOLDEN RAISINS (RAISINS, SULFUR DIOXIDE), DRIED CRANBERRIES (CRANBERRIES, SUGAR, CITRIC ACID, SUNFLOWER OIL (PROCESSING AID), ELDERBERRY JUICE CONCENTRATE (COLOR)), ALMONDS (ALMONDS, PEANUT OIL AND/OR COTTONSEED OIL AND/OR CANOLA OIL AND/OR SOYBEAN OIL, SALT), MACADAMIAS (MACADAMIAS, MALTODEXTRIN, SALT)"
strsplit(text, '\\([^*)^)]*\\)(*SKIP)(*F)|\\,', perl=T)
With the regex above, the DRIED CRANBERRIES item is not split correctly. See the output screenshot: Regex Code Output
Any help here would be much appreciated. Thanks!
Answer 0 (score: 1)
You can use
strsplit(text, "(\\((?:[^()]++|(?1))*\\))(*SKIP)(*F)|,", perl=TRUE)
# => [[1]]
[1] "PEANUTS (PEANUTS, PEANUT OIL AND/OR COTTONSEED OIL AND/OR CANOLA OIL AND/OR SOYBEAN OIL, SALT)"
[2] " GOLDEN RAISINS (RAISINS, SULFUR DIOXIDE)"
[3] " DRIED CRANBERRIES (CRANBERRIES, SUGAR, CITRIC ACID, SUNFLOWER OIL (PROCESSING AID), ELDERBERRY JUICE CONCENTRATE (COLOR))"
[4] " ALMONDS (ALMONDS, PEANUT OIL AND/OR COTTONSEED OIL AND/OR CANOLA OIL AND/OR SOYBEAN OIL, SALT)"
[5] " MACADAMIAS (MACADAMIAS, MALTODEXTRIN, SALT)"
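A minimal sketch of why the recursion matters, using a small made-up string (`s`, `flat`, `nested`, `bad`, and `good` are hypothetical names, not from the question). The pattern from the question stops at the first `)`, so a comma that follows an inner group is still treated as a separator, while the recursive pattern consumes the whole balanced group before looking for a comma. `trimws()` is applied at the end only to drop the leading space left after each comma.

```r
# Small nested example: the inner group "(z)" is followed by a comma
# that still belongs to the outer group.
s <- "A (x, y (z), w), B"

# Pattern from the question: not recursive, stops at the first ")".
flat <- '\\([^*)^)]*\\)(*SKIP)(*F)|\\,'

# Recursive pattern: (?1) re-enters group 1 for every nested "(".
nested <- "(\\((?:[^()]++|(?1))*\\))(*SKIP)(*F)|,"

bad  <- strsplit(s, flat,   perl = TRUE)[[1]]
good <- strsplit(s, nested, perl = TRUE)[[1]]

print(bad)           # splits inside the outer parentheses
print(good)          # splits only on the top-level comma
print(trimws(good))  # same pieces without the leading spaces
```

If the leading spaces are never wanted, replacing the final `,` in the pattern with `,\\s*` should have the same effect as the `trimws()` call here.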
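Details
(\\((?:[^()]++|(?1))*\\)) - Capturing group #1:
  \\( - a ( character
  (?:[^()]++|(?1))* - zero or more repetitions of either one or more characters other than ( and ) (matched with [^()]++) or (|) the whole Group 1 pattern ((?1) recurses into Group 1 to match all nesting levels)
  \\) - a ) character
(*SKIP)(*F) - these two verbs make the engine discard the text matched so far and continue looking for the next match right after that text
| - or
, - a comma.

Answer 1 (score: 0)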
An edit to the accepted answer to this question seems to do the job. I just added [[:alpha:][:space:]]* at the beginning.
pat <- '[[:alpha:][:space:]]*\\(((?>[^()]+)|(?R))*\\)'
regmatches(text, gregexpr(pat, text, perl = TRUE))
#[[1]]
#[1] "PEANUTS (PEANUTS, PEANUT OIL AND/OR COTTONSEED OIL AND/OR CANOLA OIL AND/OR SOYBEAN OIL, SALT)"
#[2] " GOLDEN RAISINS (RAISINS, SULFUR DIOXIDE)"
#[3] " DRIED CRANBERRIES (CRANBERRIES, SUGAR, CITRIC ACID, SUNFLOWER OIL (PROCESSING AID), ELDERBERRY JUICE CONCENTRATE (COLOR))"
#[4] " ALMONDS (ALMONDS, PEANUT OIL AND/OR COTTONSEED OIL AND/OR CANOLA OIL AND/OR SOYBEAN OIL, SALT)"
#[5] " MACADAMIAS (MACADAMIAS, MALTODEXTRIN, SALT)"
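One caveat with this extraction approach, sketched on a made-up string (`s2`, `extracted`, and `split_parts` are hypothetical names): because `\\(` is mandatory in the pattern, only items that contain a parenthesized group are returned. An ingredient without parentheses is silently dropped, whereas splitting on top-level commas keeps it. This does not affect the sample text in the question, where every item has a group.

```r
# Made-up input where two items have no parenthesized part.
s2 <- "SALT, PEANUTS (PEANUTS, OIL), SUGAR"

# Extraction pattern from this answer: requires a "(" in every match.
pat <- '[[:alpha:][:space:]]*\\(((?>[^()]+)|(?R))*\\)'
extracted <- regmatches(s2, gregexpr(pat, s2, perl = TRUE))[[1]]

# Splitting on top-level commas with the recursive SKIP/FAIL pattern.
split_pat <- "(\\((?:[^()]++|(?1))*\\))(*SKIP)(*F)|,"
split_parts <- strsplit(s2, split_pat, perl = TRUE)[[1]]

print(extracted)    # only the item that has parentheses
print(split_parts)  # all three items
```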