Regex to match only commas that are not inside parentheses or square brackets

Time: 2018-02-28 01:16:35

Tags: java regex

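I want to split a string on the commas that are not inside parentheses or square brackets.

I am working with the following string:

Potatoes, Vegetable Oil (Sunflower, Corn, And/or Canola Oil), Honey BBQ Seasoning [Sugar, Salt, Dextrose, Torula Yeast, Onion Powder, Spices], Maltodextrin Fructose, Yeast Extract, Molasses, Natural Flavor [Including Milk], Corn Starch, Honey, Gum Arabic, Paprika Extracts, Caramel Color, Garlic Powder, Citric Acid, And Sunflower Oil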

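This is how I want it split (+ marks where I want a split to occur):

Potatoes + Vegetable Oil (Sunflower, Corn, And/or Canola Oil) + Honey BBQ Seasoning [Sugar, Salt, Dextrose, Torula Yeast, Onion Powder, Spices] + Maltodextrin Fructose + Yeast Extract + Molasses + Natural Flavor [Including Milk] + Corn Starch + Honey + Gum Arabic + Paprika Extracts + Caramel Color + Garlic Powder + Citric Acid + And Sunflower Oil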

The closest I have come to something that works is this:

,(?![^\[\(]*[$\]\)])


2 answers:

Answer 0 (score: 3)

Maybe you want something like this:

(?!<(?:\(|\[)[^)\]]+),(?![^(\[]+(?:\)|\]))

Demo

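When fed to Java (note the additional ] and ( inserted at arbitrary positions to make the input well-formed):

Potatoes, Vegetable Oil (Sunflower, Corn, And/or Canola Oil), Honey BBQ Seasoning [Sugar, Salt, Dextrose, Torula Yeast], Onion Powder, Spices, Maltodextrin Fructose, Yeast Extract, Molasses, Natural Flavor [Including Milk], Corn Starch, Honey, Gum Arabic, Paprika Extracts, Caramel Color (Garlic Powder, Citric Acid, And Sunflower Oil).

it produces the output: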

Potatoes
 Vegetable Oil (Sunflower, Corn, And/or Canola Oil)
 Honey BBQ Seasoning [Sugar, Salt, Dextrose, Torula Yeast]
 Onion Powder
 Spices
 Maltodextrin Fructose
 Yeast Extract
 Molasses
 Natural Flavor [Including Milk]
 Corn Starch
 Honey
 Gum Arabic
 Paprika Extracts
 Caramel Color (Garlic Powder, Citric Acid, And Sunflower Oil).

That is exactly "split at top-level commas".
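A minimal sketch of how the pattern above might be passed to String.split in Java. The class and method names (LookaheadSplitDemo, split) are hypothetical, and the shortened sample string in main is only for illustration; the pattern itself is copied verbatim from this answer.

```java
class LookaheadSplitDemo {
    // The answer's pattern, with each backslash doubled for a Java string literal.
    static final String REGEX =
        "(?!<(?:\\(|\\[)[^)\\]]+),(?![^(\\[]+(?:\\)|\\]))";

    // Hypothetical helper: splits on every comma the pattern accepts.
    static String[] split(String input) {
        return input.split(REGEX);
    }

    public static void main(String[] args) {
        String s = "Potatoes, Vegetable Oil (Sunflower, Corn, And/or Canola Oil), "
                 + "Honey BBQ Seasoning [Sugar, Salt, Dextrose, Torula Yeast], Onion Powder";
        for (String part : split(s)) {
            // Parts after the first keep their leading space, as in the output above.
            System.out.println(part);
        }
    }
}
```

As written, Java parses the leading `(?!<…)` group as a negative lookahead for a literal `<`, which always succeeds directly in front of a comma, so the trailing lookahead does the actual filtering. Calling trim() on each part would remove the leading spaces.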

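Note, however, that this regex is very inefficient. Counting parentheses with a regular expression is not a good idea. The problem looks solvable with a simple left-to-right scan followed by a simple split.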
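One possible reading of the scan-based suggestion is a single pass that tracks bracket depth and cuts at commas seen at depth zero. The sketch below is not from the answer: the class and method names (TopLevelSplitter, splitTopLevel) are hypothetical, and it assumes the brackets in the input are balanced and does not check that ( is closed by ) rather than ].

```java
import java.util.ArrayList;
import java.util.List;

class TopLevelSplitter {
    // Splits on commas that are outside all (...) and [...] groups.
    // Assumption: brackets are balanced; stray closers at depth 0 are ignored.
    static List<String> splitTopLevel(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(' || c == '[') {
                depth++;
            } else if ((c == ')' || c == ']') && depth > 0) {
                depth--;
            }
            if (c == ',' && depth == 0) {
                // Top-level comma: finish the current item.
                parts.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString().trim());
        return parts;
    }

    public static void main(String[] args) {
        String s = "Potatoes, Vegetable Oil (Sunflower, Corn, And/or Canola Oil), "
                 + "Honey BBQ Seasoning [Sugar, Salt, Dextrose, Torula Yeast], Onion Powder";
        for (String part : splitTopLevel(s)) {
            System.out.println(part);
        }
    }
}
```

Each character is visited once, so the running time is linear in the input length, and the depth counter also copes with nested groups, which the lookahead pattern does not attempt.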

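Answer 1 (score: 2)

Sometimes you are better off searching for what you want (i.e., whitelisting) instead of trying to find the split points between the things you want (i.e., blacklisting):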

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String haystack = "Potatoes, Vegetable Oil (Sunflower, Corn, And/or Canola Oil), "
    + "Honey BBQ Seasoning [Sugar, Salt, Dextrose, Torula Yeast], Onion Powder, "
    + "Spices, Maltodextrin Fructose, Yeast Extract, Molasses, "
    + "Natural Flavor [Including Milk], Corn Starch, Honey, Gum Arabic, "
    + "Paprika Extracts, Caramel Color (Garlic Powder, Citric Acid, And Sunflower Oil).";

// Match each item directly: a word character, then text up to the next [ ( or comma,
// optionally followed by one [...] or (...) group.
Matcher m = Pattern.compile("\\w[^\\[(,]*(\\[[^]]*\\]|\\([^)]*\\))?")
                   .matcher(haystack);
while (m.find()) {
    // Quotes make any leading or trailing whitespace visible.
    System.out.println("'" + m.group() + "'");
}

Output:

'Potatoes'
'Vegetable Oil (Sunflower, Corn, And/or Canola Oil)'
'Honey BBQ Seasoning [Sugar, Salt, Dextrose, Torula Yeast]'
'Onion Powder'
'Spices'
'Maltodextrin Fructose'
'Yeast Extract'
'Molasses'
'Natural Flavor [Including Milk]'
'Corn Starch'
'Honey'
'Gum Arabic'
'Paprika Extracts'
'Caramel Color (Garlic Powder, Citric Acid, And Sunflower Oil)'

Note that the resulting strings contain no leading or trailing whitespace.

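Regex explanation:

"\w[^\[(,]*(\[[^]]*\]|\([^)]*\))?" - after backslash-escape processing
"\w" - find a word character (a letter, digit, or underscore)
"[^\[(,]*" - ... followed by anything other than [ ( ,
"( | )?" - ... optionally followed by:
"\[ \]" - ...... something in square brackets
"[^]]*" - ......... anything other than ]
"\( \)" - ...... or something in parentheses
"[^)]*" - ......... anything other than )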
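The same breakdown can be kept next to the pattern itself. The sketch below is an illustrative variant, not the answer's code: it uses Java's Pattern.COMMENTS flag so that whitespace and #-comments inside the pattern are ignored, writes the closing bracket as `\]` inside the character class, and uses a non-capturing group. The names CommentedItemPattern, ITEM, and items are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentedItemPattern {
    // In COMMENTS mode, whitespace and text from # to end of line are ignored.
    static final Pattern ITEM = Pattern.compile(
          "\\w             # a word character starts the item\n"
        + "[^\\[(,]*       # then anything except an opening bracket, opening parenthesis, or comma\n"
        + "(?:             # optionally followed by one group, either\n"
        + "  \\[[^\\]]*\\] #   square brackets with no closing bracket inside\n"
        + "  |             #   or\n"
        + "  \\([^)]*\\)   #   parentheses with no closing parenthesis inside\n"
        + ")?",
        Pattern.COMMENTS);

    // Hypothetical helper: collects every match of ITEM in order.
    static List<String> items(String haystack) {
        List<String> result = new ArrayList<>();
        Matcher m = ITEM.matcher(haystack);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "Potatoes, Vegetable Oil (Sunflower, Corn, And/or Canola Oil), "
                 + "Honey BBQ Seasoning [Sugar, Salt, Dextrose, Torula Yeast], Onion Powder";
        for (String item : items(s)) {
            System.out.println("'" + item + "'");
        }
    }
}
```

Because only one bracketed group is allowed per item and no nesting is handled, an item such as "A (b) [c, d]" would be cut after "(b)"; that limitation is shared with the one-line pattern.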