Map operators extracted from substrings

Date: 2018-10-03 06:23:55

Tags: python regex substring operators list-comprehension

I have a list of dicts:

print (L)
[{0: 'x==1', 1: 'y==2', 2: 'z!=1'}, {0: 'x==1', 1: 'y<=3', 2: 'z>1'}]

I want to create tuples containing the value before the operator, the operator itself, and the value after it:

#first step
wanted = [[('x', '==', '1'), ('y', '==', '2'), ('z', '!=', '1')], 
          [('x', '==', '1'), ('y', '<=', '3'), ('z', '>', '1')]]

Then map the second value (the operator string) to the corresponding operator function:

import operator

ops = {'>': operator.gt,
        '<': operator.lt,
       '>=': operator.ge,
       '<=': operator.le,
       '==': operator.eq,
        '!=': operator.ne}

#expected final output
wanted = [[('x', <built-in function eq>, '1'), 
           ('y', <built-in function eq>, '2'), 
           ('z', <built-in function ne>, '1')], 
          [('x', <built-in function eq>, '1'), 
           ('y', <built-in function le>, '3'), 
           ('z', <built-in function gt>, '1')]]
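Each value in ops is an ordinary two-argument function from the standard operator module, so once a symbol has been looked up it can be called directly. A minimal illustration; the sample numbers are made up and not part of the question:

```python
import operator

# Same lookup table as in the question: operator symbol -> function.
ops = {'>': operator.gt, '<': operator.lt,
       '>=': operator.ge, '<=': operator.le,
       '==': operator.eq, '!=': operator.ne}

# ops['<='](a, b) evaluates a <= b, ops['!='](a, b) evaluates a != b, etc.
print(ops['<='](2, 3))  # True
print(ops['!='](1, 1))  # False
```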

I tried:

import re

L = [[re.findall(r'(.*)([<>=!]+)(.*)', v)[0] for k, v in x.items()] for x in L]
print (L)
[[('x=', '=', '1'), ('y=', '=', '2'), ('z!', '=', '1')], 
 [('x=', '=', '1'), ('y<', '=', '3'), ('z', '>', '1')]]

L = [[ops[y[1]] for y in x] for x in L]

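But the problem is that the middle substring (the operator) is matched incorrectly: for 'x==1' I get 'x=' and '=', and since '=' is not a key in ops, the operator mapping fails as well.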

What is the correct regex to match this properly? Or is there another possible solution, e.g. using str.partition? I'm open to all possible solutions.
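The str.partition idea can work if the two-character operators are tried before the one-character ones, otherwise '<' would be found inside '<='. A minimal sketch, assuming every string contains exactly one operator from ops; the helper name split_condition and the OPERATORS tuple are hypothetical, not from the question:

```python
# Hypothetical operator list: two-character operators come first so that
# '<=' is found before '<' and '>=' before '>'.
OPERATORS = ('<=', '>=', '==', '!=', '<', '>')

def split_condition(s):
    """Split 'x<=3' into ('x', '<=', '3') using str.partition."""
    for op in OPERATORS:
        left, sep, right = s.partition(op)
        if sep:  # sep is '' when op does not occur in s
            return left, op, right
    raise ValueError(f'no operator found in {s!r}')

L = [{0: 'x==1', 1: 'y==2', 2: 'z!=1'}, {0: 'x==1', 1: 'y<=3', 2: 'z>1'}]
wanted = [[split_condition(v) for v in d.values()] for d in L]
print(wanted)
# [[('x', '==', '1'), ('y', '==', '2'), ('z', '!=', '1')], [('x', '==', '1'), ('y', '<=', '3'), ('z', '>', '1')]]
```

Swapping the string for its function is then the same lookup as before, ops[op].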

3 Answers:

Answer 0 (score: 2)

If your input really is this simple, I think the most straightforward approach is to split on the operator:

In [1]: import re

In [2]: data = [{0: 'x==1', 1: 'y==2', 2: 'z!=1'}, {0: 'x==1', 1: 'y<=3', 2: 'z>1'}]

In [3]: rgx = re.compile(r'([<>=!]+)')

In [4]: [[rgx.split(v) for v in d.values()] for d in data]
Out[4]:
[[['x', '==', '1'], ['y', '==', '2'], ['z', '!=', '1']],
 [['x', '==', '1'], ['y', '<=', '3'], ['z', '>', '1']]]

Note that if you put a capturing group in the splitting regex, the matched delimiter is included in the result!
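To make the capturing-group behaviour of re.split concrete, here is a small comparison on one of the question's strings, with and without the parentheses:

```python
import re

s = 'y<=3'
# Without a group, the delimiter is dropped from the result.
plain = re.split(r'[<>=!]+', s)
# With a capturing group, the matched delimiter is kept in the list.
grouped = re.split(r'([<>=!]+)', s)
print(plain)    # ['y', '3']
print(grouped)  # ['y', '<=', '3']
```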

Then, to finish it off:

In [11]: ops = {'>': operator.gt,
    ...:         '<': operator.lt,
    ...:        '>=': operator.ge,
    ...:        '<=': operator.le,
    ...:        '==': operator.eq,
    ...:         '!=': operator.ne}
    ...:

In [12]: parsed = [[rgx.split(v) for v in d.values()] for d in data]

In [13]: [[(x, ops[op], y) for x,op,y in ps] for ps in parsed]
Out[13]:
[[('x', <function _operator.eq>, '1'),
  ('y', <function _operator.eq>, '2'),
  ('z', <function _operator.ne>, '1')],
 [('x', <function _operator.eq>, '1'),
  ('y', <function _operator.le>, '3'),
  ('z', <function _operator.gt>, '1')]]

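Answer 1 (score: 2)

Replace the greedy (.*) for the first substring with (\w), which matches a single word character (enough here, because every variable name is one character long):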

import re

L = [{0: 'x==1', 1: 'y==2', 2: 'z!=1'}, {0: 'x==1', 1: 'y<=3', 2: 'z>1'}]
L = [[re.findall(r'(\w)([<>=!]+)(.*)', v)[0] for k, v in x.items()] for x in L]
[[(y[0],ops[y[1]],y[2]) for y in x] for x in L]

[[('x', <function _operator.eq>, '1'),
  ('y', <function _operator.eq>, '2'),
  ('z', <function _operator.ne>, '1')],
 [('x', <function _operator.eq>, '1'),
  ('y', <function _operator.le>, '3'),
  ('z', <function _operator.gt>, '1')]]

Or, as jezrael suggested in the comments, do it all in a single list comprehension:

L = [[[(z[0], ops[z[1]], z[2]) for z in re.findall(r'(\w)([<>=!]+)(.*)', v)][0] for k, v in x.items()] for x in L]

Or, since we don't need the keys, iterate over the values directly:

L = [[[(z[0], ops[z[1]], z[2]) for z in re.findall(r'(\w)([<>=!]+)(.*)', v)][0] for v in x.values()] for x in L]
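Because (\w) matches exactly one character, a longer variable name would be silently truncated rather than rejected. A quick check, using a hypothetical name xy that does not appear in the question, and \w+ as one way to allow multi-character names:

```python
import re

v = 'xy==1'
# findall skips ahead until a single character is directly followed by an
# operator, so the leading 'x' is lost.
single = re.findall(r'(\w)([<>=!]+)(.*)', v)
# \w+ captures the whole name.
multi = re.findall(r'(\w+)([<>=!]+)(.*)', v)
print(single)  # [('y', '==', '1')]
print(multi)   # [('xy', '==', '1')]
```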

Answer 2 (score: 0)

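The problem is that .* is greedy. In x==1, (.*) matches as many characters as it can, so it takes 'x=' and leaves only a single = to satisfy the second group ([<>=!]+).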
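The greediness can be seen directly by running the question's pattern next to a variant with a non-greedy (lazy) .*?. Note that the lazy quantifier is an additional option, not one of the two fixes listed in this answer:

```python
import re

v = 'x==1'
# Greedy (.*) keeps 'x=' and gives back only one '=' to group 2.
greedy = re.findall(r'(.*)([<>=!]+)(.*)', v)
# Lazy (.*?) takes as little as possible, so group 2 receives the whole '=='.
lazy = re.findall(r'(.*?)([<>=!]+)(.*)', v)
print(greedy)  # [('x=', '=', '1')]
print(lazy)    # [('x', '==', '1')]
```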

Solutions:

  1. Assuming the non-operator groups will never contain <>=!, use a negated character class instead of .*:

    re.findall(r'([^<>=!]+)([<>=!]+)([^<>=!]+)', v)

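  2. Use alternation with the pipe character to capture the operator, listing the two-character operators first (alternatives are tried left to right, so '<' would otherwise match before '<=' is tried):

    re.findall(r'(.*)(<=|>=|==|!=|<|>)(.*)', v)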
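Why the order of the alternatives matters: Python's re tries them left to right and accepts the first one that lets the whole pattern succeed. A small demonstration on 'y<=3', reduced to just the '<' and '<=' operators for clarity:

```python
import re

v = 'y<=3'
# '<' is listed first, so it matches and the leftover '=' ends up in group 3.
short_first = re.findall(r'(.*)(<|<=)(.*)', v)
# Listing the longer operator first gives the intended split.
long_first = re.findall(r'(.*)(<=|<)(.*)', v)
print(short_first)  # [('y', '<', '=3')]
print(long_first)   # [('y', '<=', '3')]
```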