Create two lists from a string: the text inside parentheses and the text outside them

Time: 2018-08-22 14:58:43

Tags: python list pandas list-comprehension

Suppose we have a string like this:

s = u'apple banana lemmon (hahaha) dog cat whale (hehehe) red blue black'

I want to create the following lists:

including = ['hahaha', 'hehehe']
excluding = ['apple banana lemmon (', ') dog cat whale (', ') red blue black']

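The first list is straightforward with a regular expression:

import re

including = re.findall(r'\((.*?)\)', s)

But I can't get anything similar for the other list. Could you help me? Thanks in advance!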

3 answers:

Answer 0: (score: 3)

Using RegEx

a = re.findall(r'\)?[^()]*\(?', s)
excluded = a[::2]
included = a[1::2]
print(included, excluded, sep='\n')

['hahaha', 'hehehe', '']
['apple banana lemmon (', ') dog cat whale (', ') red blue black']

Taking care of the empty string

a = re.findall(r'\)?[^()]*\(?', s)
excluded = [*filter(bool, a[::2])]
included = [*filter(bool, a[1::2])]
print(included, excluded, sep='\n')

['hahaha', 'hehehe']
['apple banana lemmon (', ') dog cat whale (', ') red blue black']
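
As an illustration only, the slicing idea above can be wrapped in a small function. The name split_by_parens is hypothetical (it is not part of the answer), and the sketch assumes the parentheses are balanced, not nested, and never empty, because the even/odd slicing relies on the pieces strictly alternating between outside and inside text.

```python
import re

s = 'apple banana lemmon (hahaha) dog cat whale (hehehe) red blue black'

def split_by_parens(text):
    # Hypothetical helper name, introduced only for this illustration.
    # Each match is: an optional ')', a run of non-parenthesis characters,
    # and an optional '('.
    pieces = re.findall(r'\)?[^()]*\(?', text)
    # The pieces alternate outside, inside, outside, ... so even indexes are
    # the outside text and odd indexes are the parenthesized text.
    excluded = [p for p in pieces[::2] if p]   # drop empty matches
    included = [p for p in pieces[1::2] if p]
    return included, excluded

included, excluded = split_by_parens(s)
print(included, excluded, sep='\n')
```

If the input can contain nested or empty parentheses, the alternation no longer holds and the two lists get mixed up, so this only fits inputs shaped like the example string.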

Without RegEx

from itertools import cycle

def f(s):
  c = cycle('()')
  a = {'(': 1, ')': 0}
  while s:
    d = next(c)
    i = s.find(d)
    if i > -1:
      j = a[d]
      yield d, s[:i + j]
      s = s[i + j:]
    else:
      yield d, s
      break

included = []
excluded = []

for k, v in f(s):
  if k == '(':
    excluded.append(v)
  else:
    included.append(v)

print(included, excluded, sep='\n')

['hahaha', 'hehehe']
['apple banana lemmon (', ') dog cat whale (', ') red blue black']

The same idea without overwriting s

from itertools import cycle

def f(s):
  c = cycle('()')
  a = {'(': 1, ')': 0}
  j = 0
  while True:
    d = next(c)
    i = s.find(d, j)
    if i > -1:
      k = a[d]
      yield d, s[j:i + k]
      j = i + k
    else:
      yield d, s[j:]
      break

included = []
excluded = []

for k, v in f(s):
  if k == '(':
    excluded.append(v)
  else:
    included.append(v)

print(included, excluded, sep='\n')

['hahaha', 'hehehe']
['apple banana lemmon (', ') dog cat whale (', ') red blue black']
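
To see how the generator alternates between the two delimiters, here is a small trace on a shorter, made-up input. It restates the second version of f with longer variable names (delimiters, keep, start are my renamings, not the answer's) and prints each pair it yields.

```python
from itertools import cycle

def f(s):
    # Alternate between searching for '(' and ')'.
    delimiters = cycle('()')
    # '(' stays at the end of its chunk; ')' is left to start the next chunk.
    keep = {'(': 1, ')': 0}
    start = 0
    while True:
        d = next(delimiters)
        i = s.find(d, start)
        if i > -1:
            end = i + keep[d]
            yield d, s[start:end]
            start = end
        else:
            # Delimiter not found: yield whatever is left and stop.
            yield d, s[start:]
            break

pairs = list(f('ab (cd) ef'))  # hypothetical short input
for d, chunk in pairs:
    print(repr(d), repr(chunk))
```

A chunk tagged '(' is text outside the parentheses and a chunk tagged ')' is text inside them, which is why the loop above appends to excluded for '(' and to included otherwise.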

Answer 1: (score: 1)

excluding = re.split('|'.join(including), s)

This works in the simple case where you know that including contains no special characters or regex syntax.

If you are not sure that is the case:

re.split('|'.join(map(re.escape, including)), s)

This escapes the special regex characters that would otherwise cause re.split to misbehave.
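
A minimal sketch of why the escaping matters, using two made-up inputs (s2 and s3 are not from the question). In the first, the parenthesized text contains '+', which is a regex quantifier when left unescaped. The second shows a separate limitation of splitting on the included strings: the pattern also matches the same text outside the parentheses.

```python
import re

s2 = 'total (1+1) done'  # hypothetical input with a regex metacharacter
including = re.findall(r'\((.*?)\)', s2)  # ['1+1']

# Unescaped, '1+1' means "one or more '1' followed by '1'", which never
# matches the literal text '1+1', so nothing is split.
unescaped = re.split('|'.join(including), s2)

# Escaped, the pattern matches the literal text '1+1'.
escaped = re.split('|'.join(map(re.escape, including)), s2)

print(unescaped)
print(escaped)

# Separate caveat: the split is on the text itself, wherever it occurs.
s3 = 'red (red) blue'  # hypothetical input
everywhere = re.split('|'.join(map(re.escape, re.findall(r'\((.*?)\)', s3))), s3)
print(everywhere)
```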

Answer 2: (score: 1)

You can use a positive lookbehind and a positive lookahead to split on the text between the parentheses:

>>> re.split(r'(?<=\().*?(?=\))', s)
['apple banana lemmon (', ') dog cat whale (', ') red blue black']
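
Because lookarounds are zero-width, the parentheses themselves are not consumed, so they stay in the split pieces. A minimal sketch, assuming non-nested parentheses, that reuses one compiled pattern to build both lists (the variable name pattern is only for illustration):

```python
import re

s = 'apple banana lemmon (hahaha) dog cat whale (hehehe) red blue black'

# Match the shortest text that is preceded by '(' and followed by ')',
# without including either parenthesis in the match.
pattern = re.compile(r'(?<=\().*?(?=\))')

including = pattern.findall(s)  # the matched text inside the parentheses
excluding = pattern.split(s)    # everything around the matches

print(including, excluding, sep='\n')
```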