Using a regular expression twice

Time: 2018-07-20 15:25:55

Tags: python python-2.7

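I am trying to use re twice on the same data: first to search it, then to split the results.

First, I find all the substrings inside [].

Then I try to split each of them on spaces.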

My data is:

[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"

But this gives me an error: it does not let me use re a second time.

Any suggestions would be great! Thanks in advance.

5 answers:

Answer 0 (score: 2)

>>> sum([date.split() for date in re.findall(r'\[(.*?)\]', file)], [])
['2018-07-10', '15:04:11', '2018-07-10', '15:04:12', '2018-07-10', '15:04:42', '2018-07-10', '15:04:42']

Or use itertools.chain:

>>> from itertools import chain
>>> list(chain(*re.findall(r'\[(\S+) (\S+)\]', file)))
['2018-07-10', '15:04:11', '2018-07-10', '15:04:12', '2018-07-10', '15:04:42', '2018-07-10', '15:04:42']
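
For a self-contained version of the same idea, here is a minimal sketch. The names log_text, pairs and timestamp_parts are made up for illustration, the sample is the first two lines of the question's data, and chain.from_iterable is swapped in for chain(*...) so the list of matches does not have to be unpacked into arguments.

```python
import re
from itertools import chain

# First two lines of the question's data; log_text is a hypothetical name.
log_text = '''[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"'''

# The pattern has two groups, so findall returns (date, time) tuples.
pairs = re.findall(r'\[(\S+) (\S+)\]', log_text)

# Flatten the tuples into a single list of strings.
timestamp_parts = list(chain.from_iterable(pairs))
print(timestamp_parts)
```

Because \S+ cannot match whitespace, the pattern only accepts brackets that hold exactly two space-separated tokens.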

Answer 1 (score: 1)

Update your regex so that it captures each part as its own group on the first pass; then no split is needed at all.

re.findall(r'\[(.*?)\s(.*?)\]', s)

[('2018-07-10', '15:04:11'),
 ('2018-07-10', '15:04:12'),
 ('2018-07-10', '15:04:42'),
 ('2018-07-10', '15:04:42')]

If you need it as a flattened list:

[elem for grp in re.findall(r'\[(.*?)\s(.*?)\]', s) for elem in grp]

['2018-07-10',
 '15:04:11',
 '2018-07-10',
 '15:04:12',
 '2018-07-10',
 '15:04:42',
 '2018-07-10',
 '15:04:42']
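
One possible refinement, sketched under the assumption that a logged message might itself contain square brackets: the loose pattern above would also match those, while a pattern that spells out the date and time shape would not. The sample line and the names strict, loose_matches and strict_matches are invented for this illustration.

```python
import re

# Invented sample line: the message text contains its own brackets.
s = '[2018-07-10 15:04:11] USER INPUT "see [note 1]"'

# Hypothetical stricter pattern: digits only, in date and time shape.
strict = re.compile(r'\[(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})\]')

loose_matches = re.findall(r'\[(.*?)\s(.*?)\]', s)
strict_matches = strict.findall(s)

print(loose_matches)   # also picks up the bracketed text inside the message
print(strict_matches)  # only the leading timestamp
```

Whether the extra strictness is worth it depends on how regular the real log is.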

Answer 2 (score: 1)

import re

data = """[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"
"""

new_data = []
# re.sub is called only for its side effect; g.group(1) also works on Python 2.7
re.sub(r'\[(.*?)\].*', lambda g: new_data.extend(g.group(1).split()), data)
print(','.join(new_data))

Output:

2018-07-10,15:04:11,2018-07-10,15:04:12,2018-07-10,15:04:42,2018-07-10,15:04:42

Answer 3 (score: 1)

Use re.findall() and then call str.split() on each match; there is no need to use a regex twice.

import re
a = '''[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"'''


[item for sublist in [n.split() for n in re.findall(r'\[(.*?)\]',a)] for item in sublist]
['2018-07-10',
 '15:04:11',
 '2018-07-10',
 '15:04:12',
 '2018-07-10',
 '15:04:42',
 '2018-07-10',
 '15:04:42']

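Answer 4 (score: 0)

After re.findall, your file variable holds a list of matches rather than a single string, so each element has to be split separately.

Try: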

import re

file = re.findall(r'\[(.*?)\]', file)
m = [re.split(r'\ +', i) for i in file]
print(m)

Output:

[['2018-07-10', '15:04:11'], ['2018-07-10', '15:04:12'], ['2018-07-10', '15:04:42'], ['2018-07-10', '15:04:42']]
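
To illustrate the point about the list, here is a small sketch. It assumes the original error came from passing the whole re.findall result to re.split, which is a guess; the names matches, error_raised and parts are made up for the example.

```python
import re

# Same shape as a re.findall result: a list of strings, not one string.
matches = ['2018-07-10 15:04:11', '2018-07-10 15:04:12']

try:
    re.split(r' +', matches)  # re.split expects a string, not a list
    error_raised = False
except TypeError:
    error_raised = True

# Splitting each element separately works.
parts = [re.split(r' +', item) for item in matches]
print(error_raised)
print(parts)
```

For a plain separator like a space, item.split() would do the same job without a second regex.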