I am trying to use re twice to search and split data. For example:
a. I find all substrings inside []
b. I try to split on spaces
My code is:
[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"
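But this gives me an error; it does not let me use re a second time on the result.
Any suggestions would be great! Thanks in advance.
Answer 0: (score: 2)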
>>> sum([date.split() for date in re.findall(r'\[(.*?)\]', file)], [])
['2018-07-10', '15:04:11', '2018-07-10', '15:04:12', '2018-07-10', '15:04:42', '2018-07-10', '15:04:42']
Or use itertools.chain:
>>> from itertools import chain
>>> list(chain(*re.findall(r'\[(\S+) (\S+)\]', file)))
['2018-07-10', '15:04:11', '2018-07-10', '15:04:12', '2018-07-10', '15:04:42', '2018-07-10', '15:04:42']
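To try both one-liners outside an interactive session, here is a minimal self-contained sketch run against the sample log from the question. The name `log_text` is made up for this example (the answer calls the string `file`), and `chain.from_iterable` is swapped in for `chain(*...)` so the result of `findall` does not have to be unpacked into arguments.

```python
import re
from itertools import chain

# Sample log from the question; `log_text` is a hypothetical name for the string
log_text = '''[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"'''

# Variant 1: capture the bracket contents, split each on whitespace, then flatten
bracketed = re.findall(r'\[(.*?)\]', log_text)
flat_split = list(chain.from_iterable(item.split() for item in bracketed))

# Variant 2: capture date and time as two groups, so findall returns tuples
flat_groups = list(chain.from_iterable(re.findall(r'\[(\S+) (\S+)\]', log_text)))

print(flat_split)
print(flat_groups == flat_split)
```

Both variants should yield the same flat list; the second one assumes every bracket holds exactly two space-separated tokens.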
Answer 1: (score: 1)
Update your regular expression to capture each group in the first pass; split is not needed at all:
re.findall(r'\[(.*?)\s(.*?)\]', s)
[('2018-07-10', '15:04:11'),
('2018-07-10', '15:04:12'),
('2018-07-10', '15:04:42'),
('2018-07-10', '15:04:42')]
If you need it as a flattened list:
[elem for grp in re.findall(r'\[(.*?)\s(.*?)\]', s) for elem in grp]
['2018-07-10',
'15:04:11',
'2018-07-10',
'15:04:12',
'2018-07-10',
'15:04:42',
'2018-07-10',
'15:04:42']
Answer 2: (score: 1)
import re
data = """[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"
"""
new_data = []
re.sub(r'\[(.*?)\].*', lambda g: new_data.extend(g[1].split()), data)
print(','.join(new_data))
Output:
2018-07-10,15:04:11,2018-07-10,15:04:12,2018-07-10,15:04:42,2018-07-10,15:04:42
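Note that the `re.sub` call above is used only for its side effect on `new_data`; the string it returns is discarded. As a hedged alternative (a different function than the one this answer uses), the same collection step can be sketched with `re.finditer`, which walks the matches without building a replacement string:

```python
import re

data = '''[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"
'''

new_data = []
# Iterate over each [...] match and split its captured contents on whitespace
for match in re.finditer(r'\[(.*?)\]', data):
    new_data.extend(match.group(1).split())

print(','.join(new_data))
```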
Answer 3: (score: 1)
Use re.findall() and .split(); there is no need to use a regular expression twice.
import re
a = '''[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"'''
[item for sublist in [n.split() for n in re.findall(r'\[(.*?)\]',a)] for item in sublist]
['2018-07-10',
'15:04:11',
'2018-07-10',
'15:04:12',
'2018-07-10',
'15:04:42',
'2018-07-10',
'15:04:42']
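Answer 4: (score: 0)
After the first call, your file variable holds the list of elements returned by re.findall, and re.split expects a string rather than a list.
Try: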
import re
file = re.findall(r'\[(.*?)\]', file)
m = [re.split(r'\ +', i) for i in file]
print(m)
Output:
[['2018-07-10', '15:04:11'], ['2018-07-10', '15:04:12'], ['2018-07-10', '15:04:42'], ['2018-07-10', '15:04:42']]
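The asker's exact code is not shown, so the following is only a sketch of the likely failure, assuming the result of `re.findall` was passed straight to `re.split`. The names `line`, `found`, `error`, and `parts` are made up for this example.

```python
import re

line = '[2018-07-10 15:04:11] USER INPUT "hello"'
found = re.findall(r'\[(.*?)\]', line)  # findall returns a list of strings

error = None
try:
    # Assumed mistake: passing the whole list where a string is expected
    re.split(r' +', found)
except TypeError as exc:
    error = exc
    print('TypeError:', exc)

# Splitting each element of the list individually works
parts = [re.split(r' +', item) for item in found]
print(parts)
```

Applying the split per element, as in the list comprehension, keeps one sub-list per bracketed timestamp.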