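I'm trying to parse data from a text file. A data tuple is an age followed by 0-3 times, which are "right" aligned. No matter how many times follow an age in the source data, I want to "pad" them with None up to three times. Ages and times are all space-separated, and furthermore, times can be in the format "mm:ss.dd" or "ss.dd". An age with its times can be repeated one or more times on a line.
Here is some sample data: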
test_str = ['25',
'24 22.10',
'16 59.35 1:02.44',
'18 52.78 59.45 1:01.22',
'33 59.35 1:02.44 34 52.78 59.45 1:01.22 24 25']
Scanning the above should yield tuples (or lists, dicts, etc.):
(25, None, None, None)
(24, None, None, 0:22.10)
(16, None, 0:59.35, 1:02.44)
(18, 0:52.78, 0:59.45, 1:01.22)
(33, None, 0:59.35, 1:02.44), (34, 0:52.78, 0:59.45, 1:01.22), (24, None, None, None), (25, None, None, None)
My idea was to use a regular expression, something like:
data_search = r'[1-9][0-9]( (([1-9][0-9]:)?[0-9]{2}.[0-9]{2})|){3}'
x = re.search(data_search, test_str[0])
But I haven't had any success.
Can someone help me with the regular expression, or suggest a better solution?
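Answer 0 (score: 1)
I'm not sure this is the best approach, but it separates the first element, since it always sits in the first position, then splits off the rest and fills the gaps with None.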
test_str = ['25',
'24 22.10',
'16 59.35 1:02.44',
'18 52.78 59.45 1:01.22']
def create_tuples(string_list):
    all_tuples = []
    for space_string in string_list:
        # Skip empty lines.
        if not space_string:
            continue
        split_list = space_string.split()
        # The age is always the first field; the rest are times.
        first_list_element = split_list[0]
        last_list_elements = split_list[1:]
        # Left-pad the times with None so each record has three time slots.
        all_tuples.append([first_list_element] + [None] * (3 - len(last_list_elements)) + last_list_elements)
    return all_tuples

print(create_tuples(test_str))
# Returns:
[['25', None, None, None], ['24', None, None, '22.10'], ['16', None, '59.35', '1:02.44'], ['18', '52.78', '59.45', '1:01.22']]
Answer 1 (score: 1)
I believe this is close to what you want. Sorry for the lack of regex.
def format_str(test_str):
    res = []
    for x in test_str:
        parts = x.split(" ")
        thing = []
        for part in parts:
            # A field without '.' or ':' is an age, which starts a new record.
            if len(thing) != 0 and '.' not in part and ':' not in part:
                res.append(thing[:1] + [None]*(4-len(thing)) + thing[1:])
                thing = [part]
            else:
                thing.append(part)
        # Flush the last record on the line.
        if len(thing) != 0:
            res.append(thing[:1] + [None]*(4-len(thing)) + thing[1:])
    return res
test_str = ['25',
'24 22.10',
'16 59.35 1:02.44',
'18 52.78 59.45 1:01.22 24 22.10']
results = format_str(test_str)
print(results)
The result is:
[['25', None, None, None], ['24', None, None, '22.10'], ['16', None, '59.35', '1:02.44'], ['18', '52.78', '59.45', '1:01.22'], ['24', None, None, '22.10']]
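I didn't do any formatting of the times, so 52.78 isn't shown as 0:52.78, but I bet you can do that yourself. If not, leave a comment and I'll edit the solution to cover that too.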
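For the time formatting, a minimal sketch might look like the following. The helper names `normalize_time` and `normalize_record` are made up for illustration, and the sketch assumes that any time without a colon is seconds-only and should simply get a `0:` prefix.

```python
def normalize_time(value):
    """Return a time string as 'm:ss.dd'; None passes through unchanged."""
    if value is None:
        return None
    value = value.strip()
    # Assumption: a time without minutes ("52.78") gets a zero-minute prefix.
    return value if ':' in value else '0:' + value


def normalize_record(record):
    """Keep the age (first item) as is and normalize the remaining times."""
    return [record[0]] + [normalize_time(t) for t in record[1:]]


rows = [['24', None, None, '22.10'], ['18', '52.78', '59.45', '1:01.22']]
normalized = [normalize_record(r) for r in rows]
print(normalized)
# [['24', None, None, '0:22.10'], ['18', '0:52.78', '0:59.45', '1:01.22']]
```

The times stay strings here; converting them to numbers or `timedelta` objects would be a separate step.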
Answer 2 (score: 0)
>>> import re
>>> age_expr = r"(\d+)"
>>> time_expr = r"((?:\s+)(?:\d+:)?\d+\.\d+)?"
>>> expr = re.compile(age_expr + time_expr * 3)
>>> [expr.findall(s) for s in test_str]
[[('25', '', '', '')], [('24', ' 22.10', '', '')], [('16', ' 59.35', ' 1:02.44', '')], [('18', ' 52.78', ' 59.45', ' 1:01.22')], [('33', ' 59.35', ' 1:02.44', ''), ('34', ' 52.78', ' 59.45', ' 1:01.22'), ('24', '', '', ''), ('25', '', '', '')]]
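Note that the groups come back left-aligned, with empty strings for the missing times and the leading whitespace still attached. One possible post-processing step, sketched here with a hypothetical helper `right_align` (not part of the `re` module), strips the whitespace and pads with None on the left as the question asks:

```python
import re

age_expr = r"(\d+)"
time_expr = r"((?:\s+)(?:\d+:)?\d+\.\d+)?"
expr = re.compile(age_expr + time_expr * 3)


def right_align(match):
    """Turn one findall tuple into (age, t1, t2, t3), padded with None on the left."""
    age, *times = match
    # Drop the empty groups and the whitespace captured with each time.
    found = [t.strip() for t in times if t]
    return (int(age), *([None] * (3 - len(found)) + found))


line = '33 59.35 1:02.44 34 52.78 59.45 1:01.22 24 25'
records = [right_align(m) for m in expr.findall(line)]
print(records)
# [(33, None, '59.35', '1:02.44'), (34, '52.78', '59.45', '1:01.22'), (24, None, None, None), (25, None, None, None)]
```

This assumes an age is never followed by more than three times; a fourth time would not fit the pattern and its digits could be picked up as ages.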