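I am trying to parse a table whose columns are separated by either a single space or multiple spaces. I can use re.split to separate the columns that are more than one space apart, but the columns separated by a single space then have to be split again. The code below does this by splitting several times to get columns 4 and 5. Is there a better or more efficient way?
The approach I am using seems inefficient:
My code: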
import re

# Columns are separated by two or more spaces, except Time and Values (one space).
string = '''No   Mon   Date         Time Values                   colors
1    Nov   11-03-2016   23:17:52 Red                  colors
2    Nov   11-03-2016   19:18:00 Yellow               colors
3    Nov   11-03-2016   19:18:18 Blue                 colors
4    Oct   10-03-2016   19:22:58 Orange Green         colors
5    Oct   10-07-2016   10:37:36 Red Blue Yellow      colors
6    Oct   10-07-2016   10:37:36 White                colors
7    Sep   09-07-2016   10:37:37 Ping White Yellow Green   colors'''

for i in string.splitlines():
    col1 = re.split(r'\s{2,}', i)[0]
    col2 = re.split(r'\s{2,}', i)[1]
    col3 = re.split(r'\s{2,}', i)[2]
    col4 = re.split(r'\s{2,}', i)[3].split()[0]
    col5 = ' '.join(re.split(r'\s{2,}', i)[3].split()[1:])
    print('{:3} | {:3} | {:10} | {:10} | {:23}|'.format(col1, col2, col3, col4, col5))
Output:
No  | Mon | Date       | Time       | Values                 |
1   | Nov | 11-03-2016 | 23:17:52   | Red                    |
2   | Nov | 11-03-2016 | 19:18:00   | Yellow                 |
3   | Nov | 11-03-2016 | 19:18:18   | Blue                   |
4   | Oct | 10-03-2016 | 19:22:58   | Orange Green           |
5   | Oct | 10-07-2016 | 10:37:36   | Red Blue Yellow        |
6   | Oct | 10-07-2016 | 10:37:36   | White                  |
7   | Sep | 09-07-2016 | 10:37:37   | Ping White Yellow Green|
Answer 0 (score: 1)
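You can get the first 4 values in a single split operation by limiting it to 4 splits, and then split the remaining element, arr[4], on \s{2,} to isolate the Values column: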
for i in string.splitlines():
    arr = re.split(r'\s+', i, maxsplit=4)
    print('{:3} | {:3} | {:10} | {:10} | {:23}|'.
          format(arr[0], arr[1], arr[2], arr[3], re.split(r'\s{2,}', arr[4])[0]))
Output:
No  | Mon | Date       | Time       | Values                 |
1   | Nov | 11-03-2016 | 23:17:52   | Red                    |
2   | Nov | 11-03-2016 | 19:18:00   | Yellow                 |
3   | Nov | 11-03-2016 | 19:18:18   | Blue                   |
4   | Oct | 10-03-2016 | 19:22:58   | Orange Green           |
5   | Oct | 10-07-2016 | 10:37:36   | Red Blue Yellow        |
6   | Oct | 10-07-2016 | 10:37:36   | White                  |
7   | Sep | 09-07-2016 | 10:37:37   | Ping White Yellow Green|
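For reuse, the same two-step split could be wrapped in a small function. This is only a sketch: parse_row and GAP are hypothetical names, not part of the answer, and the sample line assumes the spacing the question describes (a single space between Time and Values, two or more spaces everywhere else).

```python
import re

# Hypothetical names; the two patterns are the ones used in the answer.
GAP = re.compile(r'\s{2,}')  # two or more whitespace characters


def parse_row(line):
    """Return the first five columns of one table line as a list of strings."""
    # Split on any whitespace at most 4 times: four single-token columns
    # plus the unsplit remainder of the line.
    no, mon, date, time, rest = re.split(r'\s+', line.strip(), maxsplit=4)
    # The remainder starts with the Values column; the first wide gap ends it.
    values = GAP.split(rest)[0]
    return [no, mon, date, time, values]


# Sample line with assumed spacing (see the note above).
sample = '4    Oct   10-03-2016   19:22:58 Orange Green         colors'
print(parse_row(sample))
# ['4', 'Oct', '10-03-2016', '19:22:58', 'Orange Green']
```

A line with fewer than five whitespace-separated fields would raise ValueError at the unpacking step, so blank or malformed lines would need to be filtered out first.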