Python: splitting columns with variable spacing

Date: 2016-11-12 15:55:15

Tags: python regex python-3.x split

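I'm trying to parse a table whose columns are separated by a mix of single spaces and runs of several spaces. I can use re.split to separate the columns delimited by more than one space, but the columns delimited by a single space then have to be split again. The code below does this by splitting the line several times to get columns 4 and 5. Is there a better or more efficient way?

The approach I'm using seems inefficient:

My code: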

import re

string = '''No  Mon     Date           Time Values   colors
1   Nov     11-03-2016     23:17:52 Red   colors
2   Nov     11-03-2016     19:18:00 Yellow   colors
3   Nov     11-03-2016     19:18:18 Blue   colors
4   Oct     10-03-2016     19:22:58 Orange Green   colors
5   Oct     10-07-2016     10:37:36 Red Blue Yellow   colors
6   Oct     10-07-2016     10:37:36 White   colors
7   Sep     09-07-2016     10:37:37 Ping White Yellow Green   colors'''

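for i in string.splitlines():
    # Each call below re-splits the same line on runs of 2+ spaces
    col1 = re.split(r'\s{2,}', i)[0]
    col2 = re.split(r'\s{2,}', i)[1]
    col3 = re.split(r'\s{2,}', i)[2]
    # The 4th chunk holds "Time Values"; split it again on single spaces
    col4 = re.split(r'\s{2,}', i)[3].split()[0]
    col5 = ' '.join(re.split(r'\s{2,}', i)[3].split()[1:])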

    print('{:3} | {:3} | {:10} | {:10} | {:23}|'.format(col1, col2, col3, col4, col5))

Output:

No  | Mon | Date       | Time       | Values                 |
1   | Nov | 11-03-2016 | 23:17:52   | Red                    |
2   | Nov | 11-03-2016 | 19:18:00   | Yellow                 |
3   | Nov | 11-03-2016 | 19:18:18   | Blue                   |
4   | Oct | 10-03-2016 | 19:22:58   | Orange Green           |
5   | Oct | 10-07-2016 | 10:37:36   | Red Blue Yellow        |
6   | Oct | 10-07-2016 | 10:37:36   | White                  |
7   | Sep | 09-07-2016 | 10:37:37   | Ping White Yellow Green|
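For comparison, here is a minimal sketch of the same two-step idea that calls re.split only once per line (with the pattern compiled up front) and uses str.partition to peel the Time token off the front of the fourth chunk. The names SEP and parse_line are hypothetical, and it assumes, as the sample data does, that Time and Values are separated by exactly one space.

```python
import re

# Compile the "2+ spaces" separator once instead of re-parsing it per call.
SEP = re.compile(r'\s{2,}')

table = '''No  Mon     Date           Time Values   colors
1   Nov     11-03-2016     23:17:52 Red   colors
2   Nov     11-03-2016     19:18:00 Yellow   colors
3   Nov     11-03-2016     19:18:18 Blue   colors
4   Oct     10-03-2016     19:22:58 Orange Green   colors
5   Oct     10-07-2016     10:37:36 Red Blue Yellow   colors
6   Oct     10-07-2016     10:37:36 White   colors
7   Sep     09-07-2016     10:37:37 Ping White Yellow Green   colors'''

def parse_line(line):
    """Hypothetical helper: split the line once, then separate Time from Values."""
    parts = SEP.split(line)                  # one split per line
    no, mon, date, time_and_values = parts[:4]
    # Assumption: Time is followed by a single space, then the Values text.
    time, _, values = time_and_values.partition(' ')
    return no, mon, date, time, values

rows = [parse_line(line) for line in table.splitlines()]
for row in rows:
    print('{:3} | {:3} | {:10} | {:10} | {:23}|'.format(*row))
```

Like the original, this drops the trailing colors column (parts[4]).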

1 Answer:

Answer 0 (score: 1)

You can get the first four values with a single split (maxsplit=4) and then split the remainder of the line, arr[4], on \s{2,}:

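for i in string.splitlines():
    # maxsplit=4: split off the first four fields; arr[4] holds the rest of the line
    arr = re.split(r'\s+', i, maxsplit=4)
    # The Values column ends at the first run of two or more spaces
    values = re.split(r'\s{2,}', arr[4])[0]
    print('{:3} | {:3} | {:10} | {:10} | {:23}|'.format(arr[0], arr[1], arr[2], arr[3], values))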

No  | Mon | Date       | Time       | Values                 |
1   | Nov | 11-03-2016 | 23:17:52   | Red                    |
2   | Nov | 11-03-2016 | 19:18:00   | Yellow                 |
3   | Nov | 11-03-2016 | 19:18:18   | Blue                   |
4   | Oct | 10-03-2016 | 19:22:58   | Orange Green           |
5   | Oct | 10-07-2016 | 10:37:36   | Red Blue Yellow        |
6   | Oct | 10-07-2016 | 10:37:36   | White                  |
7   | Sep | 09-07-2016 | 10:37:37   | Ping White Yellow Green|
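Another option, not taken from the answer above, is to replace both splits with a single re.match per line using named groups. This is a sketch under the assumption that No, Mon, Date and Time never contain spaces and that Values is always followed by two or more spaces before the trailing colors column; ROW and parse_line are hypothetical names.

```python
import re

# Hypothetical pattern: four space-free fields, then Values up to the first
# run of 2+ spaces. Assumes every row has a trailing column after Values.
ROW = re.compile(
    r'(?P<no>\S+)\s+(?P<mon>\S+)\s+(?P<date>\S+)\s+(?P<time>\S+)\s+'
    r'(?P<values>.+?)\s{2,}'  # lazy: stop at the first run of 2+ spaces
)

table = '''No  Mon     Date           Time Values   colors
4   Oct     10-03-2016     19:22:58 Orange Green   colors
7   Sep     09-07-2016     10:37:37 Ping White Yellow Green   colors'''

def parse_line(line):
    """Return (no, mon, date, time, values) for one row, or raise ValueError."""
    m = ROW.match(line)
    if m is None:
        raise ValueError('unexpected row format: %r' % line)
    return m.group('no', 'mon', 'date', 'time', 'values')

rows = [parse_line(line) for line in table.splitlines()]
for row in rows:
    print('{:3} | {:3} | {:10} | {:10} | {:23}|'.format(*row))
```

The lazy .+? is what lets Values contain single spaces: it keeps expanding only until \s{2,} can match, so "Orange Green" stays together while the colors column is left out.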
