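Reading a CSV that sometimes contains multiple spaces

Time: 2018-11-13 12:02:52

Tags: python csv

I'm new to Python and I'm experimenting with some data I need for a project.

I want to read a CSV file and write out a cleaner version that is easier to process later.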

['509,1', '22-10-2018', '05:00', '', '', '11473809', '', '', '', '', '290318']
['509,1', '22-10-2018', '15:00', '', '', '', '', '', '27076', '', '', '', '', '', '', '', '400']

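The problem is that the text file sometimes has several consecutive spaces in a line, and the reader treats each extra space as a new (empty) column, as in the rows printed above. A sample of the input file: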

509,1 29-08-2018 12:00   22034905     307257
509,1 29-08-2018 14:00          0          0
509,1 29-08-2018 15:00          0          0
509,1 29-08-2018 16:00          0        433
509,1 29-08-2018 17:00        433        433

How can I skip these empty columns?

import csv

with open('t:/509.txt', 'r') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=" ")

    with open('t:/509out.csv', 'w') as new_file:
        csv_writer = csv.writer(new_file, delimiter=";")

        for line in csv_reader:
            print(line)
#            csv_writer.writerow(line)

Thanks in advance.

3 answers:

Answer 0 (score: 2)

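You can use the skipinitialspace parameter of csv.reader(). As the csv documentation puts it:

"When True, whitespace immediately following the delimiter is ignored. The default is False."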

csv.reader(csv_file, delimiter=" ", skipinitialspace=True)

Output:

['509,1', '29-08-2018', '12:00', '22034905', '307257']
['509,1', '29-08-2018', '14:00', '0', '0']
['509,1', '29-08-2018', '15:00', '0', '0']
['509,1', '29-08-2018', '16:00', '0', '433']
['509,1', '29-08-2018', '17:00', '433', '433']
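Putting this together with the writer from the question might look roughly like the sketch below. The function name convert and the in-memory io.StringIO objects are only illustrative assumptions so that the example runs without the original files; with real files you would pass the opened file objects instead. One thing to watch for: a line that ends in spaces would still produce one empty trailing field.

```python
import csv
import io


def convert(src, dst):
    """Copy space-separated rows from src to dst, separated by semicolons.

    `convert` is a hypothetical helper, not part of the csv module.
    """
    # skipinitialspace=True makes the reader ignore spaces that directly
    # follow a delimiter, so runs of spaces do not create empty fields.
    reader = csv.reader(src, delimiter=" ", skipinitialspace=True)
    writer = csv.writer(dst, delimiter=";", lineterminator="\n")
    for row in reader:
        if row:  # blank lines come back as empty lists; skip them
            writer.writerow(row)


# Two sample lines taken from the question, held in memory for the demo.
sample = (
    "509,1 29-08-2018 12:00   22034905     307257\n"
    "509,1 29-08-2018 14:00          0          0\n"
)
out = io.StringIO()
convert(io.StringIO(sample), out)
print(out.getvalue())
```

When writing to a real file with csv.writer, opening it with newline='' is the usual recommendation so that the writer controls the line endings.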

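Answer 1 (score: 0)

"I want to read a CSV and write a cleaner version for later processing."

If you only want to clean up and normalize the whitespace in the file, you can collapse each run of spaces into a single space.

Like this: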

import re

with open('t:/509.txt', 'r') as csv_file:
    text = csv_file.read()

text = re.sub(' +', ' ', text)

Output:

509,1 29-08-2018 12:00 22034905 307257
509,1 29-08-2018 14:00 0 0
509,1 29-08-2018 15:00 0 0
509,1 29-08-2018 16:00 0 433
509,1 29-08-2018 17:00 433 433
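Once the spaces are collapsed, a plain space delimiter should be enough for csv.reader to split the rows correctly. The sketch below assumes a helper called normalize_spaces (a made-up name) and uses an in-memory string in place of the file from the question.

```python
import csv
import io
import re


def normalize_spaces(text):
    """Collapse every run of spaces into a single space (hypothetical helper)."""
    return re.sub(" +", " ", text)


# One sample line from the question, with its irregular spacing.
raw = "509,1 29-08-2018 16:00          0        433\n"
clean = normalize_spaces(raw)

# With single spaces only, delimiter=" " no longer yields empty fields.
rows = list(csv.reader(io.StringIO(clean), delimiter=" "))
print(rows)
```

Note that this only normalizes spaces inside a line; leading or trailing spaces on a line would still show up as one empty field each.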

Answer 2 (score: 0)

Using only regular expressions:

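import re

# Read the whole input; the with block closes the file automatically,
# so no explicit close() calls are needed.
with open("t:/509.txt", 'r') as my_file:
    content = my_file.read()

# Split each line into its runs of non-space characters.
# splitlines() avoids the empty trailing entry that split("\n") produces.
lines = [re.findall(r'[^ ]+', x) for x in content.splitlines()]

# Write the fields back out, separated by semicolons.
with open("t:/509out.csv", 'w') as out_file:
    for fields in lines:
        out_file.write(";".join(fields) + "\n")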
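For comparison, the same splitting can probably be done without a regular expression, since str.split() with no arguments splits on any run of whitespace. This is a swapped-in alternative, not what the answer above does: unlike the [^ ] pattern it also splits on tabs. The name to_semicolons is a hypothetical helper used only for this sketch.

```python
def to_semicolons(lines):
    """Join the whitespace-separated fields of each line with semicolons.

    `to_semicolons` is a hypothetical helper; blank lines are skipped.
    """
    result = []
    for line in lines:
        fields = line.split()  # no argument: split on runs of whitespace
        if fields:
            result.append(";".join(fields))
    return result


# Sample lines from the question, plus a blank line to show it is dropped.
sample = [
    "509,1 29-08-2018 16:00          0        433",
    "",
    "509,1 29-08-2018 17:00        433        433",
]
converted = to_semicolons(sample)
print("\n".join(converted))
```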