Question

I'm new to Python and I'm working with some data I need for a project.

I want to read the CSV and write a cleaner version for later processing. Right now my script prints rows like these:

['509,1', '22-10-2018', '05:00', '', '', '11473809', '', '', '', '', '290318']
['509,1', '22-10-2018', '15:00', '', '', '', '', '', '27076', '', '', '', '', '', '', '', '400']

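The problem is that the text file sometimes has several spaces in a row, and each extra space is treated as a new (empty) column.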

509,1 29-08-2018 12:00   22034905     307257
509,1 29-08-2018 14:00          0          0
509,1 29-08-2018 15:00          0          0
509,1 29-08-2018 16:00          0        433
509,1 29-08-2018 17:00        433        433

How can I skip these empty columns?

import csv

with open('t:/509.txt', 'r') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=" ")

    with open('t:/509out.csv', 'w') as new_file:
        csv_writer = csv.writer(new_file, delimiter=";")

        for line in csv_reader:
            print(line)
#            csv_writer.writerow(line)

Thanks in advance.

Answer 1

You can use the skipinitialspace parameter of csv.reader().

When True, whitespace immediately following the delimiter is ignored. The default is False.

csv.reader(csv_file, delimiter=" ", skipinitialspace=True)

Output:

['509,1', '29-08-2018', '12:00', '22034905', '307257']
['509,1', '29-08-2018', '14:00', '0', '0']
['509,1', '29-08-2018', '15:00', '0', '0']
['509,1', '29-08-2018', '16:00', '0', '433']
['509,1', '29-08-2018', '17:00', '433', '433']
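Combined with the semicolon writer from the question, a minimal sketch of the full conversion might look like this. It assumes columns are separated only by spaces (no tabs); the helper names clean_rows and convert are hypothetical, and the sample data is copied from the question. The extra filter drops any empty field that may still appear, for example from trailing spaces, which skipinitialspace alone may not remove.

```python
import csv
import io

# Sample input copied from the question: columns separated by runs of spaces.
SAMPLE = """509,1 29-08-2018 12:00   22034905     307257
509,1 29-08-2018 14:00          0          0
"""


def clean_rows(src):
    """Yield rows where runs of spaces count as a single delimiter (hypothetical helper)."""
    reader = csv.reader(src, delimiter=" ", skipinitialspace=True)
    for row in reader:
        # Drop any empty field left over, e.g. from trailing spaces.
        fields = [f for f in row if f]
        if fields:  # skip blank lines
            yield fields


def convert(src, dst):
    """Write the cleaned rows to dst as semicolon-separated values (hypothetical helper)."""
    writer = csv.writer(dst, delimiter=";")
    for fields in clean_rows(src):
        writer.writerow(fields)


out = io.StringIO()
convert(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

With the real files, open both with newline='' as the csv module documentation recommends, e.g. `with open('t:/509.txt', newline='') as src, open('t:/509out.csv', 'w', newline='') as dst: convert(src, dst)`.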

Answer 2

> I want to read the CSV and write a cleaner version for later processing.

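If you just want to clean up and normalize the whitespace in the file, you can collapse each run of spaces into a single space.

Like this: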

import re

with open('t:/509.txt', 'r') as csv_file:
    text = csv_file.read()

text = re.sub(' +', ' ', text)

Output:

509,1 29-08-2018 12:00 22034905 307257
509,1 29-08-2018 14:00 0 0
509,1 29-08-2018 15:00 0 0
509,1 29-08-2018 16:00 0 433
509,1 29-08-2018 17:00 433 433
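To finish the job the question asks for (a semicolon-separated output file), the normalized text can be fed back into csv.reader. The sketch below is one way to do that, not part of the original answer; the function name normalize_and_convert is hypothetical. It also strips each line, since a leading or trailing space would otherwise still produce an empty field.

```python
import csv
import io
import re


def normalize_and_convert(text):
    """Collapse space runs, then re-emit the rows with ';' as delimiter (hypothetical helper)."""
    # Collapse every run of spaces into one space and trim the line ends.
    lines = [re.sub(r" +", " ", line).strip() for line in text.splitlines()]
    # Parse the now single-space-delimited text with the csv module.
    reader = csv.reader(io.StringIO("\n".join(lines)), delimiter=" ")
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";")
    for row in reader:
        if row:  # csv.reader yields [] for blank lines
            writer.writerow(row)
    return out.getvalue()


print(normalize_and_convert("509,1 29-08-2018 16:00          0        433\n"))
```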

Answer 3

Using only regular expressions:

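import re

with open("t:/509.txt", "r") as my_file:
    content = my_file.read()

# Every run of non-space characters is one field.
# splitlines() avoids the empty trailing "line" that split("\n") produces.
lines = [re.findall(r"[^ ]+", x) for x in content.splitlines()]

with open("t:/509out.csv", "w") as out_file:
    for fields in lines:
        if fields:  # skip blank lines
            out_file.write(";".join(fields) + "\n")
# No explicit close() is needed: each with block closes its file.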
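The same regex idea can be wrapped in a small function so it is easy to test without touching the files. This is a sketch; the name to_semicolon_lines is hypothetical. For input that uses only spaces, calling str.split() with no arguments gives the same fields, because it also splits on runs of whitespace and ignores leading and trailing whitespace.

```python
import re

# One or more non-space characters form a field.
FIELD = re.compile(r"[^ ]+")


def to_semicolon_lines(content):
    """Return each non-blank input line as a ';'-joined string of its fields (hypothetical helper)."""
    result = []
    for line in content.splitlines():
        fields = FIELD.findall(line)
        if fields:  # skip blank lines
            result.append(";".join(fields))
    return result


print(to_semicolon_lines("509,1 29-08-2018 17:00        433        433\n"))
```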

Reading a CSV that sometimes contains multiple spaces
