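I'm new to Python and I'm experimenting with some data I need for a project.
I want to read a CSV and write a cleaner version of it for later processing.
At the moment, the rows I get back from the reader look like this: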
['509,1', '22-10-2018', '05:00', '', '', '11473809', '', '', '', '', '290318']
['509,1', '22-10-2018', '15:00', '', '', '', '', '', '27076', '', '', '', '', '', '', '', '400']
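The problem is that the text file sometimes has several consecutive spaces in a line, and each extra space is treated as a new (empty) column.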
509,1 29-08-2018 12:00 22034905 307257
509,1 29-08-2018 14:00 0 0
509,1 29-08-2018 15:00 0 0
509,1 29-08-2018 16:00 0 433
509,1 29-08-2018 17:00 433 433
How can I skip these columns?
import csv
with open('t:/509.txt', 'r') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=" ")
    with open('t:/509out.csv', 'w') as new_file:
        csv_writer = csv.writer(new_file, delimiter=";")
        for line in csv_reader:
            print(line)
            # csv_writer.writerow(line)
Thanks in advance.
Answer 0 (score: 2)
You can use the skipinitialspace parameter of csv.reader().
When it is True, whitespace immediately following the delimiter is ignored. The default is False.
csv.reader(csv_file, delimiter=" ", skipinitialspace=True)
Output:
['509,1', '29-08-2018', '12:00', '22034905', '307257']
['509,1', '29-08-2018', '14:00', '0', '0']
['509,1', '29-08-2018', '15:00', '0', '0']
['509,1', '29-08-2018', '16:00', '0', '433']
['509,1', '29-08-2018', '17:00', '433', '433']
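Combined with the writer from the question, a minimal sketch might look like the following. It uses in-memory io.StringIO buffers in place of the real files so it can be run as-is; the sample rows and the helper name clean_rows are assumptions made for illustration, not part of the original code.

```python
import csv
import io


def clean_rows(source, target):
    """Copy space-separated rows from source to target as ';'-separated rows."""
    # skipinitialspace=True makes the reader ignore whitespace that
    # immediately follows a delimiter, so runs of spaces between values
    # do not produce empty fields.
    reader = csv.reader(source, delimiter=" ", skipinitialspace=True)
    writer = csv.writer(target, delimiter=";", lineterminator="\n")
    for row in reader:
        writer.writerow(row)


# Hypothetical sample data with irregular spacing between columns.
sample = io.StringIO(
    "509,1 29-08-2018 12:00   22034905    307257\n"
    "509,1 29-08-2018 14:00          0         0\n"
)
cleaned = io.StringIO()
clean_rows(sample, cleaned)
print(cleaned.getvalue())
```

With real files, the same function could be called with the two file objects opened in the question's with blocks. Note that spaces at the very end of a line are not covered by this sketch and may still yield a trailing empty field.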
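Answer 1 (score: 0)
"I want to read a CSV and write a cleaner version for later processing."
If you only want to clean up and normalize the whitespace in the file, you can collapse each run of spaces into a single space.
Like this: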
import re
with open('t:/509.txt', 'r') as csv_file:
    text = csv_file.read()
    text = re.sub(' +', ' ', text)
Output:
509,1 29-08-2018 12:00 22034905 307257
509,1 29-08-2018 14:00 0 0
509,1 29-08-2018 15:00 0 0
509,1 29-08-2018 16:00 0 433
509,1 29-08-2018 17:00 433 433
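To see how the normalized text then behaves with the reader from the question, here is a small sketch. The sample string stands in for the file contents, and the helper name collapse_spaces is hypothetical.

```python
import csv
import io
import re


def collapse_spaces(text):
    """Replace every run of one or more spaces with a single space."""
    return re.sub(" +", " ", text)


# Hypothetical sample with irregular spacing, standing in for the file contents.
raw = (
    "509,1 29-08-2018 16:00     0   433\n"
    "509,1 29-08-2018 17:00   433   433\n"
)
normalized = collapse_spaces(raw)

# After normalization a plain space-delimited reader no longer sees empty columns.
rows = list(csv.reader(io.StringIO(normalized), delimiter=" "))
print(rows)
```

Only space characters are matched by the pattern, so the line breaks are left untouched and each input line still maps to one row.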
Answer 2 (score: 0)
Using only regular expressions:
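import re
# Read the whole file; the with block closes it automatically.
with open("t:/509.txt", 'r') as my_file:
    content = my_file.read()
# Split every line into its runs of non-space characters.
lines = [re.findall(r'[^ ]+', x) for x in content.splitlines()]
# Write the fields back out, separated by semicolons.
with open("t:/509out.csv", 'w') as out_file:
    for fields in lines:
        out_file.write(";".join(fields) + "\n")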
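The same idea can be tried without touching the disk. The sketch below applies the regular expression to an in-memory string and additionally drops blank lines; the sample data and the helper name to_semicolon_lines are assumptions for illustration.

```python
import re


def to_semicolon_lines(content):
    """Turn space-separated lines into ';'-separated lines, skipping blank ones."""
    out = []
    for line in content.splitlines():
        # Every run of non-space characters becomes one field.
        fields = re.findall(r"[^ ]+", line)
        if fields:  # a blank line yields no fields and is dropped
            out.append(";".join(fields))
    return out


# Hypothetical sample with irregular spacing and a blank line in the middle.
sample = (
    "509,1 29-08-2018 12:00   22034905  307257\n"
    "\n"
    "509,1 29-08-2018 14:00    0    0\n"
)
result = to_semicolon_lines(sample)
print(result)
```

Because the fields are extracted rather than split on a delimiter, leading and trailing spaces on a line do not create empty columns either.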