Cleaning CSV data with Python

Posted: 2018-04-29 15:08:54

Tags: python

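I'm trying to remove unwanted parts from each line: a trailing "/", everything from "#" onward, and the "http:" scheme, which should become "https:". Here is my code: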

import csv

with open('C:\\project\\in.csv','r') as input_file:
    with open('C:\\project\\out.csv','w') as output_file:
        for L in input_file:
            if L.endswith("/"):
                newL=L.replace("/","")
                output_file.write(newL)
            elif L.find("#"):
                newL,sep,tail=L.partition("#")
                output_file.write(newL)
            elif L.startswith('http:'):
                newL=L.replace('http:','https:')
                output_file.write(newL)

Here is a small sample of the in.csv file I use for testing:

line1/
line2#sdgsgs
https://line3
http://line4
line5/

After cleaning, I want it to look like this:

line1
line2
https://line3
https://line4
line5

But the result is not what I expected. Can anyone give me a hand?

Many thanks, Henry

1 answer:

Answer 0 (score: 1)

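The original code has two problems. Every line read from the file still ends with "\n", so L.endswith("/") is false for every line except possibly the last one. And L.find("#") returns -1 when there is no "#"; -1 is truthy, so the elif branch catches almost every line and the http: branch never runs. In this version the newline is stripped first and each rule is a separate if instead of an if/elif chain, so a single line can go through every replacement: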

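#!/usr/bin/env python

output_lines = []
with open('C:\\project\\in.csv', 'r') as input_file:
    for line in input_file:
        # Strip the trailing "\n" first; otherwise endswith("/") never matches.
        line = line.strip()

        # Cut everything from the first "#" onward.
        if "#" in line:
            line, sep, tail = line.partition("#")

        # Remove trailing slashes only; replace("/", "") would also delete
        # the "//" in a URL such as "https://line3/".
        line = line.rstrip("/")

        # Upgrade the scheme; count=1 changes only the leading "http:".
        if line.startswith('http:'):
            line = line.replace('http:', 'https:', 1)

        output_lines.append(line)

with open('C:\\project\\out.csv', 'w') as output_file:
    for output in output_lines:
        output_file.write("{}\n".format(output))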

This produces:

line1
line2
https://line3
https://line4
line5
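Reading the file as plain text treats each whole row as one value. If in.csv actually has several comma-separated columns, a minimal sketch, assuming every field should be cleaned independently with the same three rules, can use the standard csv module. The names clean_field and clean_csv are hypothetical helpers, and io.StringIO stands in for the real files so the rules can be checked without touching C:\project:

```python
import csv
import io


def clean_field(text):
    """Apply the question's three cleaning rules to one value (hypothetical helper)."""
    text = text.strip()
    # Drop everything from the first "#" onward.
    text = text.partition("#")[0]
    # Remove trailing slashes only, so "https://..." keeps its "//".
    text = text.rstrip("/")
    # Upgrade a leading "http:" to "https:" (count=1: first occurrence only).
    if text.startswith("http:"):
        text = text.replace("http:", "https:", 1)
    return text


def clean_csv(src, dst):
    """Read rows from file-like src, clean every field, write rows to dst."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        writer.writerow([clean_field(field) for field in row])


# Usage with in-memory text instead of the real in.csv / out.csv.
sample = "line1/\nline2#sdgsgs\nhttps://line3\nhttp://line4\nline5/\n"
out = io.StringIO()
clean_csv(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

With real files, the csv documentation recommends opening both with newline='', for example open(r'C:\project\in.csv', newline='') for reading and open(r'C:\project\out.csv', 'w', newline='') for writing, and passing the two file objects to clean_csv.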