Question

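I'm trying to clean each line of a file: remove a trailing "/", drop everything from "#" onward, and change "http:" to "https:". My code is as follows: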

import csv

with open('C:\\project\\in.csv','r') as input_file:  

    with open('C:\\project\\out.csv','w') as output_file:             

        for L in input_file:    

            if L.endswith("/"):
                newL=L.replace("/","") 
                output_file.write(newL)           

            elif L.find("#"):
                newL,sep,tail=L.partition("#")
                output_file.write(newL)           

            elif L.startswith('http:'):
                newL=L.replace('http:','https:')
                output_file.write(newL)

Here is a small sample of the in.csv file I'm testing with:

line1/
line2#sdgsgs
https://line3
http://line4
line5/

After cleaning, I'd like it to look like this:

line1
line2
https://line3
https://line4
line5

But the result is not what I want. Could someone give me a hand?

Many thanks, Henry

Answer 1

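Two things go wrong in your code. Each line read from the file still ends with a newline, so L.endswith("/") is never true. And L.find("#") returns -1 when there is no "#", which is truthy, so the second branch runs for almost every line and the 'http:' branch is never reached.

In this version each check is an independent if rather than part of an elif chain, so a single line can have all of the replacements applied to it: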

#!/usr/bin/env python

cleaned = []
with open('C:\\project\\in.csv', 'r') as input_file:
    for line in input_file:
        # Drop the trailing newline so endswith() sees the real last character
        line = line.strip()

        # Remove only the trailing slash, not the slashes inside a URL
        if line.endswith("/"):
            line = line[:-1]

        # Keep only the text before the first '#'
        if "#" in line:
            line, sep, tail = line.partition("#")

        # Upgrade the scheme; the 1 limits the replacement to the first match
        if line.startswith('http:'):
            line = line.replace('http:', 'https:', 1)

        cleaned.append(line)

with open('C:\\project\\out.csv', 'w') as output_file:
    for line in cleaned:
        output_file.write("{}\n".format(line))

For the sample input, this writes the following to out.csv:

line1
line2
https://line3
https://line4
line5
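
If you want to check the rules without touching any files, the per-line logic can be pulled out into a small function. This is only a sketch: the name clean_line is made up for illustration, and it assumes the "#" part should be cut before the trailing "/" is examined, so that a line like "http://a/b/#frag" also loses its final slash.

```python
def clean_line(raw: str) -> str:
    """Apply the three cleanup rules to one line (hypothetical helper)."""
    line = raw.strip()                  # drop the trailing newline first
    line, _, _ = line.partition("#")    # keep only the text before '#'
    if line.endswith("/"):
        line = line[:-1]                # remove just the final '/'
    if line.startswith("http:"):
        line = "https:" + line[len("http:"):]
    return line


# The sample lines from the question, as they would be read from the file
sample = ["line1/\n", "line2#sdgsgs\n", "https://line3\n",
          "http://line4\n", "line5/\n"]
cleaned = [clean_line(s) for s in sample]
print(cleaned)
```

With the rules in one function, the file loop only has to call clean_line on each line and write the result followed by a newline.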
