How to remove multiple characters from a CSV file in Python

Time: 2017-11-12 07:36:42

Tags: python csv

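I have a CSV file whose columns contain some unwanted text that I want to remove. Can someone help me with this?

Sorry for any confusion. The input file looks like this: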

TIME,P0,P1,P2,P3,P4,P5,D0,D1,D2,D3,D4,D5    
22:46:32,PS=P0:Spd=5000:Volt=1.26:SP=FFull,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600   
22:46:33,PS=P0:Spd=5000:Volt=1.26:SP=FFull,PS=P0:Spd=5000:Volt=1.26625:SP=FFull,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600   
22:46:34,PS=P0:Spd=5000:Volt=1.26:SP=FFull,PS=P0:Spd=5000:Volt=1.26625:SP=FFull,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600   
22:46:35,PS=P0:Spd=5000:Volt=1.26:SP=FFull,PS=P0:Spd=5000:Volt=1.26625:SP=FFull,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600   

Output file:

TIME,P0,Volt0,P1,volt1,P2,volt2,P3,volt3,P4,volt4,P5,volt5,D0,D1,D2,D3,D4,D5
22:46:32,5000,1.26,5000,1.26,4900,1.25,5000,1.26,5000,1.26,4900,1.25,1600,1600,1600,1600,1600,1600
22:46:33,4800,1.25,4900,1.15,5000,1.26,5000,1.26,5000,1.26,4900,1.25,1600,1600,1600,1600,1600,1600
22:46:34,5000,1.26,4900,1.25,4900,1.25,5000,1.26,5000,1.26,4900,1.25,1600,1600,1600,1600,1600,1600
22:46:35,5000,1.26,5000,1.26,5000,1.26,5000,1.26,5000,1.26,4900,1.25,1600,1600,1600,1600,1600,1600

1 answer:

Answer 0 (score: 0)

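Here is an example that uses regular expressions and then builds a DataFrame from the matches. You can learn more about regular expressions at regex101.com. Here is a pattern that covers part of your problem: https://regex101.com/r/docq3i/1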

import re
import pandas as pd

csvfile = """\
PS=P0:Spd=5000:Volt=1.26:SP=Full , PS=T0:Spd=1700
PS=P0:Spd=300:Volt=1.46:SP=Full , PS=T0:Spd=12000
"""

# Patterns to look for, where (.*?) captures the unknown value
patterns = [r'P0:Spd=(.*?):Volt',
            r':Volt=(.*?):SP=Full',
            r'T0:Spd=(.*?)\n']

# Extract values: one list of matches per pattern
values = [re.findall(x, csvfile, re.MULTILINE | re.DOTALL) for x in patterns]

# Create dataframe; transpose so that each pattern becomes a column
df = pd.DataFrame(values).T

# Write the dataframe to CSV without the index or a header row
df.to_csv("output.csv", index=False, header=False, sep=",")

output.csv:

5000,1.26,1700
300,1.46,12000
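
The example above works on a simplified two-column string. For the input layout shown in the question, one possible approach is to go cell by cell with the standard csv module and pull out the Spd and Volt values with two small regular expressions. The following is only a sketch under some assumptions: the helper name clean_rows and the VoltN header naming are made up for illustration, the search is case-insensitive because the sample contains both "Spd" and "SPd", and a cell with no match is written as an empty string. The sample string is a shortened version of the question's input. Note that the expected output in the question does not correspond exactly to its sample input, so the result here reflects the input values only.

```python
import csv
import io
import re

# Case-insensitive because the sample input uses both "Spd" and "SPd".
SPD_RE = re.compile(r"Spd=([^:]+)", re.IGNORECASE)
VOLT_RE = re.compile(r"Volt=([^:]+)", re.IGNORECASE)


def clean_rows(reader):
    """Yield a cleaned header row, then one cleaned row per input row."""
    header = [name.strip() for name in next(reader)]
    # Assumed naming rule: every P* column gets a VoltN companion column.
    out_header = [header[0]]
    for name in header[1:]:
        out_header.append(name)
        if name.startswith("P"):
            out_header.append("Volt" + name[1:])
    yield out_header

    for row in reader:
        if not row:
            continue  # skip blank lines
        out = [row[0].strip()]  # the TIME column is kept unchanged
        for name, cell in zip(header[1:], row[1:]):
            spd = SPD_RE.search(cell)
            out.append(spd.group(1).strip() if spd else "")
            if name.startswith("P"):
                volt = VOLT_RE.search(cell)
                out.append(volt.group(1).strip() if volt else "")
        yield out


# Shortened sample in the same layout as the question's input file.
sample = (
    "TIME,P0,D0\n"
    "22:46:32,PS=P0:Spd=5000:Volt=1.26:SP=FFull,PS=M0:SPd=1600\n"
    "22:46:33,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=M0:SPd=1600\n"
)
cleaned = list(clean_rows(csv.reader(io.StringIO(sample))))
for row in cleaned:
    print(",".join(row))
```

To run this on a real file, open the input and output files with newline="" and pass the generator to csv.writer(...).writerows(...) instead of printing.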