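I have a CSV file whose columns contain some unwanted text that I want to remove. Can someone help me with this?
Sorry for any confusion. The input file looks like this: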
TIME,P0,P1,P2,P3,P4,P5,D0,D1,D2,D3,D4,D5
22:46:32,PS=P0:Spd=5000:Volt=1.26:SP=FFull,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600
22:46:33,PS=P0:Spd=5000:Volt=1.26:SP=FFull,PS=P0:Spd=5000:Volt=1.26625:SP=FFull,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600
22:46:34,PS=P0:Spd=5000:Volt=1.26:SP=FFull,PS=P0:Spd=5000:Volt=1.26625:SP=FFull,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600
22:46:35,PS=P0:Spd=5000:Volt=1.26:SP=FFull,PS=P0:Spd=5000:Volt=1.26625:SP=FFull,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600,PS=M0:SPd=1600
Output file:
TIME,P0,Volt0,P1,volt1,P2,volt2,P3,volt3,P4,volt4,P5,volt5,D0,D1,D2,D3,D4,D5
22:46:32,5000,1.26,5000,1.26,4900,1.25,5000,1.26,5000,1.26,4900,1.25,1600,1600,1600,1600,1600,1600
22:46:33,4800,1.25,4900,1.15,5000,1.26,5000,1.26,5000,1.26,4900,1.25,1600,1600,1600,1600,1600,1600
22:46:34,5000,1.26,4900,1.25,4900,1.25,5000,1.26,5000,1.26,4900,1.25,1600,1600,1600,1600,1600,1600
22:46:35,5000,1.26,5000,1.26,5000,1.26,5000,1.26,5000,1.26,4900,1.25,1600,1600,1600,1600,1600,1600
Answer 0 (score: 0)
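Here is an example that uses regular expressions and then builds a DataFrame. You can learn more about regular expressions at regex101.com. Here is a pattern that covers part of your problem: https://regex101.com/r/docq3i/1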
import re
import pandas as pd
csvfile = """\
PS=P0:Spd=5000:Volt=1.26:SP=Full , PS=T0:Spd=1700
PS=P0:Spd=300:Volt=1.46:SP=Full , PS=T0:Spd=12000
"""
# Patterns to look for, where (.*?) captures the unknown value
patterns = [r'P0:Spd=(.*?):Volt',
            r':Volt=(.*?):SP=Full',
            r'T0:Spd=(.*?)\n']
# Extract values: one list of matches per pattern
values = [re.findall(x, csvfile, re.MULTILINE | re.DOTALL) for x in patterns]
# Create dataframe
df = pd.DataFrame(values).T
# Write the DataFrame to CSV without an index or header row
df.to_csv("output.csv", index=False, header=False, sep=",")
output.csv:
5000,1.26,1700
300,1.46,12000
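The snippet above works on a two-column sample. To apply the same idea to the full layout in the question (a TIME column, P columns carrying both Spd and Volt, and D columns carrying only SPd), one possible approach is to go cell by cell with the standard csv module. This is only a sketch under assumptions: the names clean_csv, SPD_RE and VOLT_RE are made up for illustration, the extra header names are generated as VoltN (the question's header mixes "Volt0" and "volt1"), and whether a column carries a Volt field is inferred from the first data row.

```python
import csv
import io
import re

# Assumed field patterns. The sample uses both "Spd" and "SPd",
# so the speed pattern is matched case-insensitively.
SPD_RE = re.compile(r"Spd=([^:,]+)", re.IGNORECASE)
VOLT_RE = re.compile(r"Volt=([^:,]+)")


def clean_csv(src, dst):
    """Read the raw CSV from src and write the numeric-only CSV to dst."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    rows = list(reader)

    # Assumption: the first data row tells us which columns carry a Volt field.
    if rows:
        has_volt = [bool(VOLT_RE.search(cell)) for cell in rows[0]]
    else:
        has_volt = [False] * len(header)

    # Build the output header, adding a VoltN column after each such column.
    out_header = []
    for name, volt in zip(header, has_volt):
        out_header.append(name)
        if volt:
            out_header.append("Volt" + name[1:])
    writer.writerow(out_header)

    for row in rows:
        out = []
        for cell, volt in zip(row, has_volt):
            spd = SPD_RE.search(cell)
            # Cells without a Spd field (such as TIME) are copied unchanged.
            out.append(spd.group(1) if spd else cell)
            if volt:
                m = VOLT_RE.search(cell)
                out.append(m.group(1) if m else "")
        writer.writerow(out)


# Shortened sample in the same shape as the question's input.
sample = (
    "TIME,P0,P1,D0\n"
    "22:46:32,PS=P0:Spd=5000:Volt=1.26:SP=FFull,"
    "PS=P0:Spd=4800:Volt=1.24:SP>P0SP,PS=M0:SPd=1600\n"
)
result = io.StringIO()
clean_csv(io.StringIO(sample), result)
print(result.getvalue(), end="")
# TIME,P0,Volt0,P1,Volt1,D0
# 22:46:32,5000,1.26,4800,1.24,1600
```

For a real file, the two StringIO objects can be replaced with files opened via open("input.csv", newline="") and open("output.csv", "w", newline=""), where both file names are placeholders.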