I have a CSV file similar to the following:
Number,Timestamp,Value1,value2,Value3,Value4
7680.0,2015-05-06 13:53:07,4.695,7.929,,
7680.0,2015-05-06 13:53:07,,,4.4118,7.8514
7681.0,2015-05-06 21:25:11,4.259,7.924,,
7681.0,2015-05-06 21:25:11,,,4.477,7.6178
I need to convert this file into the following format:
Number,Timestamp,Value1,value2,Value3,Value4
7680.0,2015-05-06 13:53:07,4.695,7.929,4.4118,7.8514
7681.0,2015-05-06 21:25:11,4.259,7.924,4.477,7.6178
I'm new to Python 2.
Answer 0 (score: 1)
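import pandas as pd
df = pd.read_csv('filename.csv')
# group the rows by the two key columns and add up each value column
df_group = df.groupby(['Number','Timestamp']).sum()
# write the merged rows out ('new_filename.csv' is a placeholder name)
df_group.to_csv('new_filename.csv')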
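The groupby function groups the dataset by Number and Timestamp. Then sum() adds up every numeric column within each group. Because sum() skips missing (NaN) values, the two half-filled rows of each record collapse into one complete row. I hope this is what you want.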
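To see the grouping on the question's own data, here is a minimal runnable sketch. It reads the sample rows from an in-memory string instead of filename.csv, and the reset_index() call, which turns the two group keys back into ordinary columns, is an addition of mine rather than part of the answer.

```python
import io

import pandas as pd

# The sample rows from the question, inlined so the sketch runs without a file
SAMPLE = """Number,Timestamp,Value1,value2,Value3,Value4
7680.0,2015-05-06 13:53:07,4.695,7.929,,
7680.0,2015-05-06 13:53:07,,,4.4118,7.8514
7681.0,2015-05-06 21:25:11,4.259,7.924,,
7681.0,2015-05-06 21:25:11,,,4.477,7.6178
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# Each (Number, Timestamp) group holds two half-filled rows; sum() skips NaN,
# so every value column ends up holding the single value present in the group.
merged = df.groupby(["Number", "Timestamp"]).sum().reset_index()

# index=False keeps pandas' row numbers out of the CSV text
print(merged.to_csv(index=False))
```

One caveat of this approach: if two rows of the same record ever filled the same column, sum() would add the two values together instead of picking one.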
Answer 1 (score: 0)
Probably not the best solution, but this gets the job done:
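with open('messed_up.csv', 'r') as r, open('new.csv', 'w') as f:
    f.write(next(r))  # copy the header line unchanged
    merged = {}  # (Number, Timestamp) -> the four value fields
    order = []   # keys in first-seen order (dicts are unordered in Python 2)
    for line in r:
        fields = line.rstrip('\r\n').split(',')
        if len(fields) != 6:
            print("[-] Skipping malformed line: " + line.strip())
            continue
        key = (fields[0], fields[1])
        if key not in merged:
            merged[key] = fields[2:]
            order.append(key)
        else:
            # fill the empty slots of the first row with this row's values
            merged[key] = [a or b for a, b in zip(merged[key], fields[2:])]
    for key in order:
        f.write(','.join(list(key) + merged[key]) + '\n')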
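Because the two halves of each record sit on consecutive lines in the sample, a variant could pair them with itertools.groupby and let the standard csv module handle splitting and quoting. This is a sketch under two assumptions: the partial rows of one record are always adjacent, and for each column the first non-empty value should win. The function name merge_adjacent is hypothetical, and the code is written for Python 3 (on Python 2 the csv module expects files opened in binary mode, 'rb'/'wb').

```python
import csv
import io
import itertools


def merge_adjacent(src, dst):
    """Merge consecutive rows sharing (Number, Timestamp) (hypothetical helper).

    Assumes the partial rows of one record are adjacent, as in the sample;
    in each value column the first non-empty field wins.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(next(reader))  # copy the header unchanged
    rows = (row for row in reader if row)  # skip blank lines
    for key, group in itertools.groupby(rows, key=lambda row: row[:2]):
        value_rows = [row[2:] for row in group]
        # zip(*value_rows) walks the group column by column
        merged = [next((v for v in column if v), "") for column in zip(*value_rows)]
        writer.writerow(key + merged)


# Usage with in-memory data; with real files you would pass
# open('messed_up.csv', newline='') and open('new.csv', 'w', newline='') instead.
sample = io.StringIO(
    "Number,Timestamp,Value1,value2,Value3,Value4\n"
    "7680.0,2015-05-06 13:53:07,4.695,7.929,,\n"
    "7680.0,2015-05-06 13:53:07,,,4.4118,7.8514\n"
)
out = io.StringIO()
merge_adjacent(sample, out)
print(out.getvalue())
```

If the halves of a record can be scattered through the file, the grouping key has to be tracked in a dictionary instead, since itertools.groupby only merges runs of consecutive equal keys.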
Answer 2 (score: 0)
This can be handled easily with pandas:
import pandas as pd
df = pd.read_csv("file1.csv", header=0, index_col=["Number", "Timestamp"])
# group on both index levels; sum() ignores the empty (NaN) cells
dfnew = df.groupby(level=["Number", "Timestamp"]).sum()
dfnew.to_csv("file2.csv")
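If a record's partial rows could overlap, sum() would add the overlapping values together. GroupBy.first() instead takes the first non-null value of each column within a group, which may be closer to "merge the rows". A minimal sketch on the question's sample data, read from an in-memory string rather than file1.csv:

```python
import io

import pandas as pd

SAMPLE = """Number,Timestamp,Value1,value2,Value3,Value4
7680.0,2015-05-06 13:53:07,4.695,7.929,,
7680.0,2015-05-06 13:53:07,,,4.4118,7.8514
7681.0,2015-05-06 21:25:11,4.259,7.924,,
7681.0,2015-05-06 21:25:11,,,4.477,7.6178
"""

# Use the two key columns as a MultiIndex, as in the answer above
df = pd.read_csv(io.StringIO(SAMPLE), index_col=["Number", "Timestamp"])

# first() takes the first non-null value of each column within a group,
# so nothing is added together even if two rows fill the same column
dfnew = df.groupby(level=["Number", "Timestamp"]).first()

print(dfnew.to_csv())
```

On this sample both approaches give the same rows; they only differ when a column is filled more than once for the same record.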