Transforming a CSV file with Python

Date: 2018-04-10 13:07:12

Tags: python pandas csv

I'm new to Python. Does anyone know a good way to do this? I could write a script myself, but using a package would probably be faster.

I have a .csv file (gigabytes in size):

name,   value,  time
A,   1, 10
B,   2, 10
C,   3, 10
C,   3, 10 (should ignore duplicates, or incomplete (A,B,C) entries)
A,   4, 12 (should be sorted by time, this entry should be at the end, after time==11)
B,   5, 12
C,   6, 12
B,   7, 11 (order of A,B,C might be different)
C,   8, 11
A,   9, 11

I want to convert it into a new .csv file containing the following:

time,   A,  B,  C
10, 1,  2,  3
11, 9,  7,  8
12, 4,  5,  6

2 answers:

Answer 0 (score: 6)

I think you need drop_duplicates followed by pivot:

# pandas 2.0 and later accept these pivot arguments by keyword only
df = df.drop_duplicates().pivot(index='time', columns='name', values='value')
print(df)
name  A  B  C
time         
10    1  2  3
11    9  7  8
12    4  5  6
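
For completeness, here is a minimal end-to-end sketch of the same idea, starting from CSV text rather than an existing df. It assumes the file looks like the sample in the question without the parenthetical notes; the `transform` helper and the `SAMPLE` string are hypothetical names used only for illustration. `skipinitialspace=True` is there because the sample has blanks after each comma, which would otherwise end up in the column names.

```python
import io

import pandas as pd

# Sample rows from the question, without the parenthetical notes.
# In practice a file path would be passed to read_csv instead.
SAMPLE = """name,   value,  time
A,   1, 10
B,   2, 10
C,   3, 10
C,   3, 10
A,   4, 12
B,   5, 12
C,   6, 12
B,   7, 11
C,   8, 11
A,   9, 11
"""


def transform(source):
    """Read long-format rows and return one row per time, one column per name."""
    # skipinitialspace strips the blanks that follow each comma,
    # so the headers are 'value' and 'time' rather than '   value'.
    df = pd.read_csv(source, skipinitialspace=True)
    wide = (
        df.drop_duplicates()
          .pivot(index="time", columns="name", values="value")
          .sort_index()  # make the ordering by time explicit
    )
    wide.columns.name = None  # drop the 'name' label above the columns
    return wide


result = transform(io.StringIO(SAMPLE))
print(result.to_csv())
```

Calling `result.to_csv("output.csv")` with a path of your choice writes the file instead of printing it. Note that this loads the whole input into memory; for a file of several gigabytes that may or may not fit, and `read_csv` options such as `dtype`, `usecols` or `chunksize` could be worth looking at.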

Answer 1 (score: 2)

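Since I can't comment, I'd like to add to @jezrael's answer: you also want to remove incomplete or NaN values. Use df.dropna, which drops every row that contains a NaN.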

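import numpy as np
import pandas as pd

# Placeholder labels standing in for the names in the question
A = 'a'
B = 'b'
C = 'c'
df = pd.DataFrame([[A,   1, 10],
                   [B,   2, 10],
                   [C,   3, 10],
                   [C,   3, 10],   # exact duplicate of the row above
                   [A,   4, 12],
                   [B,   5, 12],
                   [C,   6, 12],
                   [B,   7, 11],
                   [C,   8, 11],
                   [A,   9, 11],
                   [np.nan, 10, 0]],   # row with a missing name
                  columns=["name", "value", "time"])
df.dropna(inplace=True)            # remove rows containing NaN
df.drop_duplicates(inplace=True)   # remove repeated rows
# pandas 2.0 and later accept these pivot arguments by keyword only
df = df.pivot(index='time', columns='name', values='value')
print(df)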
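
One thing to be aware of: calling dropna before the pivot only removes rows where a field is missing. If a whole row is absent for some time (say there is no C entry at time 12), the pivot fills that cell with NaN instead. A possible way to drop such incomplete (A,B,C) groups, sketched here with made-up data in which the C row for time 12 has been left out on purpose, is a second dropna after the pivot:

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the data above: time 12 has no 'C' row.
df = pd.DataFrame(
    [["A", 1, 10], ["B", 2, 10], ["C", 3, 10], ["C", 3, 10],
     ["A", 4, 12], ["B", 5, 12],
     ["B", 7, 11], ["C", 8, 11], ["A", 9, 11],
     [np.nan, 10, 0]],
    columns=["name", "value", "time"],
)

wide = (
    df.dropna()                # rows with a missing field
      .drop_duplicates()       # exact repeated rows
      .pivot(index="time", columns="name", values="value")
      .dropna()                # times that lack any of A, B, C
      .astype(int)             # the NaN cells forced floats; restore integers
)
print(wide)
```

Here time 12 is dropped because its C cell is NaN after the pivot. Also note that drop_duplicates only removes rows that are identical in every column; if the same name and time appear with two different values, pivot raises a ValueError, and you would have to decide which value to keep first.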