How to remove repeated values from a MultiIndex pandas DataFrame when saving to csv

Time: 2020-10-16 00:37:16

Tags: python pandas dataframe csv

I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame([[1, "10/1/2019", "I", 2879],
                   [1, "10/1/2019", "O", 196],
                   [1, "10/2/2019", "I", 2840],
                   [1, "10/2/2019", "O", 189],
                   [2, "10/1/2019", "I", 2907],
                   [2, "10/1/2019", "O", 195]],
                  columns=["A", "B", "C", "D"])
df.set_index(["A", "B", "C"], inplace=True)

When I display the contents, they look more or less like this:

                    D
A   B           C   
1   10/1/2019   I   2879
                O   196
    10/2/2019   I   2840
                O   189
2   10/1/2019   I   2907
                O   195

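So my question is how to generate a csv file whose contents look the way they are displayed in the notebook, where the "A" and "B" values become empty on subsequent rows. In other words, this is what I need the csv output to look like: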

A,B,C,D
1,10/1/2019,I,2879
,,O,196
,10/2/2019,I,2840
,,O,189
2,10/1/2019,I,2907
,,O,195

1 Answer:

Answer 0 (score: 1)

Instead of setting the index, create boolean Series (True/False) in which True marks the values to blank out:

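import pandas as pd

df = pd.DataFrame([[1, "10/1/2019", "I", 2879],
                   [1, "10/1/2019", "O", 196],
                   [1, "10/2/2019", "I", 2840],
                   [1, "10/2/2019", "O", 189],
                   [2, "10/1/2019", "I", 2907],
                   [2, "10/1/2019", "O", 195]],
                  columns=["A", "B", "C", "D"])
# True for every row whose "A" value already appeared in an earlier row
s1 = df.duplicated(subset=['A'])
# True for every row whose ("A", "B") pair already appeared in an earlier row
s2 = df.duplicated(subset=['A', 'B'])
# Keep the first occurrence, replace the repeats with an empty string
df['A'] = df['A'].where(~s1, '')
df['B'] = df['B'].where(~s2, '')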
df
Out[1]: 
   A          B  C     D
0  1  10/1/2019  I  2879
1                O   196
2     10/2/2019  I  2840
3                O   189
4  2  10/1/2019  I  2907
5                O   195
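
To get from there to the csv file in the question, the blanked frame still has to be written out without its row index. The following is a minimal sketch, not a definitive implementation: `blank_repeated_levels` is a hypothetical helper name introduced here, and it assumes you start from the MultiIndex frame in the question, flatten it with `reset_index()`, and write with `to_csv(index=False)`. The cast to `object` is an assumption made so that the integer column "A" can hold empty strings. An in-memory buffer is used so the result can be inspected; pass a file path to `to_csv` to write a real file.

```python
import io

import pandas as pd


def blank_repeated_levels(frame, levels):
    """Return a flat copy of `frame` with repeated outer values replaced by ''.

    Hypothetical helper: `levels` lists the index levels to blank, outermost first.
    """
    flat = frame.reset_index()
    # Compute every mask before modifying any column, so that blanking "A"
    # does not change which ("A", "B") pairs count as repeats.
    masks = {
        col: flat.duplicated(subset=levels[: i + 1])
        for i, col in enumerate(levels)
    }
    for col, repeated in masks.items():
        # Cast to object so a numeric column can hold empty strings.
        flat[col] = flat[col].astype(object).where(~repeated, "")
    return flat


df = pd.DataFrame([[1, "10/1/2019", "I", 2879],
                   [1, "10/1/2019", "O", 196],
                   [1, "10/2/2019", "I", 2840],
                   [1, "10/2/2019", "O", 189],
                   [2, "10/1/2019", "I", 2907],
                   [2, "10/1/2019", "O", 195]],
                  columns=["A", "B", "C", "D"]).set_index(["A", "B", "C"])

flat = blank_repeated_levels(df, ["A", "B"])

buf = io.StringIO()
flat.to_csv(buf, index=False)  # index=False drops the 0..5 row labels
csv_text = buf.getvalue()
print(csv_text)
```

Note that `duplicated` flags any earlier occurrence, not only the row directly above, so this sketch assumes rows with the same "A" (and the same "A", "B") are already grouped together, as they are in the example data.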