Dropping rows that are duplicates in all but two columns after merging two DataFrames

Posted: 2017-03-15 10:52:26

Tags: python pandas dataframe merge

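I have a pandas DataFrame (the result of concatenating two DataFrames) that contains some rows which are duplicates of each other except in two columns, which hold row identifiers. For example: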

  A      B C D E F
  Peter  1 c d e f
  Paula  2 g h i j
  Frank  3 c d e f
  Robert 4 k l m n
  Sarah  5 g h i j

For testing:

import pandas as pd

df = pd.DataFrame({"A": ["Peter", "Paula", "Frank", "Robert", "Sara"],
                   "B": [1, 2, 3, 4, 5],
                   "C": ["c", "g", "c", "k", "g"],
                   "D": ["d", "h", "d", "l", "h"],
                   "E": ["e", "i", "e", "m", "i"],
                   "F": ["f", "j", "f", "n", "j"]})
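To see which rows count as duplicates once "A" and "B" are ignored, one option is `DataFrame.duplicated` with a `subset` of the remaining columns. This is a minimal sketch on the test frame above; the variable name `mask` is just for illustration. By default, every occurrence after the first is flagged as a duplicate.

```python
import pandas as pd

df = pd.DataFrame({"A": ["Peter", "Paula", "Frank", "Robert", "Sara"],
                   "B": [1, 2, 3, 4, 5],
                   "C": ["c", "g", "c", "k", "g"],
                   "D": ["d", "h", "d", "l", "h"],
                   "E": ["e", "i", "e", "m", "i"],
                   "F": ["f", "j", "f", "n", "j"]})

# True for every row whose C-F values already appeared in an earlier row
mask = df.duplicated(subset=["C", "D", "E", "F"])
print(mask.tolist())  # [False, False, True, False, True]
```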

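I want to keep only the first occurrence of rows that are duplicated in columns C through F, together with that row's name and number (columns "A" and "B"). So we would get: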

  A      B C D E F
  Peter  1 c d e f
  Paula  2 g h i j
  Robert 4 k l m n

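I tried a few things with df.drop_duplicates, but I couldn't get it to ignore columns "A" and "B". I also tried splitting the frame into two DataFrames, one with A and B and one with C to F, calling drop_duplicates on the second one and merging the two back by index afterwards, but that did not work because drop_duplicates resets the index. So how can this be done? Thanks.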
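For what it's worth, in the pandas versions I know of, `drop_duplicates` keeps the original index labels of the surviving rows. It only renumbers them when `ignore_index=True` is passed (a parameter added in pandas 1.0). So the split-and-join idea described above should work. A minimal sketch, where `ids`, `values` and `deduped` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({"A": ["Peter", "Paula", "Frank", "Robert", "Sara"],
                   "B": [1, 2, 3, 4, 5],
                   "C": ["c", "g", "c", "k", "g"],
                   "D": ["d", "h", "d", "l", "h"],
                   "E": ["e", "i", "e", "m", "i"],
                   "F": ["f", "j", "f", "n", "j"]})

ids = df[["A", "B"]]               # identifier columns
values = df[["C", "D", "E", "F"]]  # columns that define a duplicate

# The surviving rows keep their original labels 0, 1 and 3
deduped = values.drop_duplicates()
print(deduped.index.tolist())  # [0, 1, 3]

# Because the labels survive, an inner join on the index brings A and B back
result = ids.join(deduped, how="inner")
print(result)
```

That said, passing `subset` to `drop_duplicates` directly avoids the split entirely.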

1 answer:

Answer 0 (score: 1)

df2 = df.drop_duplicates(subset=["C", "D", "E", "F"])

Output:

        A  B  C  D  E  F
0   Peter  1  c  d  e  f
1   Paula  2  g  h  i  j
3  Robert  4  k  l  m  n
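If the frame has many columns, it may be easier to build `subset` by excluding the identifier columns than by listing the others. The `keep` parameter of `drop_duplicates` then controls which occurrence survives: `"first"` (the default), `"last"`, or `False` to drop every row that has a duplicate. A sketch, assuming "A" and "B" are the only identifier columns:

```python
import pandas as pd

df = pd.DataFrame({"A": ["Peter", "Paula", "Frank", "Robert", "Sara"],
                   "B": [1, 2, 3, 4, 5],
                   "C": ["c", "g", "c", "k", "g"],
                   "D": ["d", "h", "d", "l", "h"],
                   "E": ["e", "i", "e", "m", "i"],
                   "F": ["f", "j", "f", "n", "j"]})

# Every column except the identifier columns decides what counts as a duplicate
subset = [col for col in df.columns if col not in ("A", "B")]

first = df.drop_duplicates(subset=subset)                    # keep="first" is the default
last = df.drop_duplicates(subset=subset, keep="last")        # keep the later occurrence instead
unique_only = df.drop_duplicates(subset=subset, keep=False)  # drop all rows that have a duplicate

print(first["A"].tolist())        # ['Peter', 'Paula', 'Robert']
print(last["A"].tolist())         # ['Frank', 'Robert', 'Sara']
print(unique_only["A"].tolist())  # ['Robert']
```

Call `.reset_index(drop=True)` on the result if you want a contiguous 0..n-1 index afterwards.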

See here.