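Remove rows whose value in one column is duplicated, for dynamic data

Time: 2017-11-08 08:48:58

Tags: python excel python-3.x pandas excel-2010

I am trying to remove duplicates based on column D, for dynamic data that has no headers or identifying features. I want to delete every row whose column D value is duplicated. I convert the Excel file to a DataFrame, remove the duplicates, and then write the result back to Excel. However, I keep getting all sorts of errors, or the duplicates are not removed. I come from a VBA background, but we are migrating to Python.

Attempts: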

df.drop_duplicates(["C"])

df = pd.DataFrame({"C"})
df.groupby(["C"]).filter(lambda df:df.shape[0] == 1)

As well as an assortment of other variations. I was able to do this in VBA with one line. Any idea why this keeps failing?


import pandas as pd
df = pd.DataFrame({"C"})
df.drop_duplicates(subset=['C'], keep=False)


DG = df.groupby(['C'])
print(pd.concat([DG.get_group(item) for item, value in DG.groups.items() if len(value) == 1]))


Template of the code itself:

import pandas as pd

df = pd.read_excel("C:/wadwa.xlsx", sheet_name=0)
columns_to_drop = ['d.1']
#columns_to_drop = ['d.1', 'b.1', 'e.1', 'f.1', 'g.1']

# Keep every column that is not listed in columns_to_drop
df = df[[col for col in df.columns if col not in columns_to_drop]]
print(df)

# The context manager saves the workbook when the block exits
with pd.ExcelWriter('C:/dadwa/dwad.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
print(df)

Code:

import pandas as pd

df = pd.read_excel("C:/Users/Documents/Book1.xlsx", sheet_name=0)

# Drop every row whose value in the 4th column (column D) occurs more than once
df = df.drop_duplicates(subset=[df.columns[3]], keep=False)

# The context manager saves the workbook when the block exits
with pd.ExcelWriter('C:/Users//Documents/Book2.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
print(df)

1 answer:

Answer 0 (score: 0)

I think you need to assign the result back and select the 4th column by position:

df = df.drop_duplicates(subset=[df.columns[3]], keep=False)
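
To see why assigning back matters, here is a minimal sketch on a small in-memory DataFrame instead of an Excel file. The sample values and the column names A to D are assumptions made up for illustration; only `drop_duplicates`, `subset`, `keep=False` and `df.columns[3]` come from the answer above.

```python
import pandas as pd

# Hypothetical sample data; the values and the names A-D are illustrative only.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": ["a", "b", "c", "d", "e"],
    "C": [10, 20, 30, 40, 50],
    "D": ["x", "y", "x", "z", "y"],
})

# drop_duplicates returns a new DataFrame, so calling it without
# assigning the result leaves df untouched.
df.drop_duplicates(subset=[df.columns[3]], keep=False)
rows_without_assignment = len(df)

# keep=False drops every row whose value in the 4th column occurs more
# than once, rather than keeping the first or last occurrence.
deduped = df.drop_duplicates(subset=[df.columns[3]], keep=False)
print(deduped)
```

Only the row with `D == "z"` survives, because `"x"` and `"y"` each appear twice. If the sheet really has no header row, passing `header=None` to `pd.read_excel` should keep the first row as data; selecting the column by position with `df.columns[3]` works in either case.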