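I am trying to remove duplicates in column D, for dynamic data that has no headers or identifying features. I want to delete every row where column D has a duplicate. I am converting the Excel file to a DataFrame, removing the duplicates, and then writing it back to Excel. However, I keep getting an assortment of errors, or the duplicates are not removed. I come from a VBA background, but we are migrating to Python.
Attempts: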
df.drop_duplicates(["C"])
df = pd.DataFrame({"C"})
df.groupby(["C"]).filter(lambda df:df.shape[0] == 1)
I also tried an assortment of other variations. I was able to do this in VBA with one line. Any ideas why this keeps failing?
import pandas as pd
df = pd.DataFrame({"C"})
df.drop_duplicates(subset=['C'], keep=False)
DG = df.groupby(['C'])
print(pd.concat([DG.get_group(item) for item, value in DG.groups.items() if len(value) == 1]))
Template of the code itself:
import pandas as pd
df = pd.read_excel("C:/wadwa.xlsx", sheet_name=0)
columns_to_drop = ['d.1']
#columns_to_drop = ['d.1', 'b.1', 'e.1', 'f.1', 'g.1']
# Keep only the columns that are not listed in columns_to_drop
df = df[[col for col in df.columns if col not in columns_to_drop]]
print(df)
# The context manager saves the workbook on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('C:/dadwa/dwad.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
Code:
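import pandas as pd
df = pd.read_excel("C:/Users/Documents/Book1.xlsx", sheet_name=0)
# Drop every row whose value in the 4th column (column D) appears more than once
df = df.drop_duplicates(subset=[df.columns[3]], keep=False)
# The context manager saves the workbook on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('C:/Users//Documents/Book2.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
print(df)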
Answer 0 (score: 0)
I think you need to assign the result back and select the 4th column by position:
df = df.drop_duplicates(subset=[df.columns[3]], keep=False)
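To illustrate why assigning back matters and what keep=False does, here is a minimal sketch on a small in-memory DataFrame. The sample data and the column names A to D are made up for illustration and stand in for the Excel sheet; the variable names deduped and first_only are likewise hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel sheet.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": ["a", "b", "c", "d", "e"],
    "C": [10, 20, 30, 40, 50],
    "D": ["x", "y", "x", "z", "y"],
})

# drop_duplicates returns a new DataFrame and leaves df unchanged,
# so the result has to be assigned to a name to be used later.
# keep=False drops every row whose 4th-column value occurs more than once.
deduped = df.drop_duplicates(subset=[df.columns[3]], keep=False)

# For comparison: the default keep="first" retains one row per distinct value.
first_only = df.drop_duplicates(subset=[df.columns[3]])

print(deduped)
print(first_only)
```

Calling df.drop_duplicates(...) without assigning the result leaves df as it was, which would explain duplicates appearing not to be removed. If the sheet really has no header row, passing header=None to pd.read_excel should stop the first data row from being used as column names; whether that applies depends on the actual file.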