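I have an Excel file with repeated entries in the first column. How can I use Python to remove the repeated entries in the first column without affecting the rest of the file?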
My file:
Column1    Column2             Column3
PinkFloyd  Wish You Were Here  Wish You Were Here
PinkFloyd  Comfortably Numb    The Wall
AC_DC      Highway to Hell     Highway to Hell
AC_DC      Thunderstruck       The Razors Edge
Required output:
Column1    Column2             Column3
PinkFloyd  Wish You Were Here  Wish You Were Here
           Comfortably Numb    The Wall
AC_DC      Highway to Hell     Highway to Hell
           Thunderstruck       The Razors Edge
Answer 0: (score: 1)
Use pandas, specifically pandas.Series.drop_duplicates (the Series counterpart of pandas.DataFrame.drop_duplicates):
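import pandas as pd
df = pd.read_excel('my_xls.xls')
# Keep only the first occurrence of each value in Column1. The assignment
# aligns on the index, so the dropped rows become NaN (empty cells in Excel).
df['Column1'] = df['Column1'].drop_duplicates()
# Open a pandas ExcelWriter and write the result back to the file.
# Note: recent pandas releases can no longer write legacy *.xls files;
# if that applies to you, use an *.xlsx file name here instead.
with pd.ExcelWriter('my_xls.xls') as writer:
    df.to_excel(writer, index=False)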
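
Note that drop_duplicates blanks every later occurrence of a value, even when the repeats are not next to each other. If you only want to blank a value when it repeats the row directly above it, or you want empty strings instead of NaN, something like the following might work. This is only a sketch on an in-memory copy of the question's data; the helper name blank_repeats and its consecutive_only flag are made up for illustration and are not part of pandas.

```python
import pandas as pd


def blank_repeats(df, column, consecutive_only=False):
    """Return a copy of df with repeated values in `column` replaced by ''.

    Hypothetical helper, not a pandas function.
    """
    out = df.copy()
    if consecutive_only:
        # A row counts as a repeat only if it equals the row directly above.
        repeated = out[column].eq(out[column].shift())
    else:
        # duplicated() marks every occurrence after the first one.
        repeated = out[column].duplicated()
    # mask() replaces the values where the condition is True.
    out[column] = out[column].mask(repeated, "")
    return out


# Sample data copied from the question, built in memory instead of read_excel.
df = pd.DataFrame({
    "Column1": ["PinkFloyd", "PinkFloyd", "AC_DC", "AC_DC"],
    "Column2": ["Wish You Were Here", "Comfortably Numb",
                "Highway to Hell", "Thunderstruck"],
    "Column3": ["Wish You Were Here", "The Wall",
                "Highway to Hell", "The Razors Edge"],
})

result = blank_repeats(df, "Column1")
print(result.to_string(index=False))
```

The result can then be written out with to_excel(writer, index=False) exactly as in the answer above. Only Column1 is touched, and the original DataFrame is left unchanged because the helper works on a copy.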