How to concatenate 2 DataFrames and drop duplicates on a specific column in pandas

Time: 2018-11-10 21:19:35

Tags: python pandas

I have the following two DataFrames:

DF_1

LEU,LEID,DUNS,BVD,LEO
'1','2','3','4','5'
'2','1','2','3','4'
'2','AA','','',''
'3','3','','',''
'4','4','','',''

DF_2

LEID
'1'
'2'

I want to remove from DF_1 every row whose LEID value also appears in the LEID column of DF_2.

I used this code, but it did not work:

import pandas as pd
import os

originalFile=os.path.abspath("D:\\python\\test\\OriginalFile.csv")
remove_leid=os.path.abspath("D:\\python\\test\\LEID.csv")
df = pd.read_csv(originalFile)
df_2=pd.read_csv(remove_leid)
df_concat = pd.concat([df, df_2])
df_concat.drop_duplicates(subset='LEID', keep=False)

df_concat.to_csv('D:\\python\\test\\CorrectedFile.csv')
print (df_concat)
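
One possible reason this attempt has no visible effect: `drop_duplicates` returns a new DataFrame rather than modifying `df_concat` in place, so the result of that line is discarded. The sketch below illustrates this with small in-memory frames that stand in for the two CSV files (an assumption: only the LEU and LEID columns are reproduced, with the quotes already stripped).

```python
import pandas as pd

# Hypothetical stand-ins for OriginalFile.csv and LEID.csv (quotes already removed).
df = pd.DataFrame({"LEU": ["1", "2", "2", "3", "4"],
                   "LEID": ["2", "1", "AA", "3", "4"]})
df_2 = pd.DataFrame({"LEID": ["1", "2"]})

df_concat = pd.concat([df, df_2])

# The return value is thrown away here, so df_concat still has all 7 rows.
df_concat.drop_duplicates(subset="LEID", keep=False)
rows_without_assignment = len(df_concat)

# Assigning the result keeps only the LEID values that occur exactly once.
deduped = df_concat.drop_duplicates(subset="LEID", keep=False)
print(deduped)
```

Note that this concat-based approach would also drop rows whose LEID repeats within DF_1 itself, and would keep any DF_2 value that has no match in DF_1, so it reproduces the expected result only for data shaped like the sample.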

Expected result:

        LEU,LEID,DUNS,BVD,LEO
        '2','AA','','',''
        '3','3','','',''
        '4','4','','',''

I also tried:

import pandas as pd
import os

originalFile=os.path.abspath("D:\\python\\test\\OriginalFile.csv")
remove_leid=os.path.abspath("D:\\python\\test\\LEID.csv")
df = pd.read_csv(originalFile)
remove_leid = ['1','2']

df[~df.LEID.isin(remove_leid)]

df.to_csv('D:\\python\\test\\CorrectedFile.csv')
print (df)

But it still does not work. I always get:

   LEU  LEID DUNS  BVD  LEO
0  '1'   '2'  '3'  '4'  '5'
1  '2'   '1'  '2'  '3'  '4'
2  '2'  'AA'   ''   ''   ''
3  '3'    ''   ''   ''   ''
4  '4'    ''   ''   ''   ''
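
The printed output above still shows the single quotes around every value, which suggests they were read as part of the data: `read_csv` uses `"` as its default quote character, so `'1'` is stored as the three-character string `'1'` and never equals `'1'` from the Python list. In addition, the filtered frame is not assigned back to `df`. The following is a minimal sketch of both effects, using a shortened in-memory copy of the sample data (an assumption made for illustration).

```python
import io
import pandas as pd

# Shortened copy of the sample data; io.StringIO stands in for the file on disk.
csv_text = "LEU,LEID\n'1','2'\n'2','1'\n'2','AA'\n"

# Default quotechar is '"', so the single quotes stay inside the values.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw["LEID"].tolist())

# Nothing matches "1" or "2" because the stored strings include the quotes.
unfiltered = raw[~raw["LEID"].isin(["1", "2"])]

# Declaring the quote character strips the quotes; dtype=str keeps values as text.
clean = pd.read_csv(io.StringIO(csv_text), quotechar="'", dtype=str)
# The filter returns a new frame, so the result has to be assigned.
result = clean[~clean["LEID"].isin(["1", "2"])]
print(result)
```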

I also tried the following, without success:

import pandas as pd
import os

originalFile=os.path.abspath("D:\\python\\test\\OriginalFile.csv")
remove_leid=os.path.abspath("D:\\python\\test\\LEID.csv")
df = pd.read_csv(originalFile, sep=',', quotechar="'")
df_2=pd.read_csv(remove_leid, sep=',', quotechar="'")


df[~df.LEID.isin(remove_leid)]

df = df[~df['LEID'].isin(df_2['LEID'])]


df.to_csv('D:\\python\\test\\CorrectedFile.csv')
print (df)
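
Two things in this last attempt look like likely culprits, though neither is confirmed by the question. First, `remove_leid` is a path string at that point, and `isin` raises a `TypeError` when given a plain string instead of a list-like. Second, even with `quotechar="'"`, pandas may infer the LEID column of the second file as integers while the first file's LEID column stays text (because of `AA`), so `"1"` would not match `1`. A minimal sketch that avoids both, assuming the files contain exactly the sample data shown earlier (`read_quoted_csv` is a hypothetical helper name, and `io.StringIO` stands in for the real paths):

```python
import io
import pandas as pd

# Sample data copied from the question.
ORIGINAL_CSV = """LEU,LEID,DUNS,BVD,LEO
'1','2','3','4','5'
'2','1','2','3','4'
'2','AA','','',''
'3','3','','',''
'4','4','','',''
"""
LEID_CSV = """LEID
'1'
'2'
"""


def read_quoted_csv(source):
    """Read a CSV whose fields are wrapped in single quotes, keeping all values as text."""
    # quotechar="'" strips the quotes, dtype=str disables numeric inference,
    # keep_default_na=False keeps empty fields as "" instead of NaN.
    return pd.read_csv(source, quotechar="'", dtype=str, keep_default_na=False)


df = read_quoted_csv(io.StringIO(ORIGINAL_CSV))
df_2 = read_quoted_csv(io.StringIO(LEID_CSV))

# Keep only the rows whose LEID does not appear in df_2, and assign the result.
filtered = df[~df["LEID"].isin(df_2["LEID"])]
print(filtered)
```

With real files, the `io.StringIO(...)` arguments would be replaced by the file paths, and `filtered.to_csv(..., index=False)` would write the result without the extra index column.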

0 answers:

No answers yet