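I am trying to merge two DataFrames (dfs). The second df has 3 extra columns, but the rest are the same. When I try to merge the dfs, I get the following error on line 4 of my merge code, df4 = df4[cols]:
KeyError: "['Product Name' 'Sales Price' 'Batch Name'] not in index"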
Below are my code and the columns of each df:
DF2
file = "non-payment-data.csv"
path = root + file
name_cols = ['GUID1','GUID2', 'Org ID', 'Org Name', 'Product Name', 'Sales Price', 'Batch Name']
pull_cols = ['Org ID', 'Org Name', 'Product Name', 'Sales Price', 'Batch Name']
df2 = pd.read_csv(path, header=None, encoding="ISO-8859-1", names=name_cols, usecols=pull_cols, index_col=False)
Data columns (total 5 columns):
Org ID 10 non-null object
Org Name 10 non-null object
Product Name 10 non-null object
Sales Price 10 non-null int64
Batch Name 10 non-null object
dtypes: int64(1), object(4)
DF3
file = "payment-data.csv"
path = root + file
name_cols = ['GUID1', 'Org ID', 'Org Name', 'Product Name', 'Sales Price', 'Batch Name', 'Payment Amount', 'Transaction Date', 'Add Date']
pull_cols = ['Org ID', 'Org Name', 'Product Name', 'Sales Price', 'Batch Name', 'Payment Amount', 'Transaction Date', 'Add Date']
df3 = pd.read_csv(path, header=None, encoding="ISO-8859-1", names=name_cols, usecols=pull_cols, index_col=False)
Data columns (total 8 columns):
Org ID 9 non-null object
Org Name 9 non-null object
Product Name 9 non-null object
Sales Price 9 non-null int64
Batch Name 9 non-null object
Payment Amount 9 non-null int64
Transaction Date 9 non-null object
Add Date 9 non-null object
dtypes: int64(2), object(6)
Merge
df4 = pd.merge(df2, df3, how='left', on=['Org ID', 'Org Name'])
cols = ['Org Name', 'Product Name', 'Sales Price', 'Batch Name', 'Payment Amount', 'Transaction Date', 'Add Date']
df4 = df4[cols]
df4.head()
Data columns (total 7 columns):
Org Name 10 non-null object
Product Name 10 non-null object
Sales Price 10 non-null int64
Batch Name 10 non-null object
Payment Amount 0 non-null float64
Transaction Date 0 non-null object
Add Date 0 non-null object
dtypes: float64(1), int64(1), object(5)
Based on my research, I tried the following:
df4['Batch Name'] = fillna(method='ffill', inplace = True) #same for the other two
and
df4 = df4.reindex(cols=cols)
Answer 0 (score: 0)
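After the merge, the column names are no longer the same. Columns that exist in both frames but are not merge keys get renamed: the duplicates from the left frame receive the suffix _x and the duplicates from the right frame receive _y. That is why 'Product Name', 'Sales Price' and 'Batch Name' are not found in df4.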
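To see the renaming in isolation, here is a minimal sketch with two toy frames. The data values are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the question.
left = pd.DataFrame({
    "Org ID": ["A1", "B2"],
    "Org Name": ["Acme", "Bolt"],
    "Product Name": ["Widget", "Gadget"],
})
right = pd.DataFrame({
    "Org ID": ["A1"],
    "Org Name": ["Acme"],
    "Product Name": ["Widget"],
    "Payment Amount": [100],
})

# 'Product Name' is in both frames but is not a merge key,
# so pandas disambiguates it with the default suffixes.
merged = pd.merge(left, right, how="left", on=["Org ID", "Org Name"])
print(list(merged.columns))
```

Printing the column list after a merge is a quick way to check which names actually survived before indexing with df4[cols].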
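You can control this with the suffixes parameter. Passing an empty string as the left suffix keeps the original column names for the left frame: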
df4 = pd.merge(df2, df3, how='left', on=['Org ID', 'Org Name'], suffixes=['', '_'])
cols = ['Org Name', 'Product Name', 'Sales Price', 'Batch Name', 'Payment Amount', 'Transaction Date', 'Add Date']
df4 = df4[cols]
df4.head()
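As a rough check of the approach above, the sketch below uses small made-up frames (the values are hypothetical; only the column names come from the question) to show that the left frame's names survive and the column selection then works.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the question.
left = pd.DataFrame({
    "Org ID": ["A1", "B2"],
    "Org Name": ["Acme", "Bolt"],
    "Product Name": ["Widget", "Gadget"],
})
right = pd.DataFrame({
    "Org ID": ["A1"],
    "Org Name": ["Acme"],
    "Product Name": ["Widget"],
    "Payment Amount": [100],
})

# Empty left suffix: overlapping left columns keep their names,
# overlapping right columns get a trailing underscore.
merged = pd.merge(left, right, how="left",
                  on=["Org ID", "Org Name"], suffixes=("", "_"))

cols = ["Org Name", "Product Name", "Payment Amount"]
result = merged[cols]  # no KeyError now
print(result)
```

Rows from the left frame with no match on the keys still get NaN in the right-hand columns, as with any left join.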