Question

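I am trying to merge two DataFrames, where the second one has three extra columns but the rest are the same. When I try to merge them, I get the following error on line 4 of my merge code (df4 = df4[cols]):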

KeyError: "['Product Name' 'Sales Price' 'Batch Name'] not in index"

Here are the columns for each df:

My code is below:

DF2

file = "non-payment-data.csv"
path = root + file
name_cols = ['GUID1','GUID2', 'Org ID', 'Org Name', 'Product Name', 'Sales Price', 'Batch Name']
pull_cols = ['Org ID', 'Org Name', 'Product Name', 'Sales Price', 'Batch Name']
df2 = pd.read_csv(path, header=None, encoding="ISO-8859-1", names=name_cols, usecols=pull_cols, index_col=False)

Data columns (total 5 columns):
Org ID          10 non-null object
Org Name        10 non-null object
Product Name    10 non-null object
Sales Price     10 non-null int64
Batch Name      10 non-null object
dtypes: int64(1), object(4)

DF3

file = "payment-data.csv"
path = root + file
name_cols = ['GUID1', 'Org ID', 'Org Name', 'Product Name', 'Sales Price', 'Batch Name', 'Payment Amount', 'Transaction Date', 'Add Date']
pull_cols = ['Org ID', 'Org Name', 'Product Name', 'Sales Price', 'Batch Name', 'Payment Amount', 'Transaction Date', 'Add Date']
df3 = pd.read_csv(path, header=None, encoding="ISO-8859-1", names=name_cols, usecols=pull_cols, index_col=False)

Data columns (total 8 columns):
Org ID              9 non-null object
Org Name            9 non-null object
Product Name        9 non-null object
Sales Price         9 non-null int64
Batch Name          9 non-null object
Payment Amount      9 non-null int64
Transaction Date    9 non-null object
Add Date            9 non-null object
dtypes: int64(2), object(6)

Merge

df4 = pd.merge(df2, df3, how='left', on=['Org ID', 'Org Name'])
cols = ['Org Name', 'Product Name', 'Sales Price', 'Batch Name', 'Payment Amount', 'Transaction Date', 'Add Date']
df4 = df4[cols]
df4.head()

Data columns (total 7 columns):
Org Name            10 non-null object
Product Name        10 non-null object
Sales Price         10 non-null int64
Batch Name          10 non-null object
Payment Amount      0 non-null float64
Transaction Date    0 non-null object
Add Date            0 non-null object
dtypes: float64(1), int64(1), object(5)

Based on my research, I have tried the following:

df4['Batch Name'] = fillna(method='ffill', inplace = True) #same for the other two

and

df4 = df4.reindex(cols=cols)

Answer 1

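After the merge, the column names are no longer the same. Columns that appear in both frames but are not join keys are renamed with the suffix _x (for the copy from the left frame) and _y (for the copy from the right frame), so the unsuffixed names no longer exist in df4.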
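
To see the renaming in isolation, here is a minimal sketch. The two small frames are hypothetical stand-ins for df2 and df3 with made-up values, not data from the question's CSV files.

```python
import pandas as pd

# Hypothetical stand-ins for df2 and df3; all values are invented.
left = pd.DataFrame({
    "Org ID": ["A1", "B2"],
    "Org Name": ["Acme", "Beta"],
    "Product Name": ["Widget", "Gadget"],
    "Sales Price": [10, 20],
    "Batch Name": ["b1", "b2"],
})
right = pd.DataFrame({
    "Org ID": ["A1"],
    "Org Name": ["Acme"],
    "Product Name": ["Widget"],
    "Sales Price": [10],
    "Batch Name": ["b1"],
    "Payment Amount": [10],
})

# Same call shape as in the question: join only on the two key columns.
merged = pd.merge(left, right, how="left", on=["Org ID", "Org Name"])

# Non-key columns shared by both frames now carry _x / _y suffixes.
print(list(merged.columns))

# The plain names are gone, so selecting them raises KeyError.
try:
    merged[["Product Name", "Sales Price", "Batch Name"]]
    missing = False
except KeyError:
    missing = True
print(missing)
```

Printing the column list before selecting is a quick way to check which names actually survived the merge.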

You can control the suffixes with the suffixes parameter:

df4 = pd.merge(df2, df3, how='left', on=['Org ID', 'Org Name'], suffixes=['', '_'])
cols = ['Org Name', 'Product Name', 'Sales Price', 'Batch Name', 'Payment Amount', 'Transaction Date', 'Add Date']
df4 = df4[cols]
df4.head()
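
A self-contained version of the same idea, as a sketch only: the frames below are hypothetical with invented values, and the column list is shortened to the columns those frames contain. With an empty left suffix the left frame's columns keep their original names, while the right frame's overlapping columns get a trailing underscore.

```python
import pandas as pd

# Hypothetical stand-ins for df2 and df3; all values are invented.
left = pd.DataFrame({
    "Org ID": ["A1", "B2"],
    "Org Name": ["Acme", "Beta"],
    "Product Name": ["Widget", "Gadget"],
    "Sales Price": [10, 20],
    "Batch Name": ["b1", "b2"],
})
right = pd.DataFrame({
    "Org ID": ["A1"],
    "Org Name": ["Acme"],
    "Product Name": ["Widget"],
    "Sales Price": [10],
    "Batch Name": ["b1"],
    "Payment Amount": [10],
})

# Empty suffix for the left frame, "_" for the right frame.
merged = pd.merge(
    left, right, how="left", on=["Org ID", "Org Name"], suffixes=("", "_")
)

# The left frame's names are intact, so this selection no longer fails.
cols = ["Org Name", "Product Name", "Sales Price", "Batch Name", "Payment Amount"]
result = merged[cols]
print(result)
```

Rows from the left frame with no match in the right frame keep their own values and get NaN in the columns that exist only in the right frame, such as Payment Amount here.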
