Pandas KeyError: [''] when merging data from CSV files

Time: 2017-07-13 23:05:27

Tags: python pandas

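I'm trying to merge 2 dfs, where the second df has 3 extra columns but the rest are the same. When I try to merge the dfs, I get the following error in my merge code, at the line df4 = df4[cols]:

KeyError: "['Product Name' 'Sales Price' 'Batch Name'] not in index"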

My code is as follows, with the columns of each df (the df.info() output) shown after each part:

DF2

import pandas as pd

file = "non-payment-data.csv"
path = root + file  # root: directory holding the CSV files, defined elsewhere
name_cols = ['GUID1','GUID2', 'Org ID', 'Org Name', 'Product Name', 'Sales Price', 'Batch Name']
pull_cols = ['Org ID', 'Org Name', 'Product Name', 'Sales Price', 'Batch Name']
df2 = pd.read_csv(path, header=None, encoding="ISO-8859-1", names=name_cols, usecols=pull_cols, index_col=False)

Data columns (total 5 columns):
Org ID          10 non-null object
Org Name        10 non-null object
Product Name    10 non-null object
Sales Price     10 non-null int64
Batch Name      10 non-null object
dtypes: int64(1), object(4)
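The read_csv call above relies on header=None, names and usecols working together: names labels every column of a header-less file, and usecols then keeps only the listed labels. A minimal sketch with an in-memory CSV (the rows are hypothetical, not the asker's data) shows the resulting columns and dtypes:

```python
import io

import pandas as pd

# Hypothetical header-less CSV with the same 7-column layout as
# non-payment-data.csv (values are made up for illustration).
raw = io.StringIO(
    "g1,h1,A1,Acme,Widget,100,B1\n"
    "g2,h2,A2,Beta,Gadget,250,B2\n"
)

name_cols = ['GUID1', 'GUID2', 'Org ID', 'Org Name',
             'Product Name', 'Sales Price', 'Batch Name']
pull_cols = ['Org ID', 'Org Name', 'Product Name', 'Sales Price', 'Batch Name']

# header=None: there is no header row, so `names` supplies the labels.
# usecols: keep only these labels (matched against `names`).
df2 = pd.read_csv(raw, header=None, names=name_cols, usecols=pull_cols,
                  index_col=False)

print(df2.columns.tolist())
print(df2.dtypes)
```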

DF3

file = "payment-data.csv"
path = root + file
name_cols = ['GUID1', 'Org ID', 'Org Name', 'Product Name', 'Sales Price', 'Batch Name', 'Payment Amount', 'Transaction Date', 'Add Date']
pull_cols = ['Org ID', 'Org Name', 'Product Name', 'Sales Price', 'Batch Name', 'Payment Amount', 'Transaction Date', 'Add Date']
df3 = pd.read_csv(path, header=None, encoding="ISO-8859-1", names=name_cols, usecols=pull_cols, index_col=False)

Data columns (total 8 columns):
Org ID              9 non-null object
Org Name            9 non-null object
Product Name        9 non-null object
Sales Price         9 non-null int64
Batch Name          9 non-null object
Payment Amount      9 non-null int64
Transaction Date    9 non-null object
Add Date            9 non-null object
dtypes: int64(2), object(6)

Merge

df4 = pd.merge(df2, df3, how='left', on=['Org ID', 'Org Name'])
cols = ['Org Name', 'Product Name', 'Sales Price', 'Batch Name', 'Payment Amount', 'Transaction Date', 'Add Date']
df4 = df4[cols]
df4.head()

Data columns (total 7 columns):
Org Name            10 non-null object
Product Name        10 non-null object
Sales Price         10 non-null int64
Batch Name          10 non-null object
Payment Amount      0 non-null float64
Transaction Date    0 non-null object
Add Date            0 non-null object
dtypes: float64(1), int64(1), object(5)
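In this output, Payment Amount, Transaction Date and Add Date have 0 non-null values, and Payment Amount is float64 even though it is int64 in df3. That is the pattern a left merge produces when no row of the right frame matches on the keys: the unmatched rows get NaN, and an integer column is upcast to float64 to hold it. A minimal sketch with made-up frames (left, right and their values are hypothetical) shows this, using the indicator parameter of pd.merge to check whether the keys matched. Why the asker's Org ID / Org Name values do not match is not stated in the question.

```python
import pandas as pd

# Hypothetical data: the key values in `right` never appear in `left`.
left = pd.DataFrame({'Org ID': ['A1', 'A2'], 'Org Name': ['Acme', 'Beta'],
                     'Sales Price': [100, 250]})
right = pd.DataFrame({'Org ID': ['Z9'], 'Org Name': ['Zeta'],
                      'Payment Amount': [500]})

# indicator=True adds a '_merge' column telling where each row was found.
merged = pd.merge(left, right, how='left', on=['Org ID', 'Org Name'],
                  indicator=True)

print(merged['_merge'].tolist())       # ['left_only', 'left_only']
print(merged['Payment Amount'].dtype)  # float64 (upcast to hold NaN)
```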

Based on my research, I tried the following:

df4['Batch Name'] = fillna(method='ffill', inplace = True) #same for the other two

df4 = df4.reindex(cols=cols)
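Neither attempt reaches the actual cause. fillna only fills missing values in columns that already exist, and the documented keyword for DataFrame.reindex is columns, not cols. Even with the right keyword, reindex would not raise for the missing names: it creates them as all-NaN columns, which hides the renamed columns instead of recovering their data. A small sketch with a hypothetical frame contrasts the two behaviours:

```python
import pandas as pd

# Hypothetical frame whose 'Product Name' column was renamed by a merge.
df = pd.DataFrame({'Org Name': ['Acme'], 'Product Name_x': ['Widget']})
cols = ['Org Name', 'Product Name']

# reindex(columns=...) silently adds the missing label, filled with NaN.
reindexed = df.reindex(columns=cols)
print(reindexed['Product Name'].isna().all())  # True

# Selecting with [] raises KeyError when any requested label is missing.
try:
    df[cols]
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```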

1 answer:

Answer 0 (score: 0)

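After the merge, the column names are different: columns that appear in both DataFrames but are not merge keys are renamed with the suffix _x for the left-hand copy and _y for the right-hand copy.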
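To see the renaming, here is a sketch with one-row stand-ins for df2 and df3 (the values are hypothetical). Product Name, Sales Price and Batch Name exist in both frames but are not keys, so pandas keeps both copies and appends the default suffixes; the names left without an exact match are the three listed in the KeyError.

```python
import pandas as pd

# Hypothetical one-row stand-ins for df2 and df3 (values are made up).
df2 = pd.DataFrame({'Org ID': ['A1'], 'Org Name': ['Acme'],
                    'Product Name': ['Widget'], 'Sales Price': [100],
                    'Batch Name': ['B1']})
df3 = df2.assign(**{'Payment Amount': [100],
                    'Transaction Date': ['2017-07-01'],
                    'Add Date': ['2017-07-02']})

df4 = pd.merge(df2, df3, how='left', on=['Org ID', 'Org Name'])
# Shared non-key columns now appear as '<name>_x' and '<name>_y'.
print(df4.columns.tolist())

cols = ['Org Name', 'Product Name', 'Sales Price', 'Batch Name',
        'Payment Amount', 'Transaction Date', 'Add Date']
missing = [c for c in cols if c not in df4.columns]
print(missing)  # ['Product Name', 'Sales Price', 'Batch Name']
```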

You can control the suffixes with the suffixes parameter:
df4 = pd.merge(df2, df3, how='left', on=['Org ID', 'Org Name'], suffixes=['', '_'])
cols = ['Org Name', 'Product Name', 'Sales Price', 'Batch Name', 'Payment Amount', 'Transaction Date', 'Add Date']
df4 = df4[cols]
df4.head()
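A runnable check of the answer's fix, again on hypothetical one-row frames: with suffixes=['', '_'] the left copies keep their original names, so df4[cols] succeeds. The second merge is an alternative that is not part of the answer: it passes only the keys and the three extra columns of df3, so no non-key column overlaps and nothing needs a suffix.

```python
import pandas as pd

# Hypothetical one-row stand-ins for df2 and df3 (values are made up).
df2 = pd.DataFrame({'Org ID': ['A1'], 'Org Name': ['Acme'],
                    'Product Name': ['Widget'], 'Sales Price': [100],
                    'Batch Name': ['B1']})
df3 = df2.assign(**{'Payment Amount': [100],
                    'Transaction Date': ['2017-07-01'],
                    'Add Date': ['2017-07-02']})
cols = ['Org Name', 'Product Name', 'Sales Price', 'Batch Name',
        'Payment Amount', 'Transaction Date', 'Add Date']

# The answer's fix: no suffix on the left copies, '_' on the right copies.
df4 = pd.merge(df2, df3, how='left', on=['Org ID', 'Org Name'],
               suffixes=['', '_'])
print([c for c in df4.columns if c.endswith('_')])
result = df4[cols]  # no KeyError: every name in cols now exists

# Alternative (not from the answer): drop df3's duplicate columns up front.
extra = ['Org ID', 'Org Name', 'Payment Amount', 'Transaction Date', 'Add Date']
alt = pd.merge(df2, df3[extra], how='left', on=['Org ID', 'Org Name'])[cols]
print(alt.equals(result))
```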