Look up values in one DataFrame by row index and add them to the matching row index of another DataFrame

Time: 2017-07-12 16:27:17

Tags: python pandas dataframe data-cleaning

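Pandas DataFrame manipulation question

I want to create new columns in my original DataFrame (df) whose values come from another DataFrame (dfKey), looked up at a specific index.

I'm a bit stuck. I'm sure I'm missing something obvious, but I can't decode the current error message: KeyError: 'Name'.

Data: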

import numpy as np
import pandas as pd
raw_data = {'Code': [250, 200, 875, 1200],
    'Metric1': [1.4, 350, 0.2, 500],
    'Metric999': [1.2, 375, 0.22, 505],} 
df = pd.DataFrame(raw_data, columns = ['Code','Metric1', 'Metric999',])

df.set_index('Code', inplace=True) #Set Code as Row Index
print(df)

raw_dataKey = {'Code': [250, 1200, 205, 2899, 875, 5005],
    'Ticker': ['NVID', 'ATVI', 'CRM', 'GOOGL', 'TSLA','GE', ],       
    'Name': ['NVIDA Corp', 'Activision', 'SalesForce', 'Googlyness', 'Tesla Company','General Electric']} 
dfKey = pd.DataFrame(raw_dataKey , columns = ['Code','Ticker', 'Name'])
dfKey.set_index('Code', inplace=True) #Set Code as Row Index
print(dfKey)

Desired output (df.head()):

      Ticker           Name  Code  Metric1  Metric999
Code  
250     NVID     NVIDA Corp   250      1.4       1.20
200      NaN            NaN   200    350.0     375.00
875     TSLA  Tesla Company   875      0.2       0.22
1200    ATVI     Activision  1200    500.0     505.00

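I thought the best way to do this was a for loop, because everything else I tried (for example, df['Name']=np.where(df['Code']==dfKey['Code'], dfKey['Name'])) only compares each row against the row at the same position; it does not search for the matching code.

My latest attempt: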

codes=df.index.tolist()
codes

for code in codes:
    #1. Find Name and Ticker in Key
    name = dfKey['Name'].loc[code]
    ticker = dfKey['Ticker'].loc[code]
    #2. Put Name and Ticker back in original
    df['Name'].loc[code] = name 
    df['Ticker'].loc[code] = ticker 
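
A note on the error, as a hedged sketch rather than the accepted fix: KeyError: 'Name' most likely comes from reading df['Name'] before that column exists in df. In addition, code 200 is not in dfKey, so dfKey['Name'].loc[200] would raise its own KeyError. Assuming the same data as above, a loop that runs could create the columns first, write through df.loc[row, column], and skip codes missing from the key. The object-dtype placeholder columns are my own choice, not something from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Code": [250, 200, 875, 1200],
     "Metric1": [1.4, 350, 0.2, 500],
     "Metric999": [1.2, 375, 0.22, 505]}
).set_index("Code")

dfKey = pd.DataFrame(
    {"Code": [250, 1200, 205, 2899, 875, 5005],
     "Ticker": ["NVID", "ATVI", "CRM", "GOOGL", "TSLA", "GE"],
     "Name": ["NVIDA Corp", "Activision", "SalesForce", "Googlyness",
              "Tesla Company", "General Electric"]}
).set_index("Code")

# Create the target columns up front (object dtype so they can hold strings).
df["Name"] = pd.Series(np.nan, index=df.index, dtype=object)
df["Ticker"] = pd.Series(np.nan, index=df.index, dtype=object)

for code in df.index:
    # Codes absent from dfKey (here: 200) are skipped and stay NaN.
    if code in dfKey.index:
        df.loc[code, "Name"] = dfKey.loc[code, "Name"]
        df.loc[code, "Ticker"] = dfKey.loc[code, "Ticker"]

print(df)
```

The loop is shown only to explain the error; the index-aligned merge and join in the answers do the same lookup without iterating.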

2 answers:

Answer 0 (score: 2)

I think you need merge:

dfKey.merge(df, left_index=True, right_index=True, how='outer')

Output:

     Ticker              Name  Metric1  Metric999
Code                                             
200     NaN               NaN    350.0     375.00
205     CRM        SalesForce      NaN        NaN
250    NVID        NVIDA Corp      1.4       1.20
875    TSLA     Tesla Company      0.2       0.22
1200   ATVI        Activision    500.0     505.00
2899  GOOGL        Googlyness      NaN        NaN
5005     GE  General Electric      NaN        NaN
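
The outer merge also keeps key rows such as 2899 and 5005 that have no metrics. If only the rows of df should survive, as in the desired output, one possible variation is a left join starting from df. This is a sketch that assumes the question's data; the name result is just an illustrative variable.

```python
import pandas as pd

df = pd.DataFrame(
    {"Code": [250, 200, 875, 1200],
     "Metric1": [1.4, 350, 0.2, 500],
     "Metric999": [1.2, 375, 0.22, 505]}
).set_index("Code")

dfKey = pd.DataFrame(
    {"Code": [250, 1200, 205, 2899, 875, 5005],
     "Ticker": ["NVID", "ATVI", "CRM", "GOOGL", "TSLA", "GE"],
     "Name": ["NVIDA Corp", "Activision", "SalesForce", "Googlyness",
              "Tesla Company", "General Electric"]}
).set_index("Code")

# how='left' keeps every row of df and fills unmatched codes with NaN.
result = df.merge(dfKey, left_index=True, right_index=True, how="left")
print(result)
```

With how='left', codes that exist only in dfKey are dropped, and code 200 keeps its metrics with NaN for Ticker and Name.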

Answer 1 (score: 2)

IIUC:

In [13]: df.join(dfKey)
Out[13]:
      Metric1  Metric999 Ticker           Name
Code
250       1.4       1.20   NVID     NVIDA Corp
200     350.0     375.00    NaN            NaN
875       0.2       0.22   TSLA  Tesla Company
1200    500.0     505.00   ATVI     Activision
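
If the column order of the desired output matters (Ticker and Name first, plus a Code column that repeats the index), one way to get there, offered as a sketch and not as part of the original answer, is to reindex dfKey on df's index and then join. The variable name out is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"Code": [250, 200, 875, 1200],
     "Metric1": [1.4, 350, 0.2, 500],
     "Metric999": [1.2, 375, 0.22, 505]}
).set_index("Code")

dfKey = pd.DataFrame(
    {"Code": [250, 1200, 205, 2899, 875, 5005],
     "Ticker": ["NVID", "ATVI", "CRM", "GOOGL", "TSLA", "GE"],
     "Name": ["NVIDA Corp", "Activision", "SalesForce", "Googlyness",
              "Tesla Company", "General Electric"]}
).set_index("Code")

# reindex looks up each code of df in dfKey; missing codes become NaN rows.
out = dfKey.reindex(df.index).join(df)

# Repeat the index as a regular column, placed after Name.
out.insert(2, "Code", out.index)
print(out)
```

Here reindex performs the "search" the question asks about: each code in df.index is looked up in dfKey, regardless of row position.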