Find the keys of one DataFrame that appear in another DataFrame and flag them

Posted: 2016-07-30 10:27:51

Tags: python pandas dataframe

I have two DataFrames:

import pandas as pd

data = {
    'year': ['11:23:19', '11:23:19', '11:24:19', '11:25:19', '11:25:19', '11:23:19', '11:23:19', '11:23:19', '11:23:19', '11:23:19'],
    'store_number': ['1944', '1945', '1946', '1948', '1948', '1949', '1947', '1948', '1949', '1947'],
    'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
    'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
    'id': [10, 10, 11, 11, 11, 10, 10, 11, 11, 10]
}

df1 = pd.DataFrame(data, columns = ['retailer_name', 'store_number', 'year', 'amount', 'id'])
df1.set_index(['retailer_name', 'store_number', 'year'], inplace = True)
retailer_name store_number year      amount  id
Walmart       1944         11:23:19       5  10
              1945         11:23:19       5  10
CRV           1946         11:24:19       8  11
              1948         11:25:19       6  11
                           11:25:19       1  11
Walmart       1949         11:23:19       5  10
              1947         11:23:19      10  10
CRV           1948         11:23:19       6  11
              1949         11:23:19      12  11
              1947         11:23:19      11  10

The second one:

data2 = {
    'year': ['11:23:19', '11:23:19', '13:23:19'],
    'store_number': [1944, 1947, 1978],
    'retailer_name': ['Walmart', 'CRV', 'CRV12'],
    'amount': [5, 11, 11]
}

df2 = pd.DataFrame(data2, columns = ['retailer_name', 'store_number', 'year', 'amount'])
df2.set_index(['retailer_name', 'store_number', 'year'], inplace = True)
retailer_name store_number year      amount
Walmart       1944         11:23:19       5
CRV           1947         11:23:19      11
CRV12         1978         13:23:19      11

How can I check which keys of df2 appear in df1, and flag each row with 1 if its key is present and 0 if it is not?

retailer_name store_number year      amount  flag
Walmart       1944         11:23:19       5    1
CRV           1947         11:23:19      11    1
CRV12         1978         13:23:19      11    0
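Before comparing keys across two MultiIndexes, it can help to check the dtype of each index level. A minimal sketch, assuming pandas 1.3+ (where MultiIndex.dtypes exists) and using a trimmed copy of the question's data (only the first rows of each frame, which is enough to see the dtypes):

```python
import pandas as pd

keys = ['retailer_name', 'store_number', 'year']

# Trimmed copies of the question's frames: store_number is a string in df1
# but an integer in df2.
df1 = pd.DataFrame({
    'retailer_name': ['Walmart', 'Walmart'],
    'store_number': ['1944', '1945'],
    'year': ['11:23:19', '11:23:19'],
    'amount': [5, 5],
    'id': [10, 10],
}).set_index(keys)
df2 = pd.DataFrame({
    'retailer_name': ['Walmart', 'CRV'],
    'store_number': [1944, 1947],
    'year': ['11:23:19', '11:23:19'],
    'amount': [5, 11],
}).set_index(keys)

# One dtype per index level; a mismatch here means tuples never compare equal.
print(df1.index.dtypes)
print(df2.index.dtypes)
```

If the store_number entries differ (a string/object dtype on one side, an integer dtype on the other), keys such as ('Walmart', '1944', ...) and ('Walmart', 1944, ...) will never be treated as equal.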

1 answer:

Answer (score: 1)

If you make sure both MultiIndexes have the same dtypes, you can use the MultiIndex.intersection() method:

In [74]: df2['flag'] = 0

In [75]: df2.loc[df2.index.intersection(df1.index), 'flag'] = 1
c:\envs\py35\lib\site-packages\IPython\terminal\ipapp.py:344: PerformanceWarning: indexing past lexsort depth may impact performance.
  self.shell.mainloop()

In [76]: df2
Out[76]:
                                     amount  flag
retailer_name store_number year
Walmart       1944         11:23:19       5     1
CRV           1947         11:23:19      11     1
CRV12         1978         13:23:19      11     0
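The .ix indexer was removed in pandas 1.0, so the original session no longer runs as written. Below is a sketch of the same intersection approach for current pandas, using .loc for the label-based assignment. Rebuilding the frames and casting df1's store_number strings to int are additions made here (not part of the original answer) so that the example data actually matches; the cast assumes every store number is a valid integer string.

```python
import pandas as pd

keys = ['retailer_name', 'store_number', 'year']

# Data copied from the question.
df1 = pd.DataFrame({
    'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV',
                      'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
    'store_number': ['1944', '1945', '1946', '1948', '1948',
                     '1949', '1947', '1948', '1949', '1947'],
    'year': ['11:23:19', '11:23:19', '11:24:19', '11:25:19', '11:25:19',
             '11:23:19', '11:23:19', '11:23:19', '11:23:19', '11:23:19'],
    'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
    'id': [10, 10, 11, 11, 11, 10, 10, 11, 11, 10],
})
df2 = pd.DataFrame({
    'retailer_name': ['Walmart', 'CRV', 'CRV12'],
    'store_number': [1944, 1947, 1978],
    'year': ['11:23:19', '11:23:19', '13:23:19'],
    'amount': [5, 11, 11],
})

# Assumption: store numbers are always numeric, so casting df1's strings
# to int makes this level's dtype match df2's.
df1['store_number'] = df1['store_number'].astype(int)
df1 = df1.set_index(keys)
df2 = df2.set_index(keys)

# Same idea as the answer: default to 0, then set 1 on the shared keys.
# .loc replaces the removed .ix for this label-based assignment.
df2['flag'] = 0
df2.loc[df2.index.intersection(df1.index), 'flag'] = 1
print(df2)
```

pandas may still emit the "indexing past lexsort depth" PerformanceWarning here because neither MultiIndex is sorted; calling sort_index() on the frames first avoids it, but is not needed for correctness.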

Note: this won't work on your example DataFrames as they are, because the store_number column has different dtypes: strings in df1 and integers in df2.
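One way to make the example data work is to cast df1's store_number level to int after the frames are already indexed, as in the question. The sketch below does that with reset_index/set_index and then, as an alternative to intersection plus .loc (this is not what the answer above uses), builds the flag with MultiIndex.isin, which returns one boolean per row of df2. It assumes every store number in df1 parses as an integer.

```python
import pandas as pd

keys = ['retailer_name', 'store_number', 'year']

# Frames rebuilt from the question's data, indexed the same way.
df1 = pd.DataFrame({
    'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV',
                      'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
    'store_number': ['1944', '1945', '1946', '1948', '1948',
                     '1949', '1947', '1948', '1949', '1947'],
    'year': ['11:23:19', '11:23:19', '11:24:19', '11:25:19', '11:25:19',
             '11:23:19', '11:23:19', '11:23:19', '11:23:19', '11:23:19'],
    'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
    'id': [10, 10, 11, 11, 11, 10, 10, 11, 11, 10],
}).set_index(keys)
df2 = pd.DataFrame({
    'retailer_name': ['Walmart', 'CRV', 'CRV12'],
    'store_number': [1944, 1947, 1978],
    'year': ['11:23:19', '11:23:19', '13:23:19'],
    'amount': [5, 11, 11],
}).set_index(keys)

# With mismatched dtypes ('1944' vs 1944) no key matches.
before = df2.index.isin(df1.index)
print(before)  # [False False False]

# Move the keys back to columns, cast store_number to int, and re-index.
df1 = df1.reset_index()
df1['store_number'] = df1['store_number'].astype(int)
df1 = df1.set_index(keys)

# isin gives one boolean per df2 row; cast to 0/1 for the flag column.
df2['flag'] = df2.index.isin(df1.index).astype(int)
print(df2)
```

isin avoids the two-step "set 0, then overwrite with 1" pattern and handles duplicate keys in df1 (such as CRV / 1948 / 11:25:19) without extra work.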