Select the cell entries of one DataFrame that appear in another DataFrame

Time: 2017-03-11 20:56:47

Tags: python pandas dataframe

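I have the test and training data for BigMart sales, and I need to select the elements of the training data that also appear in the test data. Here are train_data_df.head() and test_data_df.head().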

Training data:

Item_Weight | Item_Visibility | Item_MRP | Item_Outlet_Sales

     9.30         0.016047      249.8092          3735.1380  
     5.92         0.019278       48.2692           443.4228  
    17.50         0.016760      141.6180          2097.2700  
    19.20         0.000000      182.0950           732.3800  
     8.93         0.000000       53.8614           994.7052  

Test data:

Item_Weight | Item_Visibility | Item_MRP

20.750         0.007565          107.8622  
8.300          0.038428           87.3198  
14.600         0.099575          241.7538  
7.315          0.015388          155.0340  
-999.000       0.118599          234.2300

How can I do this?

2 answers:

Answer 0 (score: 0)

You can try the set.intersection() function. It works like this:

>>> a = [1, 2, 3]
>>> b = [4, 5, 3]
>>> set(a).intersection(set(b))
{3}
Or you can use other options:
>>> [x for x in a if x in b]
[3]
or:
>>> set(a) & set(b)
{3}
or:
>>> new_list = []
>>> for element in a:
...     if element in b:
...         new_list.append(element)
...
>>> new_list
[3]
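
A side note on the list-based variants above: `x in b` scans the whole list `b` on every lookup, and a plain set intersection discards both the order and the duplicates of `a`. A minimal sketch of a middle ground, assuming the values are hashable (the sample lists here are made up for illustration):

```python
# Hypothetical sample lists, not taken from the BigMart data.
a = [3, 1, 2, 3, 5]
b = [4, 5, 3]

# Build the set once so each membership test is a hash lookup.
b_set = set(b)

# Keeps the order and the duplicates of `a`, unlike set(a) & set(b).
common_in_order = [x for x in a if x in b_set]

# Plain set intersection: unique values only, no guaranteed order.
common_unique = set(a) & b_set
```

Which of the two is appropriate depends on whether repeated values in `a` should survive the filtering.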

Answer 1 (score: 0)

In [348]: test = pd.read_csv(r'D:\download\Test_u94Q5KV.csv', index_col=0)

In [349]: train = pd.read_csv(r'D:\download\Train_UWu5bXk.csv', index_col=0)

In [351]: test.shape
Out[351]: (5681, 10)

In [352]: train.shape
Out[352]: (8523, 11)

Elements of the training data that are present in the test data

In [365]: train.loc[train.index.isin(test.index)]
Out[365]:
                 Item_Weight Item_Fat_Content        ...               Outlet_Type Item_Outlet_Sales
Item_Identifier                                      ...
FDA15                  9.300          Low Fat        ...         Supermarket Type1         3735.1380
DRC01                  5.920          Regular        ...         Supermarket Type2          443.4228
FDN15                 17.500          Low Fat        ...         Supermarket Type1         2097.2700
FDX07                 19.200          Regular        ...             Grocery Store          732.3800
NCD19                  8.930          Low Fat        ...         Supermarket Type1          994.7052
FDP36                 10.395          Regular        ...         Supermarket Type2          556.6088
FDO10                 13.650          Regular        ...         Supermarket Type1          343.5528
...                      ...              ...        ...                       ...               ...
NCJ19                 18.600          Low Fat        ...         Supermarket Type2          858.8820
FDF53                 20.750              reg        ...         Supermarket Type1         3608.6360
FDF22                  6.865          Low Fat        ...         Supermarket Type1         2778.3834
FDS36                  8.380          Regular        ...         Supermarket Type1          549.2850
NCJ29                 10.600          Low Fat        ...         Supermarket Type1         1193.1136
FDN46                  7.210          Regular        ...         Supermarket Type2         1845.5976
DRG01                 14.800          Low Fat        ...         Supermarket Type1          765.6700

[8383 rows x 11 columns]
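
To make the index lookup above easier to follow, here is a small self-contained sketch. The two frames are toy data: the training rows reuse a few values from the question, while the test frame and the identifier `XXX00` are invented for illustration. It assumes both frames are indexed by `Item_Identifier`, as in the `read_csv(..., index_col=0)` calls above.

```python
import pandas as pd

# Toy training frame (a few rows only; not the real Train file).
train = pd.DataFrame(
    {
        "Item_Weight": [9.30, 5.92, 17.50, 19.20],
        "Item_Outlet_Sales": [3735.1380, 443.4228, 2097.2700, 732.3800],
    },
    index=pd.Index(["FDA15", "DRC01", "FDN15", "FDX07"], name="Item_Identifier"),
)

# Toy test frame; "XXX00" is a made-up identifier absent from train.
test = pd.DataFrame(
    {"Item_Weight": [9.30, 17.50, 8.30]},
    index=pd.Index(["FDA15", "FDN15", "XXX00"], name="Item_Identifier"),
)

# Index.isin returns a boolean array aligned with train's rows.
mask = train.index.isin(test.index)

# Boolean indexing keeps only the rows whose label also occurs in test.
in_test = train.loc[mask]
```

Here `mask` is `[True, False, True, False]`, so `in_test` holds the `FDA15` and `FDN15` rows.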

Elements of the training data that do not appear in the test data

In [366]: train.loc[~train.index.isin(test.index)]
Out[366]:
                 Item_Weight Item_Fat_Content        ...               Outlet_Type Item_Outlet_Sales
Item_Identifier                                      ...
FDX20                  7.365          Low Fat        ...         Supermarket Type1         3169.2080
FDG33                    NaN          Regular        ...         Supermarket Type3         3435.5280
FDW13                  8.500          Low Fat        ...         Supermarket Type1          259.6620
FDG24                  7.975          Low Fat        ...         Supermarket Type1         1081.9250
DRE49                 20.750          Low Fat        ...         Supermarket Type1         2277.0360
NCY18                  7.285          Low Fat        ...         Supermarket Type1         4377.6350
DRE49                 20.750               LF        ...         Supermarket Type1         2580.6408
...                      ...              ...        ...                       ...               ...
NCL31                    NaN          Low Fat        ...         Supermarket Type3         3578.6750
FDL10                    NaN          Low Fat        ...         Supermarket Type3         1884.8798
NCQ06                    NaN          Low Fat        ...         Supermarket Type3         6630.0364
NCL31                  7.390               LF        ...         Supermarket Type1         5296.4390
FDO52                 11.600          Regular        ...         Supermarket Type1         1539.9954
FDX20                    NaN          Low Fat        ...             Grocery Store          452.7440
FDL10                  8.395          Low Fat        ...         Supermarket Type1         2579.3092

[140 rows x 11 columns]
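
The lookups above match rows by index label. The head() output in the question shows only numeric columns, so if no identifier column is available, one possible alternative is to match on the columns the two frames share, using `DataFrame.merge` with `indicator=True`. This is only a sketch under assumptions: the frames below are toy data, the choice of `Item_Weight` and `Item_MRP` as keys is mine, and the comparison relies on exact floating-point equality, which may be too strict for real data.

```python
import pandas as pd

# Toy frames; the values are illustrative, not the full BigMart files.
train = pd.DataFrame({
    "Item_Weight": [9.30, 5.92, 17.50],
    "Item_MRP": [249.8092, 48.2692, 141.6180],
    "Item_Outlet_Sales": [3735.1380, 443.4228, 2097.2700],
})
test = pd.DataFrame({
    "Item_Weight": [9.30, 8.30, 17.50],
    "Item_MRP": [249.8092, 87.3198, 141.6180],
})

# Columns assumed to identify a row in both frames (an assumption).
keys = ["Item_Weight", "Item_MRP"]

# Left merge keeps every training row; the "_merge" column records
# whether a matching key combination was found in test.
# drop_duplicates avoids repeating training rows when test has duplicates.
merged = train.merge(
    test[keys].drop_duplicates(), on=keys, how="left", indicator=True
)

in_test = merged.loc[merged["_merge"] == "both"].drop(columns="_merge")
not_in_test = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
```

With the toy data, `in_test` contains the first and third training rows and `not_in_test` contains the second.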