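I have the BigMart sales train and test data, and I need to select the elements of the training data that are also present in the test data. Here are train_data_df.head() and test_data_df.head().
Training data: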
Item_Weight  Item_Visibility  Item_MRP  Item_Outlet_Sales
9.30         0.016047         249.8092  3735.1380
5.92         0.019278         48.2692   443.4228
17.50        0.016760         141.6180  2097.2700
19.20        0.000000         182.0950  732.3800
8.93         0.000000         53.8614   994.7052
Test data:
Item_Weight  Item_Visibility  Item_MRP
20.750       0.007565         107.8622
8.300        0.038428         87.3198
14.600       0.099575         241.7538
7.315        0.015388         155.0340
-999.000     0.118599         234.2300
What should I do now?
Answer 0 (score: 0)
You can try the set.intersection() function. It works like this:
>>> {1, 2, 3}.intersection({4, 5, 3})
{3}
Or you can use other options:
>>> [x for x in a if x in b]
or:
>>> set(a) & set(b)
or:
new_list = []
for element in a:
    if element in b:
        new_list.append(element)
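To show how these options differ in practice, here is a minimal sketch. The lists a and b are hypothetical sample values loosely based on the Item_Weight columns in the question, not the real data. The set version drops duplicates and ordering, while the comprehension keeps the order of a.

```python
# Hypothetical sample values (assumption: not taken from the full dataset).
a = [9.30, 5.92, 17.50, 19.20, 8.93, 20.75]
b = [20.75, 8.30, 14.60, 7.315, 9.30]

# Set intersection: concise, but the result is unordered and has no duplicates.
common_set = set(a) & set(b)

# List comprehension: keeps the order (and any duplicates) of a.
# Building a set from b first makes each membership test cheap.
b_lookup = set(b)
common_ordered = [x for x in a if x in b_lookup]

print(common_set)
print(common_ordered)
```

If the order of the training rows matters, the comprehension is probably the safer choice; if you only need to know which values overlap, the set form is enough.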
Answer 1 (score: 0)
In [348]: test = pd.read_csv(r'D:\download\Test_u94Q5KV.csv', index_col=0)
In [349]: train = pd.read_csv(r'D:\download\Train_UWu5bXk.csv', index_col=0)
In [351]: test.shape
Out[351]: (5681, 10)
In [352]: train.shape
Out[352]: (8523, 11)
Elements of the training data that are present in the test data:
In [365]: train.loc[train.index.isin(test.index)]
Out[365]:
Item_Weight Item_Fat_Content ... Outlet_Type Item_Outlet_Sales
Item_Identifier ...
FDA15 9.300 Low Fat ... Supermarket Type1 3735.1380
DRC01 5.920 Regular ... Supermarket Type2 443.4228
FDN15 17.500 Low Fat ... Supermarket Type1 2097.2700
FDX07 19.200 Regular ... Grocery Store 732.3800
NCD19 8.930 Low Fat ... Supermarket Type1 994.7052
FDP36 10.395 Regular ... Supermarket Type2 556.6088
FDO10 13.650 Regular ... Supermarket Type1 343.5528
... ... ... ... ... ...
NCJ19 18.600 Low Fat ... Supermarket Type2 858.8820
FDF53 20.750 reg ... Supermarket Type1 3608.6360
FDF22 6.865 Low Fat ... Supermarket Type1 2778.3834
FDS36 8.380 Regular ... Supermarket Type1 549.2850
NCJ29 10.600 Low Fat ... Supermarket Type1 1193.1136
FDN46 7.210 Regular ... Supermarket Type2 1845.5976
DRG01 14.800 Low Fat ... Supermarket Type1 765.6700
[8383 rows x 11 columns]
Elements of the training data that do not appear in the test data:
In [366]: train.loc[~train.index.isin(test.index)]
Out[366]:
Item_Weight Item_Fat_Content ... Outlet_Type Item_Outlet_Sales
Item_Identifier ...
FDX20 7.365 Low Fat ... Supermarket Type1 3169.2080
FDG33 NaN Regular ... Supermarket Type3 3435.5280
FDW13 8.500 Low Fat ... Supermarket Type1 259.6620
FDG24 7.975 Low Fat ... Supermarket Type1 1081.9250
DRE49 20.750 Low Fat ... Supermarket Type1 2277.0360
NCY18 7.285 Low Fat ... Supermarket Type1 4377.6350
DRE49 20.750 LF ... Supermarket Type1 2580.6408
... ... ... ... ... ...
NCL31 NaN Low Fat ... Supermarket Type3 3578.6750
FDL10 NaN Low Fat ... Supermarket Type3 1884.8798
NCQ06 NaN Low Fat ... Supermarket Type3 6630.0364
NCL31 7.390 LF ... Supermarket Type1 5296.4390
FDO52 11.600 Regular ... Supermarket Type1 1539.9954
FDX20 NaN Low Fat ... Grocery Store 452.7440
FDL10 8.395 Low Fat ... Supermarket Type1 2579.3092
[140 rows x 11 columns]
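The same train.index.isin(test.index) pattern can be tried on a small example without the CSV files. This is only a sketch: the two frames below are made-up miniatures (identifiers borrowed from the output above, rows invented for illustration), and it assumes Item_Identifier is the index, as index_col=0 arranges in the session above.

```python
import pandas as pd

# Hypothetical miniature frames; values are illustrative, not the real dataset.
train = pd.DataFrame(
    {
        "Item_Identifier": ["FDA15", "DRC01", "FDX20", "FDX20"],
        "Item_Weight": [9.30, 5.92, 7.365, None],
        "Item_Outlet_Sales": [3735.1380, 443.4228, 3169.2080, 452.7440],
    }
).set_index("Item_Identifier")

test = pd.DataFrame(
    {
        "Item_Identifier": ["FDA15", "DRC01", "FDW58"],
        "Item_Weight": [9.30, 5.92, 20.75],
    }
).set_index("Item_Identifier")

# Boolean mask: True where a training row's identifier also occurs in test.
mask = train.index.isin(test.index)

in_test = train.loc[mask]       # training rows whose identifier is in test
not_in_test = train.loc[~mask]  # training rows whose identifier is not in test

print(in_test)
print(not_in_test)
```

Because the mask and its negation split every row into exactly one group, the two results always add up to the full frame, which matches the 8383 + 140 = 8523 rows shown above. Note that identifiers can repeat in the index (FDX20 appears twice here, as it does in the real output), and isin keeps every matching row.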