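I have a dataframe that contains all of my training, validation, and test data, and a second dataframe that contains only my test data. Data points are identified by "data_index".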
import pandas as pd

# All data points, with a placeholder string in 'split'
df_all = pd.DataFrame({'data_index': range(7), 'split': 'NA'})
df_all.set_index('data_index', inplace=True)
# Only the test data points
df_test = pd.DataFrame({'data_index': [3, 5], 'split': 'test'})
df_test.set_index('data_index', inplace=True)
            split
data_index
0              NA
1              NA
2              NA
3              NA
4              NA
5              NA
6              NA

            split
data_index
3            test
5            test
How can I fill in the "split" column of the first dataframe based on the test dataframe, so that I get something like this:
                split
data_index
0           train/val
1           train/val
2           train/val
3                test
4           train/val
5                test
6           train/val
Answer 0 (score: 2)
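# Look up each index label in df_test; labels not found give a missing value
df_all['split'] = df_all.index.map(df_test['split'].get)
# Everything that was not found in df_test is train/val
df_all['split'] = df_all['split'].fillna('train/val')
print(df_all)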
                split
data_index
0           train/val
1           train/val
2           train/val
3                test
4           train/val
5                test
6           train/val
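To see why the two lines above work, it can help to look at the intermediate result. The following is only a sketch on the same toy data; the name `looked_up` is made up here for illustration. `Series.get` returns `None` for a label that is not in `df_test`, so the mapped values are missing exactly for the rows that should become train/val.

```python
import pandas as pd

df_all = pd.DataFrame({'data_index': range(7), 'split': 'NA'}).set_index('data_index')
df_test = pd.DataFrame({'data_index': [3, 5], 'split': 'test'}).set_index('data_index')

# Series.get returns the value for a label, or None when the label is absent,
# so labels missing from df_test come back as missing values.
# 'looked_up' is a hypothetical helper name, not part of the answer.
looked_up = pd.Series(df_all.index.map(df_test['split'].get), index=df_all.index)
print(looked_up.isna().sum())  # number of rows that were not found in df_test

# Rows still missing after the lookup are the ones not in df_test.
df_all['split'] = looked_up.fillna('train/val')
print(df_all['split'].tolist())
```

Note that this overwrites the original 'NA' placeholder strings completely, so it does not matter what the column held before.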
If the placeholders are real missing values (NaN), use combine_first:
import numpy as np

# Use np.nan for the missing values, not the string 'NA'
df_all = pd.DataFrame({'data_index': range(7), 'split': np.nan})
df_all.set_index('data_index', inplace=True)
df_test = pd.DataFrame({'data_index': [3, 5], 'split': 'test'})
df_test.set_index('data_index', inplace=True)
# Fill the NaN holes from df_test where the index matches, then label the rest
df_all['split'] = df_all['split'].combine_first(df_test['split']).fillna('train/val')
print(df_all)
                split
data_index
0           train/val
1           train/val
2           train/val
3                test
4           train/val
5                test
6           train/val
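The reason the placeholder has to be NaN here is that `combine_first` only fills positions where the calling Series is null. A small sketch of the difference, using standalone Series instead of the dataframes above (the variable names are illustrative only, and the NaN Series is created with `dtype=object` as an assumption to keep the dtypes simple):

```python
import numpy as np
import pandas as pd

test_split = pd.Series('test', index=pd.Index([3, 5], name='data_index'), name='split')

# Placeholder stored as the string 'NA': every value counts as present,
# so combine_first has nothing to fill and the Series stays as it was.
as_string = pd.Series('NA', index=pd.RangeIndex(7, name='data_index'), name='split')
unchanged = as_string.combine_first(test_split)

# Placeholder stored as NaN: the holes are filled from test_split wherever
# the index labels match, and the remaining holes become 'train/val'.
as_nan = pd.Series(np.nan, index=pd.RangeIndex(7, name='data_index'),
                   name='split', dtype=object)
filled = as_nan.combine_first(test_split).fillna('train/val')

print(unchanged.tolist())
print(filled.tolist())
```

So with the string 'NA' from the question, either convert it to NaN first or use the Index.map approach, which ignores the existing column contents.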
Answer 1 (score: 1)
Besides Index.map as mentioned above, you can also solve this with a few basic operations:
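# Left merge on the index level; overlapping columns become split_x and split_y
df = pd.merge(df_all, df_test, how='left', on='data_index')
# Drop the placeholder column that came from df_all
df = df.drop(columns=['split_x'])
# Rename the column that came from df_test back to 'split'
df = df.rename(columns={'split_y': 'split'})
# Every row that is not 'test' (NaN after the merge) becomes 'train/val'
df.loc[df['split'] != 'test', 'split'] = 'train/val'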
The result after each line is:
            split_x  split_y
data_index
0                NA      NaN
1                NA      NaN
2                NA      NaN
3                NA     test
4                NA      NaN
5                NA     test
6                NA      NaN

            split_y
data_index
0               NaN
1               NaN
2               NaN
3              test
4               NaN
5              test
6               NaN

            split
data_index
0             NaN
1             NaN
2             NaN
3            test
4             NaN
5            test
6             NaN

                split
data_index
0           train/val
1           train/val
2           train/val
3                test
4           train/val
5                test
6           train/val
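As a variation on the merge steps above (not part of the original answer), `DataFrame.join` does a left join on the index by default, which saves the drop and rename. This is only a sketch on the same toy data; the `_test` suffix and the name `joined` are arbitrary choices made here.

```python
import pandas as pd

df_all = pd.DataFrame({'data_index': range(7), 'split': 'NA'}).set_index('data_index')
df_test = pd.DataFrame({'data_index': [3, 5], 'split': 'test'}).set_index('data_index')

# join() performs a left join on the index by default; rsuffix renames the
# overlapping column coming from df_test so the two 'split' columns do not clash.
joined = df_all.join(df_test, rsuffix='_test')

# Rows without a match in df_test have NaN in 'split_test'.
df_all['split'] = joined['split_test'].fillna('train/val')
print(df_all['split'].tolist())
```

The index of df_all is kept as is, so the result lines up row for row with the original dataframe.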