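I have two dataframes. One contains a list of all possible mutations (plus associated scores); the other contains the subset of mutations that were actually observed (plus measured values).
I want to merge the second dataframe (the observed subset) into the larger one (all possible mutations) and bring along the data associated with the observed mutations (the fit values). However, when I do this, the merged dataframe shows NaN for every fit value.
Below is the code I used for the merge, samples of my dataframes, and the resulting output (as s1).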
s1 = pd.merge(data_frame, data_frame_2, how='left', on=['position', 'mutation'])
data_frame #all possible
position mutation A_score Normalized_A_Score
0 1 * 0.00 0.000000
1 1 A 849.69 100.007062
2 1 C 849.94 100.036486
3 1 D 849.76 100.015301
4 1 E 849.67 100.004708
5 1 F 849.00 99.925850
6 1 G 849.56 99.991761
7 1 H 849.83 100.023540
8 1 I 849.63 100.000000
9 1 K 851.51 100.221273
10 1 L 849.56 99.991761
11 1 M 849.63 100.000000
12 1 N 849.63 100.000000
13 1 P 849.00 99.925850
14 1 Q 849.13 99.941151
15 1 R 851.70 100.243635
16 1 S 849.15 99.943505
17 1 T 849.94 100.036486
18 1 V 849.63 100.000000
19 1 W 849.00 99.925850
20 1 Y 849.10 99.937620
data_frame_2 #observed
position mutation fit_val adjusted_fit_val
0 1 * 0.633847 0.274555
1 1 A 0.832698 0.473406
2 1 C 0.857012 0.497719
3 1 D 0.873119 0.513827
4 1 E 0.859805 0.500512
5 1 F 0.359053 -0.000239
6 1 G 0.786489 0.427197
7 1 H 0.876687 0.517395
8 1 I 0.820826 0.461534
9 1 K 0.886447 0.527154
10 1 L 0.868197 0.508905
11 1 N 0.909416 0.550124
12 1 P 0.843697 0.484405
13 1 Q 0.838892 0.479600
14 1 R 0.878175 0.518883
15 1 S 0.981739 0.622446
16 1 T 0.709694 0.350402
17 1 W 0.866746 0.507453
18 1 Y 0.876647 0.517355
s1 #merged
position mutation A_score Normalized_A_Score fit_val adjusted_fit_val
0 1 * 0.00 0.000000 NaN NaN
1 1 A 849.69 100.007062 NaN NaN
2 1 C 849.94 100.036486 NaN NaN
3 1 D 849.76 100.015301 NaN NaN
4 1 E 849.67 100.004708 NaN NaN
5 1 F 849.00 99.925850 NaN NaN
6 1 G 849.56 99.991761 NaN NaN
7 1 H 849.83 100.023540 NaN NaN
8 1 I 849.63 100.000000 NaN NaN
9 1 K 851.51 100.221273 NaN NaN
10 1 L 849.56 99.991761 NaN NaN
11 1 M 849.63 100.000000 NaN NaN
12 1 N 849.63 100.000000 NaN NaN
13 1 P 849.00 99.925850 NaN NaN
14 1 Q 849.13 99.941151 NaN NaN
15 1 R 851.70 100.243635 NaN NaN
16 1 S 849.15 99.943505 NaN NaN
17 1 T 849.94 100.036486 NaN NaN
18 1 V 849.63 100.000000 NaN NaN
19 1 W 849.00 99.925850 NaN NaN
20 1 Y 849.10 99.937620 NaN NaN
Why don't the fit_val and adjusted_fit_val values from data_frame_2 show up when I merge the dataframes? Thanks for your help!
Answer:
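I think the position column has a different type in each dataframe: strings in one and integers in the other. The string "1" never equals the integer 1, so no row matches on the merge keys. Casting both columns to int before merging fixes it: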
data_frame['position'] = data_frame['position'].astype(int)
data_frame_2['position'] = data_frame_2['position'].astype(int)
s1 = pd.merge(data_frame, data_frame_2, how='left', on=['position', 'mutation'])
print(s1)
position mutation A_score Normalized_A_Score fit_val adjusted_fit_val
0 1 * 0.00 0.000000 0.633847 0.274555
1 1 A 849.69 100.007062 0.832698 0.473406
2 1 C 849.94 100.036486 0.857012 0.497719
3 1 D 849.76 100.015301 0.873119 0.513827
4 1 E 849.67 100.004708 0.859805 0.500512
5 1 F 849.00 99.925850 0.359053 -0.000239
6 1 G 849.56 99.991761 0.786489 0.427197
7 1 H 849.83 100.023540 0.876687 0.517395
8 1 I 849.63 100.000000 0.820826 0.461534
9 1 K 851.51 100.221273 0.886447 0.527154
10 1 L 849.56 99.991761 0.868197 0.508905
11 1 M 849.63 100.000000 NaN NaN
12 1 N 849.63 100.000000 0.909416 0.550124
13 1 P 849.00 99.925850 0.843697 0.484405
14 1 Q 849.13 99.941151 0.838892 0.479600
15 1 R 851.70 100.243635 0.878175 0.518883
16 1 S 849.15 99.943505 0.981739 0.622446
17 1 T 849.94 100.036486 0.709694 0.350402
18 1 V 849.63 100.000000 NaN NaN
19 1 W 849.00 99.925850 0.866746 0.507453
20 1 Y 849.10 99.937620 0.876647 0.517355
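The two NaN rows that remain (M and V) are expected, because those mutations do not appear in data_frame_2. The sketch below reproduces the problem and the fix on a hypothetical three-row sample, assuming position holds the string "1" on the left and the Python int 1 on the right, with both columns stored as object dtype. In that case pandas raises no error and silently matches nothing. (In recent pandas versions, merging an object column of strings against an int64 column typically raises a ValueError instead, so the silent all-NaN result points to both columns being object dtype.) It also passes indicator=True, a standard merge parameter, to label each row as "both" or "left_only".

```python
import pandas as pd

# Hypothetical miniature versions of data_frame and data_frame_2 (assumed data).
# 'position' holds strings ("1") on the left but Python ints (1) on the right;
# both columns have object dtype, so merge() does not complain.
data_frame = pd.DataFrame({
    "position": pd.Series(["1", "1", "1"], dtype=object),
    "mutation": ["A", "C", "M"],
    "A_score": [849.69, 849.94, 849.63],
})
data_frame_2 = pd.DataFrame({
    "position": pd.Series([1, 1], dtype=object),
    "mutation": ["A", "C"],
    "fit_val": [0.832698, 0.857012],
})

# The dtypes look identical, but the Python types of the values differ.
print(data_frame["position"].dtype, data_frame_2["position"].dtype)
print({type(v) for v in data_frame["position"]},
      {type(v) for v in data_frame_2["position"]})

# Reproduces the problem: "1" never equals 1, so every fit_val is NaN.
broken = pd.merge(data_frame, data_frame_2, how="left", on=["position", "mutation"])
print(broken["fit_val"].isna().all())  # True

# The fix from the answer: cast both key columns to int before merging.
data_frame["position"] = data_frame["position"].astype(int)
data_frame_2["position"] = data_frame_2["position"].astype(int)

# indicator=True adds a '_merge' column: 'both' for matched rows,
# 'left_only' for mutations missing from data_frame_2 (here, M).
s1 = pd.merge(data_frame, data_frame_2, how="left",
              on=["position", "mutation"], indicator=True)
print(s1[["position", "mutation", "fit_val", "_merge"]])
```

Checking the dtypes of the key columns (and the types of their values when the dtype is object) before merging is usually the quickest way to spot this kind of silent mismatch.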