我有两个数据帧df
和df1
。所以我想采用差异df
和df1
,但行数不相同。就像df1
中没有df
的行5、6、7一样。
wave num stlines fwhm EWs MeasredWave
0 4050.32 3 0.28269 0.07365 22.16080 4050.311360
1 4208.98 5 0.48122 0.08765 44.90035 4208.972962
2 4374.94 9 0.71483 0.11429 86.96497 4374.927110
3 4379.74 9 0.31404 0.09107 30.44271 4379.760601
4 4398.01 14 0.50415 0.09845 52.83236 4398.007473
5 5520.50 1 0.06148 0.12556 8.21685 5520.484742
6 6052.67 1 0.04802 0.24632 12.59131 6052.650064
wave num stlines fwhm EWs MeasredWave
0 4050.32 3 0.28616 0.07521 22.91064 4050.327388
1 4208.98 6 0.48781 0.08573 44.51609 4208.990029
2 4374.94 9 0.71548 0.11437 87.10152 4374.944513
3 4379.74 10 0.31338 0.09098 30.34791 4379.778009
4 4398.01 15 0.49950 0.08612 45.78707 4398.020367
5 4502.21 9 0.56362 0.10114 60.67868 4502.223123
6 4508.28 3 0.69554 0.11600 85.88428 4508.291777
7 4512.99 2 0.20486 0.08891 19.38745 4512.999332
8 5520.50 1 0.06148 0.12556 8.21685 5520.484742
9 6052.67 1 0.04802 0.24632 12.59131 6052.650064
path_to_files = '/home/Desktop/computed_2d_blaze/'
lst = []
for filen in [x for x in os.listdir(path_to_files) if '.ares' in x]:
df = pd.read_table(path_to_files+filen, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
df = df.drop_duplicates('wave')
df = df.sort_values('wave', ascending=True)
df = df.reset_index(drop=True)
df[filen] = None
lst.append(df)
path_to_files1 = '/home/Desktop/computed_1d/'
lst1 = []
for filen in [x for x in os.listdir(path_to_files1) if '.ares' in x]:
df1 = pd.read_table(path_to_files1+filen, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
df1 = df1.sort_values('wave', ascending=True)
#df1 = df1.drop_duplicates('wave')
df1 = df1.reset_index(drop=True)
lst1.append(df1)
所以我想做的事情是,我想基于Cloumn wave
减去数据帧。因此,我想确保从正确的df1索引中减去df.wave
的每个索引。如果数据帧中没有wave
(行)的索引,那么我想添加一个空行以确保它不会弄乱其余数据。
所需的df输出应如下所示:
wave num stlines fwhm EWs MeasredWave
0 4050.32 3 0.28269 0.07365 22.16080 4050.311360
1 4208.98 5 0.48122 0.08765 44.90035 4208.972962
2 4374.94 9 0.71483 0.11429 86.96497 4374.927110
3 4379.74 9 0.31404 0.09107 30.44271 4379.760601
4 4398.01 14 0.50415 0.09845 52.83236 4398.007473
5 0.00 0 0.00 0.00 0.00 0.00
6 0.00 0 0.00 0.00 0.00 0.00
7 0.00 0 0.00 0.00 0.00 0.00
8 5520.50 1 0.06148 0.12556 8.21685 5520.484742
9 6052.67 1 0.04802 0.24632 12.59131 6052.650064
答案 0 :(得分:1)
给出:
wave num stlines fwhm EWs MeasredWave
0 4050.32 3 0.28269 0.07365 22.1608 4050.31
1 4208.98 5 0.48122 0.08765 44.9004 4208.97
2 4374.94 9 0.71483 0.11429 86.965 4374.93
3 4379.74 9 0.31404 0.09107 30.4427 4379.76
4 4398.01 14 0.50415 0.09845 52.8324 4398.01
5 5520.5 1 0.06148 0.12556 8.21685 5520.48
6 6052.67 1 0.04802 0.24632 12.5913 6052.65
wave num stlines fwhm EWs MeasredWave
0 4050.32 3 0.28269 0.07365 22.1608 4050.31
1 4208.98 5 0.48122 0.08765 44.9004 4208.97
2 4374.94 9 0.71483 0.11429 86.965 4374.93
3 4379.74 9 0.31404 0.09107 30.4427 4379.76
4 4398.01 14 0.50415 0.09845 52.8324 4398.01
5 4502.21 9 0.56362 0.10114 60.6787 4502.22
6 4508.28 3 0.69554 0.116 85.8843 4508.29
7 4512.99 2 0.20486 0.08891 19.3875 4513
8 5520.5 1 0.06148 0.12556 8.21685 5520.48
9 6052.67 1 0.04802 0.24632 12.5913 6052.65
尝试一下:
df_new = pd.merge(df1, df, on = 'wave', how = 'left')[['wave', 'num_y', 'stlines_y', 'fwhm_y', 'EWs_y', 'MeasredWave_y']]
df_new.fillna(0, inplace = True)
df_new['wave'] = df_new['wave']*df1['wave'].isin(df['wave']).astype(int)
输出:
wave num_y stlines_y fwhm_y EWs_y MeasredWave_y
0 4050.32 3 0.28269 0.07365 22.16080 4050.311360
1 4208.98 5 0.48122 0.08765 44.90035 4208.972962
2 4374.94 9 0.71483 0.11429 86.96497 4374.927110
3 4379.74 9 0.31404 0.09107 30.44271 4379.760601
4 4398.01 14 0.50415 0.09845 52.83236 4398.007473
5 0.00 0 0.00000 0.00000 0.00000 0.000000
6 0.00 0 0.00000 0.00000 0.00000 0.000000
7 0.00 0 0.00000 0.00000 0.00000 0.000000
8 5520.50 1 0.06148 0.12556 8.21685 5520.484742
9 6052.67 1 0.04802 0.24632 12.59131 6052.650064