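I have a price table for all securities on Friday. Some securities keep the same price on Saturday as on Friday. For securities that are not listed on Saturday, I want to carry the Friday price over to Saturday.
I tried to do this with pandas merge.
I set how to outer and indicator to True, then combined the two data frames with the following steps.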
# df_friday has 10 securities
# df_saturday has 3 securities
merge_df = pd.merge(df_friday, df_saturday, on='security', how='outer', indicator=True)
merge_df = merge_df[merge_df['_merge'] == 'left_only']
merge_df = merge_df.drop(['price_y', '_merge'], axis=1)
merge_df = merge_df.rename(columns={'price_x': 'price'})
df_saturday = pd.concat([df_saturday,merge_df],ignore_index=True)
Both of my data frames have the same columns,
Columns: [security, price]
Am I doing this right? Or is there a simpler way to do it?
For example,
# df_friday
security price
1 apple 35.25
2 reliance 25.5
3 samsung 12.5
4 tata 28.5
5 sony 30.2
# df_saturday
security price
1 reliance 26.8
2 samsung 11.2
# df_saturday_result should be as follows,
security price
1 reliance 26.8
2 samsung 11.2
3 apple 35.25
4 tata 28.5
5 sony 30.2
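As a quick check of the approach in the question, here is a minimal runnable sketch that builds the two sample frames above and applies the same merge, filter, drop, rename, and concat steps. The variable names `merged`, `friday_only`, and `df_saturday_result` are only illustrative. The order of the carried-over Friday rows may vary with the pandas version, so the sketch does not rely on it.

```python
import pandas as pd

# Sample data taken from the example tables in the question.
df_friday = pd.DataFrame({
    "security": ["apple", "reliance", "samsung", "tata", "sony"],
    "price": [35.25, 25.5, 12.5, 28.5, 30.2],
})
df_saturday = pd.DataFrame({
    "security": ["reliance", "samsung"],
    "price": [26.8, 11.2],
})

# Outer merge on the key; the _merge column marks where each row came from.
merged = pd.merge(df_friday, df_saturday, on="security", how="outer", indicator=True)

# Keep only the securities that appear on Friday but not on Saturday.
friday_only = (
    merged[merged["_merge"] == "left_only"]
    .drop(columns=["price_y", "_merge"])
    .rename(columns={"price_x": "price"})
)

# Append those rows to Saturday's table.
df_saturday_result = pd.concat([df_saturday, friday_only], ignore_index=True)
print(df_saturday_result)
```

On this sample the steps do give one row per security, with Saturday's price where it exists and Friday's price otherwise, so the approach is correct, if a little long.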
Answer 0 (score: 3)
I think you can do this:
df_saturday = df_saturday.merge(df_friday, how='outer', on=['security','price']).drop_duplicates(['security'], keep='first')
print(df_saturday)
Output:
price security
0 26.80 reliance
1 11.20 samsung
2 35.25 apple
5 28.50 tata
6 30.20 sony
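One caveat: this relies on Saturday's rows appearing before Friday's rows in the merged frame, because keep='first' retains whichever row comes first for each security. The pandas documentation describes outer joins as sorting keys lexicographically, so that order may not be safe to rely on across versions. The following is a sketch of an alternative that does not depend on row order, using `combine_first` (a different technique, not part of the answer above). It assumes each security appears at most once per frame, and the resulting row order may differ from the expected output in the question.

```python
import pandas as pd

# Sample data taken from the example tables in the question.
df_friday = pd.DataFrame({
    "security": ["apple", "reliance", "samsung", "tata", "sony"],
    "price": [35.25, 25.5, 12.5, 28.5, 30.2],
})
df_saturday = pd.DataFrame({
    "security": ["reliance", "samsung"],
    "price": [26.8, 11.2],
})

# Index both frames by security so rows are aligned by name, not by position.
sat_idx = df_saturday.set_index("security")
fri_idx = df_friday.set_index("security")

# Take Saturday's value where present, otherwise fall back to Friday's.
result = sat_idx.combine_first(fri_idx).reset_index()
print(result)
```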
Answer 1 (score: 2)
You can also use pandas.concat():
Code:
import pandas as pd
fri = pd.DataFrame(columns=['security', 'price'], index=range(3), data=[['a', 2], ['b', 4], ['c', 6]])
sat = pd.DataFrame(columns=['security', 'price'], index=range(2), data=[['a', 3], ['c', 5]])
print('TEST DATA:')
print(fri)
print(sat)
print('\nSOLUTION 1: concatenate and eliminate duplicates')
# sat goes first so that keep='first' retains Saturday's price for shared securities
result_1 = pd.concat([sat, fri], ignore_index=True).drop_duplicates(subset=['security'], keep='first')
print(result_1)
print('\nSOLUTION 2: filter unique and then concatenate')
# keep only the Friday rows whose security is missing from Saturday
fri_unique = fri[~fri.security.isin(sat.security)]
result_2 = pd.concat([sat, fri_unique], ignore_index=True)
print(result_2)
Output:
TEST DATA:
security price
0 a 2
1 b 4
2 c 6
security price
0 a 3
1 c 5
SOLUTION 1: concatenate and eliminate duplicates
security price
0 a 3
1 c 5
3 b 4
SOLUTION 2: filter unique and then concatenate
security price
0 a 3
1 c 5
2 b 4
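Solution 2 can be wrapped in a small helper and applied to the data from the question. This is only a sketch: the function name `carry_forward_missing` and its `key` parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd


def carry_forward_missing(current, previous, key="security"):
    """Return `current` plus the rows of `previous` whose key is absent from `current`."""
    # Rows of the previous day whose key does not occur in the current day.
    missing = previous[~previous[key].isin(current[key])]
    # Current-day rows come first, followed by the carried-over rows.
    return pd.concat([current, missing], ignore_index=True)


# Sample data taken from the example tables in the question.
df_friday = pd.DataFrame({
    "security": ["apple", "reliance", "samsung", "tata", "sony"],
    "price": [35.25, 25.5, 12.5, 28.5, 30.2],
})
df_saturday = pd.DataFrame({
    "security": ["reliance", "samsung"],
    "price": [26.8, 11.2],
})

df_saturday_result = carry_forward_missing(df_saturday, df_friday)
print(df_saturday_result)
```

Because the filter never looks at row positions, the result keeps Saturday's rows first and then Friday's remaining rows in their original order, which matches the expected table in the question.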
Answer 2 (score: 2)
I ran some timing checks on the 3 approaches mentioned in the other answers.
fri = pd.DataFrame(columns=['security', 'price'], index=range(3), data=[['a', 2], ['b', 4], ['c', 6]])
sat = pd.DataFrame(columns=['security', 'price'], index=range(2), data=[['a', 3], ['c', 5]])
In [90]: %timeit out = sat.merge(fri, how='outer', on=['security', 'price']).drop_duplicates()
5.19 ms ± 150 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [91]: %timeit result_1 = pd.concat([sat,fri],ignore_index=True).drop_duplicates(subset=['security'], keep='first')
1.82 ms ± 26.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [92]: %timeit result_2 = pd.concat([sat, fri[~fri.security.isin(sat.security)]], ignore_index=True)
1.19 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [93]: %timeit out = sat.merge(fri, how='outer', on=['security', 'price']).drop_duplicates()
5.02 ms ± 181 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
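It looks like filtering and then concatenating is the fastest, while concat followed by dedup is not much worse. Merging is considerably slower by comparison.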
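Two things are worth keeping in mind when reading these numbers. The frames have only 2 and 3 rows, so the timings mostly reflect fixed per-call overhead. Also, the merge line calls drop_duplicates() with no subset, so it is not doing quite the same work as the other two. Below is a sketch for repeating the comparison of the two concat-based approaches on larger frames with the standard `timeit` module. The frame size `n`, the synthetic prices, and the function names are assumptions chosen for illustration, and no particular timings are implied.

```python
import timeit

import pandas as pd

n = 2_000  # assumed number of securities; adjust to match your data
names = [f"sec{i}" for i in range(n)]

# Synthetic Friday table, and a Saturday table covering every third security.
fri = pd.DataFrame({"security": names, "price": [float(i) for i in range(n)]})
sat = fri.iloc[::3].assign(price=lambda d: d["price"] + 0.5).reset_index(drop=True)


def concat_dedup():
    # Saturday first, so keep='first' retains Saturday's price.
    return pd.concat([sat, fri], ignore_index=True).drop_duplicates(
        subset=["security"], keep="first"
    )


def filter_concat():
    # Drop Friday rows already present on Saturday, then append the rest.
    return pd.concat(
        [sat, fri[~fri["security"].isin(sat["security"])]], ignore_index=True
    )


runs = 20
for fn in (concat_dedup, filter_concat):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```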