I'm having trouble using pd.merge with the data shown below.
I would like every date in df2 repeated for each client in df1. The two input frames are:
from pandas import DataFrame
clients = {'DATE': [20150430,20150531,20150630,20150331,20150430],
'CLIENT_ID': [1,1,1,2,2],
'VALUE' : [145,202,150,175,180]}
dates = {'DATE' : [20150331,20150430,20150531,20150630,20150731]}
df1 = DataFrame(clients,columns= ['DATE', 'CLIENT_ID','VALUE'])
df2 = DataFrame(dates,columns=['DATE'])
I tried, but the result is not what I want. The desired output is:
results = {'DATE': [20150331,20150430,20150531,20150630,20150731,20150331,20150430,20150531,20150630,20150731],
'CLIENT_ID': [1,1,1,1,1,2,2,2,2,2],
'VALUE': [None,145,202,150,None,175,180,None,None,None]}
df_results = DataFrame(results,columns= ['DATE', 'CLIENT_ID','VALUE'])
Thanks for your help.
Answer 0 (score: 2)
I'm not sure why you need df1 for this, since the frame can be built from df2. Here is the reindex approach:
df1.groupby('CLIENT_ID').apply(lambda x : x.set_index('DATE').reindex(df2.DATE).ffill().bfill()).reset_index(level=1)
               DATE  CLIENT_ID
CLIENT_ID
1          20150331        1.0
1          20150430        1.0
1          20150531        1.0
1          20150630        1.0
1          20150731        1.0
2          20150331        2.0
2          20150430        2.0
2          20150531        2.0
2          20150630        2.0
2          20150731        2.0
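Note that the ffill().bfill() calls above fill every column, VALUE included, from neighbouring dates. If the gaps should stay NaN, as in the desired frame from the question, the fills can be dropped. The following is a minimal sketch of the same per-client reindex written as an explicit loop. This is a restructuring for illustration, not the answer's own code, and the names parts, filled and out are hypothetical.

```python
import pandas as pd

# Input frames rebuilt from the question
df1 = pd.DataFrame({'DATE': [20150430, 20150531, 20150630, 20150331, 20150430],
                    'CLIENT_ID': [1, 1, 1, 2, 2],
                    'VALUE': [145, 202, 150, 175, 180]})
df2 = pd.DataFrame({'DATE': [20150331, 20150430, 20150531, 20150630, 20150731]})

parts = []
for client_id, grp in df1.groupby('CLIENT_ID'):
    # Align this client's values to the full date list; missing dates become NaN
    filled = grp.set_index('DATE')[['VALUE']].reindex(df2['DATE']).rename_axis('DATE')
    # Restore the client id on every row, including the newly added ones
    filled.insert(0, 'CLIENT_ID', client_id)
    parts.append(filled.reset_index())

out = pd.concat(parts, ignore_index=True)
print(out)
```

The loop avoids groupby(...).apply, whose handling of the grouping column has changed between pandas versions.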
If we build it from df2 instead:
pd.DataFrame({'ID':df1.CLIENT_ID.unique()}).assign(key=1).merge(df2.assign(key=1))
   ID  key      DATE
0   1    1  20150331
1   1    1  20150430
2   1    1  20150531
3   1    1  20150630
4   1    1  20150731
5   2    1  20150331
6   2    1  20150430
7   2    1  20150531
8   2    1  20150630
9   2    1  20150731
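On pandas 1.2 or later, the dummy key column is not needed because merge accepts how='cross'. Below is a minimal sketch under that version assumption, followed by a left merge to bring VALUE back in. The names ids, base and full are illustrative and not part of the answer.

```python
import pandas as pd

# Input frames rebuilt from the question
df1 = pd.DataFrame({'DATE': [20150430, 20150531, 20150630, 20150331, 20150430],
                    'CLIENT_ID': [1, 1, 1, 2, 2],
                    'VALUE': [145, 202, 150, 175, 180]})
df2 = pd.DataFrame({'DATE': [20150331, 20150430, 20150531, 20150630, 20150731]})

# One row per distinct client
ids = pd.DataFrame({'CLIENT_ID': df1['CLIENT_ID'].unique()})

# Cartesian product of clients and dates (requires pandas >= 1.2)
base = ids.merge(df2, how='cross')

# Attach the known values; pairs missing from df1 get NaN
full = base.merge(df1, on=['CLIENT_ID', 'DATE'], how='left')
print(full)
```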
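Answer 1 (score: 1)
You can build the base frame from the Cartesian product (itertools.product) and then left-merge the remaining information onto it.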
from itertools import product
import pandas as pd
(pd.DataFrame(product(df1.CLIENT_ID.unique(), df2.DATE),
              columns=['CLIENT_ID', 'DATE'])
   .merge(df1, how='left'))
   CLIENT_ID      DATE  VALUE
0          1  20150331    NaN
1          1  20150430  145.0
2          1  20150531  202.0
3          1  20150630  150.0
4          1  20150731    NaN
5          2  20150331  175.0
6          2  20150430  180.0
7          2  20150531    NaN
8          2  20150630    NaN
9          2  20150731    NaN
If the performance of the product step is a concern, this answer is helpful.
Alternatively, use set_index + reindex:
idx = pd.MultiIndex.from_product([df1.CLIENT_ID.unique(), df2.DATE],
                                 names=['CLIENT_ID', 'DATE'])
df1.set_index(['CLIENT_ID', 'DATE']).reindex(idx).reset_index()
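No output is shown for the reindex variant, so here is a self-contained sketch that runs it and checks the result against the desired frame from the question. The names out and expected are introduced here for illustration. NaN is written explicitly in expected where the question used None.

```python
import pandas as pd

# Input frames rebuilt from the question
df1 = pd.DataFrame({'DATE': [20150430, 20150531, 20150630, 20150331, 20150430],
                    'CLIENT_ID': [1, 1, 1, 2, 2],
                    'VALUE': [145, 202, 150, 175, 180]})
df2 = pd.DataFrame({'DATE': [20150331, 20150430, 20150531, 20150630, 20150731]})

# Every (client, date) pair, clients in order of appearance
idx = pd.MultiIndex.from_product([df1['CLIENT_ID'].unique(), df2['DATE']],
                                 names=['CLIENT_ID', 'DATE'])

# Reindex onto the full grid, then restore the question's column order
out = (df1.set_index(['CLIENT_ID', 'DATE'])
          .reindex(idx)
          .reset_index()[['DATE', 'CLIENT_ID', 'VALUE']])

# Desired frame from the question, with missing values as NaN
nan = float('nan')
expected = pd.DataFrame({
    'DATE': [20150331, 20150430, 20150531, 20150630, 20150731] * 2,
    'CLIENT_ID': [1] * 5 + [2] * 5,
    'VALUE': [nan, 145.0, 202.0, 150.0, nan, 175.0, 180.0, nan, nan, nan],
})

# Raises AssertionError if the two frames differ
pd.testing.assert_frame_equal(out, expected)
print(out)
```

This relies on the (CLIENT_ID, DATE) pairs in df1 being unique, since reindex raises an error on a duplicated index.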
Answer 2 (score: 0)
This seems to be what you want:
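import pandas as pd
import numpy as np
clients = {'DATE': [20150430,20150531,20150630,20150331,20150430],
'CLIENT_ID': [1,1,1,2,2],
'VALUE' : [145,202,150,175,180]}
dates = {'DATE' : [20150331,20150430,20150531,20150630,20150731]}
df1 = pd.DataFrame(clients,columns= ['DATE', 'CLIENT_ID','VALUE'])
# Copy df1, swap the client IDs, and blank out the values
df2 = df1.copy()
# map() returns a new Series, so the result has to be assigned back
df2['CLIENT_ID'] = df2['CLIENT_ID'].map({1: 2, 2: 1})
# np.NaN was removed in NumPy 2.0; np.nan works in all versions
df2['VALUE'] = np.nan
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
df_result = pd.concat([df1, df2]).reset_index(drop=True)
       DATE  CLIENT_ID  VALUE
0  20150430          1  145.0
1  20150531          1  202.0
2  20150630          1  150.0
3  20150331          2  175.0
4  20150430          2  180.0
5  20150430          2    NaN
6  20150531          2    NaN
7  20150630          2    NaN
8  20150331          1    NaN
9  20150430          1    NaN
The intent is a unique row for each DATE and CLIENT_ID. Note, however, that this only covers dates already present in df1 (20150731 never appears), and that 20150430, which both clients already have, ends up duplicated for each client.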