How do I repeat a set of dates for each user ID?

Time: 2019-05-12 17:40:59

Tags: python pandas merge

I'm having trouble using pd.merge. I have the following data:

from pandas import DataFrame
clients = {'DATE': [20150430,20150531,20150630,20150331,20150430],
'CLIENT_ID': [1,1,1,2,2],
'VALUE' : [145,202,150,175,180]}
dates = {'DATE' : [20150331,20150430,20150531,20150630,20150731]}
df1 = DataFrame(clients,columns= ['DATE', 'CLIENT_ID','VALUE'])
df2 = DataFrame(dates,columns=['DATE'])

DF1 DF2

I want all of the dates repeated for each client, like this:

results = {'DATE': [20150331,20150430,20150531,20150630,20150731,20150331,20150430,20150531,20150630,20150731],
'CLIENT_ID': [1,1,1,1,1,2,2,2,2,2],
'VALUE': [None,145,202,150,None,175,180,None,None,None]}
df_results = DataFrame(results,columns= ['DATE', 'CLIENT_ID','VALUE'])

DF_RESULT

I tried a merge, but the result is not what I want:

MERGE

Thanks for your help.

3 answers:

Answer 0 (score: 2)

I'm not sure why you need df1 for this, since the result can be built from df2. Here is a method using reindex:

df1.groupby('CLIENT_ID').apply(lambda  x : x.set_index('DATE').reindex(df2.DATE).ffill().bfill()).reset_index(level=1)
               DATE  CLIENT_ID
CLIENT_ID                     
1          20150331        1.0
1          20150430        1.0
1          20150531        1.0
1          20150630        1.0
1          20150731        1.0
2          20150331        2.0
2          20150430        2.0
2          20150531        2.0
2          20150630        2.0
2          20150731        2.0
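
Note that with the df1 from the question, which also carries a VALUE column, the ffill().bfill() step would fill VALUE into dates where a client has no record. Below is a minimal sketch of the same per-client reindex that leaves those gaps as NaN, assuming the question's data; the names parts and result are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'DATE': [20150430, 20150531, 20150630, 20150331, 20150430],
                    'CLIENT_ID': [1, 1, 1, 2, 2],
                    'VALUE': [145, 202, 150, 175, 180]})
df2 = pd.DataFrame({'DATE': [20150331, 20150430, 20150531, 20150630, 20150731]})

# Reindex each client's VALUE on the full date list.
# Dates without a record for that client get NaN (no forward/backward fill).
parts = [
    group.set_index('DATE')[['VALUE']]
         .reindex(df2['DATE'].to_numpy())
         .rename_axis('DATE')
         .assign(CLIENT_ID=client_id)
    for client_id, group in df1.groupby('CLIENT_ID')
]

# Stack the per-client frames and restore the question's column order.
result = pd.concat(parts).reset_index()[['DATE', 'CLIENT_ID', 'VALUE']]
print(result)
```

The explicit loop avoids groupby.apply, whose handling of the grouping column has changed between pandas versions.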

If we build it from df2 instead:

pd.DataFrame({'ID':df1.CLIENT_ID.unique()}).assign(key=1).merge(df2.assign(key=1))
   ID  key      DATE
0   1    1  20150331
1   1    1  20150430
2   1    1  20150531
3   1    1  20150630
4   1    1  20150731
5   2    1  20150331
6   2    1  20150430
7   2    1  20150531
8   2    1  20150630
9   2    1  20150731
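
On pandas 1.2 or later, the helper key column can likely be avoided with merge(how='cross'). The sketch below assumes the question's data and also left-merges df1 back in to recover VALUE, which the snippet above does not do; ids, grid, and result are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'DATE': [20150430, 20150531, 20150630, 20150331, 20150430],
                    'CLIENT_ID': [1, 1, 1, 2, 2],
                    'VALUE': [145, 202, 150, 175, 180]})
df2 = pd.DataFrame({'DATE': [20150331, 20150430, 20150531, 20150630, 20150731]})

# One row per distinct client; how='cross' pairs each client with every date.
ids = pd.DataFrame({'CLIENT_ID': df1['CLIENT_ID'].unique()})
grid = ids.merge(df2, how='cross')

# Left-merge the observed values back; pairs with no record keep NaN in VALUE.
result = grid.merge(df1, on=['CLIENT_ID', 'DATE'], how='left')
print(result)
```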

Answer 1 (score: 1)

You can build the base from the Cartesian product, then left-merge the remaining information.

from itertools import product
import pandas as pd

(pd.DataFrame(product(df1.CLIENT_ID.unique(), df2.DATE),
              columns=['CLIENT_ID', 'DATE'])
   .merge(df1, how='left'))

   CLIENT_ID      DATE  VALUE
0          1  20150331    NaN
1          1  20150430  145.0
2          1  20150531  202.0
3          1  20150630  150.0
4          1  20150731    NaN
5          2  20150331  175.0
6          2  20150430  180.0
7          2  20150531    NaN
8          2  20150630    NaN
9          2  20150731    NaN

If the performance of the product step is a concern, this answer will be helpful.
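
The linked answer is not reproduced here. As one illustration of a vectorized product step (an assumption about what such an approach could look like, not necessarily the linked method), the pairs can be built with np.repeat and np.tile instead of itertools.product. The arrays client_ids and dates below stand in for df1.CLIENT_ID.unique() and df2.DATE.

```python
import numpy as np
import pandas as pd

# Stand-ins for df1.CLIENT_ID.unique() and df2.DATE from the question.
client_ids = np.array([1, 2])
dates = np.array([20150331, 20150430, 20150531, 20150630, 20150731])

# Repeat each client ID once per date, and tile the date list once per client,
# so that row i holds one (CLIENT_ID, DATE) pair of the Cartesian product.
base = pd.DataFrame({
    'CLIENT_ID': np.repeat(client_ids, len(dates)),
    'DATE': np.tile(dates, len(client_ids)),
})
print(base)
```

The resulting frame can then be left-merged with df1 exactly as above.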


Alternatively, use set_index + reindex:

idx = pd.MultiIndex.from_product([df1.CLIENT_ID.unique(), df2.DATE],
                                 names=['CLIENT_ID', 'DATE'])
df1.set_index(['CLIENT_ID', 'DATE']).reindex(idx).reset_index()
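
For reference, here is a self-contained sketch of the reindex variant on the question's data, with a sanity check that no existing VALUE is lost. It assumes each (CLIENT_ID, DATE) pair occurs at most once in df1, since reindex does not accept a duplicated index.

```python
import pandas as pd

df1 = pd.DataFrame({'DATE': [20150430, 20150531, 20150630, 20150331, 20150430],
                    'CLIENT_ID': [1, 1, 1, 2, 2],
                    'VALUE': [145, 202, 150, 175, 180]})
df2 = pd.DataFrame({'DATE': [20150331, 20150430, 20150531, 20150630, 20150731]})

# Assumption: (CLIENT_ID, DATE) pairs are unique in df1.
assert not df1.duplicated(['CLIENT_ID', 'DATE']).any()

# Full grid of clients x dates as a MultiIndex.
idx = pd.MultiIndex.from_product([df1['CLIENT_ID'].unique(), df2['DATE']],
                                 names=['CLIENT_ID', 'DATE'])

# Align df1 to the grid; pairs missing from df1 get NaN in VALUE.
result = df1.set_index(['CLIENT_ID', 'DATE']).reindex(idx).reset_index()

# Every original VALUE should survive the reindex.
assert result['VALUE'].notna().sum() == len(df1)
print(result)
```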

Answer 2 (score: 0)

This seems to be what you want:

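import pandas as pd
import numpy as np

clients = {'DATE': [20150430,20150531,20150630,20150331,20150430],
           'CLIENT_ID': [1,1,1,2,2],
           'VALUE' : [145,202,150,175,180]}
dates = {'DATE' : [20150331,20150430,20150531,20150630,20150731]}

df1 = pd.DataFrame(clients, columns=['DATE', 'CLIENT_ID', 'VALUE'])
df2 = df1.copy()
# map() returns a new Series; the result is not assigned, so df2['CLIENT_ID'] stays unchanged
df2['CLIENT_ID'].map({1:2, 2:1})
df2['VALUE'] = np.nan  # np.NaN was removed in NumPy 2.0
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two frames instead.
# drop=True discards the old index, matching the output shown.
df_result = pd.concat([df1, df2]).reset_index(drop=True)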

       DATE  CLIENT_ID  VALUE
0  20150430          1  145.0
1  20150531          1  202.0
2  20150630          1  150.0
3  20150331          2  175.0
4  20150430          2  180.0
5  20150430          1    NaN
6  20150531          1    NaN
7  20150630          1    NaN
8  20150331          2    NaN
9  20150430          2    NaN

Unique rows for each DATE and CLIENT_ID