Question

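I need to get the frequency of DAY, HOUR, and Type for each CUS_ID. With my code, I only get the frequencies for the last CUS_ID, and I don't know how to get all of them. I have already tried pd.append(ignore_index=True), but it messed up my df.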

This image shows the result of running the code. There are 70 CUS_ID values, and the last one is 2449.

first_df contains all the raw data used in this code.

DayFreq = first_df.groupby(['CUS_ID', 'DAY']).size()
HourFreq = first_df.groupby(['CUS_ID', 'TIME_HOUR']).size()
TypeFreq = first_df.groupby(['CUS_ID', 'ACT_NM']).size()

allCUS = first_df.groupby('CUS_ID').size() 
df_con = pd.DataFrame()
idx = 0

for idx in allCUS.index:
    df_con = pd.concat([DayFreq.loc[idx, :], HourFreq.loc[idx, :], TypeFreq.loc[idx, :]], axis=0, join='outer')
    idx = idx + 1

What I want to get is:

CUS_ID  DAY
2       FRI      925
        .
        .
        .
2449    FRI      599
        .
        .

How should I change this code to get this result?

Answer 1

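Why not just use sort_index? Concatenate the three Series with keys, then sort on the CUS_ID level so that each customer's rows end up together: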

pd.concat([DayFreq, HourFreq, TypeFreq], keys=[0, 1, 2]).sort_index(level=1)
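
To try the concat-then-sort idea on something small, here is a minimal sketch. The toy first_df, the key labels, and the level names "kind" and "value" are assumptions made up for illustration; only the column names come from the question. It also shows a loop variant, since the loop in the question reassigns df_con on every pass and so only keeps the last CUS_ID.

```python
import pandas as pd

# Hypothetical stand-in for first_df; only the column names come from the question.
# TIME_HOUR is kept as strings here so all three indexes hold the same kind of label.
first_df = pd.DataFrame({
    "CUS_ID": [2, 2, 2, 2449, 2449],
    "DAY": ["FRI", "FRI", "MON", "FRI", "SAT"],
    "TIME_HOUR": ["09", "10", "09", "23", "23"],
    "ACT_NM": ["login", "view", "view", "login", "login"],
})

DayFreq = first_df.groupby(["CUS_ID", "DAY"]).size()
HourFreq = first_df.groupby(["CUS_ID", "TIME_HOUR"]).size()
TypeFreq = first_df.groupby(["CUS_ID", "ACT_NM"]).size()

# Stack the three Series; the keys record which column each count came from.
combined = pd.concat([DayFreq, HourFreq, TypeFreq],
                     keys=["DAY", "TIME_HOUR", "ACT_NM"])
combined.index.names = ["kind", "CUS_ID", "value"]  # assumed level names

# Put CUS_ID outermost and sort, so each customer's rows are contiguous.
per_customer = combined.swaplevel("kind", "CUS_ID").sort_index()
print(per_customer)

# Loop variant: build one block per customer, then concatenate once at the end
# instead of overwriting the result on every iteration.
pieces = {
    cus_id: pd.concat([DayFreq.loc[cus_id], HourFreq.loc[cus_id], TypeFreq.loc[cus_id]])
    for cus_id in first_df["CUS_ID"].unique()
}
looped = pd.concat(pieces)
print(looped)
```

With this layout, per_customer.loc[2449] returns every frequency for that one customer, which may be handy for checking a single CUS_ID.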

How to append a MultiIndex Series to a DataFrame in a for loop