Pivot a pandas DataFrame that contains dates and show the count for each date

Time: 2016-07-12 21:39:09

Tags: python numpy pandas

I have the following pandas DataFrame (currently about 500 rows):


merged_verified = 

    Last Verified    Verified by
0   2016-07-11       John Doe 
1   2016-07-11       John Doe
2   2016-07-12       John Doe
3   2016-07-11       Mary Smith
4   2016-07-12       Mary Smith

I am trying to use pivot_table() to get the following:

Last Verified   2016-07-11   2016-07-12
Verified by
John Doe                 2            1
Mary Smith               1            1

Currently I am running

merged_verified = merged_verified.pivot_table(index=['Verified by'], values=['Last Verified'], aggfunc='count')

which gets me close to what I need, but not quite:

             Last Verified
Verified by
John Doe                 3
Mary Smith               2

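I have tried various things with the parameters, but none of them worked. The result above is the closest I have come to what I need. I read somewhere that I would need to add an additional column of dummy values (1s) that I could then sum, but that seems counterintuitive for what I believe is a simple DataFrame layout.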

2 answers:

Answer 0: (score: 3)

You can add the columns parameter and aggregate with len:

merged_verified = merged_verified.pivot_table(index=['Verified by'], 
                                              columns=['Last Verified'], 
                                              values=['Last Verified'], 
                                              aggfunc=len)
print(merged_verified)
Last         2016-07-11  2016-07-12
Verified by                        
Doe                   2           1
Smith                 1           1

Or you can omit values:

merged_verified = merged_verified.pivot_table(index=['Verified by'], 
                                              columns=['Last Verified'], 
                                              aggfunc=len)
print(merged_verified)
Last Verified  2016-07-11  2016-07-12
Verified by                          
John Doe                2           1
Mary Smith              1           1
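
For readers who want to try this end to end, here is a minimal sketch that rebuilds the five sample rows from the question and pivots them. It makes two assumptions that are not part of the answer: the dates are kept as plain strings, and `aggfunc='size'` stands in for `len` so that no `values` column is needed.

```python
import pandas as pd

# Sample rows copied from the question; dates are kept as strings (assumption).
merged_verified = pd.DataFrame({
    'Last Verified': ['2016-07-11', '2016-07-11', '2016-07-12',
                      '2016-07-11', '2016-07-12'],
    'Verified by': ['John Doe', 'John Doe', 'John Doe',
                    'Mary Smith', 'Mary Smith'],
})

# Count rows per (person, date) pair.
# aggfunc='size' is an assumption used in place of len;
# fill_value=0 puts 0 where a person has no rows on a date.
counts = merged_verified.pivot_table(index='Verified by',
                                     columns='Last Verified',
                                     aggfunc='size',
                                     fill_value=0)
print(counts)
```

The result is assigned to a new name, `counts`, so the original frame stays available for further experiments.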

Answer 1: (score: 1)

Use groupby, value_counts, and unstack:

merged_verified.groupby('Last Verified')['Verified by'].value_counts().unstack(0)
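
As a self-contained sketch of the same chain, assuming the five sample rows from the question with the dates stored as strings:

```python
import pandas as pd

# Sample rows copied from the question; dates are kept as strings (assumption).
merged_verified = pd.DataFrame({
    'Last Verified': ['2016-07-11', '2016-07-11', '2016-07-12',
                      '2016-07-11', '2016-07-12'],
    'Verified by': ['John Doe', 'John Doe', 'John Doe',
                    'Mary Smith', 'Mary Smith'],
})

# value_counts() counts each name within each date group, giving a Series
# indexed by (Last Verified, Verified by); unstack(0) then moves the date
# level (level 0) into the columns.
counts = (merged_verified
          .groupby('Last Verified')['Verified by']
          .value_counts()
          .unstack(0))
print(counts)
```

If some person has no rows on a given date, that cell comes out as NaN; passing `fill_value=0` to `unstack` is one way to get zeros instead.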


Timing

Sample DataFrame


Large DataFrame, 1 million rows

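import string
import numpy as np
import pandas as pd

# Assumption: `letters` is not defined in the answer; a list of ASCII letters is used here.
letters = list(string.ascii_letters)

idx = pd.MultiIndex.from_product(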
    [
        pd.date_range('2016-03-01', periods=100),
        pd.DataFrame(np.random.choice(letters, (10000, 10))).sum(1)
    ], names=['Last Verified', 'Verified by'])

merged_verified = idx.to_series().reset_index()[idx.names]
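
To compare the two approaches on your own machine, a rough sketch with `timeit` might look like the following. The function names `with_pivot_table` and `with_value_counts` are hypothetical, the tiny frame is only a stand-in for the large one built above, and `aggfunc='size'` is assumed in place of `len`. No timings are claimed here.

```python
import timeit

import pandas as pd

# Tiny stand-in frame (assumption); swap in the 1-million-row frame to time at scale.
df = pd.DataFrame({
    'Last Verified': ['2016-07-11', '2016-07-11', '2016-07-12',
                      '2016-07-11', '2016-07-12'],
    'Verified by': ['John Doe', 'John Doe', 'John Doe',
                    'Mary Smith', 'Mary Smith'],
})


def with_pivot_table(frame):
    # pivot_table approach; aggfunc='size' counts rows per (person, date) pair.
    return frame.pivot_table(index='Verified by',
                             columns='Last Verified',
                             aggfunc='size')


def with_value_counts(frame):
    # groupby / value_counts / unstack approach from the second answer.
    return frame.groupby('Last Verified')['Verified by'].value_counts().unstack(0)


runs = 10
for func in (with_pivot_table, with_value_counts):
    total = timeit.timeit(lambda: func(df), number=runs)
    print(f'{func.__name__}: {total / runs:.6f} s per call')
```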
