I have the following pandas DataFrame (currently about 500 rows):
merged_verified =
  Last Verified Verified by
0    2016-07-11    John Doe
1    2016-07-11    John Doe
2    2016-07-12    John Doe
3    2016-07-11  Mary Smith
4    2016-07-12  Mary Smith
I am trying to use pivot_table() to get the following:
Last Verified  2016-07-11  2016-07-12
Verified by
John Doe                2           1
Mary Smith              1           1
Currently I am running
merged_verified = merged_verified.pivot_table(index=['Verified by'], values=['Last Verified'], aggfunc='count')
which gets me close to what I need, but not quite:
             Last Verified
Verified by
John Doe                 3
Mary Smith               2
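I have tried various combinations of the parameters, but none of them worked; the result above is the closest I have come. I read somewhere that I would need to add an extra column of dummy values (1's) that could then be summed, but that seems counterintuitive for what I think is a simple DataFrame layout.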
Answer 0 (score: 3)
You can add the columns parameter and aggregate with len:
merged_verified = merged_verified.pivot_table(index=['Verified by'],
                                              columns=['Last Verified'],
                                              values=['Last Verified'],
                                              aggfunc=len)
print(merged_verified)
Last Verified  2016-07-11  2016-07-12
Verified by
John Doe                2           1
Mary Smith              1           1
Or you can omit values:
merged_verified = merged_verified.pivot_table(index=['Verified by'],
                                              columns=['Last Verified'],
                                              aggfunc=len)
print(merged_verified)
Last Verified  2016-07-11  2016-07-12
Verified by
John Doe                2           1
Mary Smith              1           1
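For reference, here is a minimal self-contained sketch of the same count table, built from the five rows in the question. It does not use the exact call above: instead it shows the dummy-column variant mentioned in the question (the helper column name `n` is an assumption made for illustration) next to `pd.crosstab`, a different pandas function that counts co-occurrences directly. The dates are kept as plain strings here for simplicity.

```python
import pandas as pd

# The five rows from the question, with dates kept as strings.
df = pd.DataFrame({
    "Last Verified": ["2016-07-11", "2016-07-11", "2016-07-12",
                      "2016-07-11", "2016-07-12"],
    "Verified by": ["John Doe", "John Doe", "John Doe",
                    "Mary Smith", "Mary Smith"],
})

# Variant 1: add a helper column of 1's (hypothetical name "n") and sum it.
counts = df.assign(n=1).pivot_table(
    index="Verified by",
    columns="Last Verified",
    values="n",
    aggfunc="sum",
    fill_value=0,  # combinations that never occur become 0 instead of NaN
)

# Variant 2: pd.crosstab counts each (row, column) pair without a helper column.
counts_ct = pd.crosstab(df["Verified by"], df["Last Verified"])

print(counts)
print(counts_ct)
```

Both variants keep the value column separate from the grouping keys, which avoids depending on how pivot_table handles a column that is used as both a key and a value.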
Answer 1 (score: 1)
Use groupby, value_counts and unstack:
merged_verified.groupby('Last Verified')['Verified by'].value_counts().unstack(0)
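As a runnable illustration of that one-liner, the sketch below rebuilds the five rows from the question and applies the same chain. The only addition is `fill_value=0` in `unstack`, an assumption on my part so that a verifier/date pair with no rows would show 0 rather than NaN; with this sample every pair occurs, so it changes nothing here.

```python
import pandas as pd

# The five rows from the question, with dates kept as strings.
merged_verified = pd.DataFrame({
    "Last Verified": ["2016-07-11", "2016-07-11", "2016-07-12",
                      "2016-07-11", "2016-07-12"],
    "Verified by": ["John Doe", "John Doe", "John Doe",
                    "Mary Smith", "Mary Smith"],
})

# Count verifiers within each date, then move the date level (level 0)
# from the row index to the columns.
result = (
    merged_verified
    .groupby("Last Verified")["Verified by"]
    .value_counts()
    .unstack(0, fill_value=0)
)

print(result)
```

The `0` passed to `unstack` refers to the first index level, which is 'Last Verified' because it is the groupby key; that is what puts the dates across the top and the names down the side.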
Sample dataframe
Large dataframe (1 million rows)
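import numpy as np
import pandas as pd
from string import ascii_letters
letters = list(ascii_letters)  # assumed definition; the original does not show how letters was defined
# 100 dates x 10,000 random 10-letter names = 1,000,000 rows
idx = pd.MultiIndex.from_product(
    [
        pd.date_range('2016-03-01', periods=100),
        pd.DataFrame(np.random.choice(letters, (10000, 10))).sum(1)
    ], names=['Last Verified', 'Verified by'])
merged_verified = idx.to_series().reset_index()[list(idx.names)]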
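Before timing anything on the large frame, it can help to confirm on a small one that two approaches agree. The sketch below is a scaled-down variant of the generator above (5 dates, 20 names, with some rows dropped at random so that not every pair occurs); the sizes, the seed, and the names `small`, `by_groupby` and `by_crosstab` are all assumptions chosen for illustration. It compares the groupby chain with `pd.crosstab`.

```python
from string import ascii_letters

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# 20 random 10-character names and 5 consecutive dates (illustrative sizes).
names = ["".join(rng.choice(list(ascii_letters), 10)) for _ in range(20)]
dates = pd.date_range("2016-03-01", periods=5)

idx = pd.MultiIndex.from_product(
    [dates, names], names=["Last Verified", "Verified by"])
small = idx.to_frame(index=False)

# Drop roughly 30% of the rows so some (name, date) pairs are missing.
small = small.sample(frac=0.7, random_state=0).reset_index(drop=True)

# Approach 1: groupby / value_counts / unstack, missing pairs filled with 0.
by_groupby = (
    small.groupby("Last Verified")["Verified by"]
    .value_counts()
    .unstack(0, fill_value=0)
)

# Approach 2: crosstab, aligned to the same row and column order.
by_crosstab = pd.crosstab(small["Verified by"], small["Last Verified"])
by_crosstab = by_crosstab.reindex(
    index=by_groupby.index, columns=by_groupby.columns, fill_value=0)

same = np.array_equal(by_groupby.to_numpy(), by_crosstab.to_numpy())
print(same)
```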