也许groupby是错误的做法。似乎应该可以工作,但我没有看到它......
我想根据结果对事件进行分组。这是我的DataFrame(df):
Event SUCCESS FAILED
Run 2 1
Walk 0 1
这是我想要的结果:
grouped = df['Status'].groupby(df['Event'])
我正在尝试制作一个分组对象,但我无法弄清楚如何调用它来显示我想要的内容。
var Person = mongoose.model('Person', yourSchema);
// find each person with a last name matching 'Ghost', selecting the `name` and `occupation` fields
Person.findOne({ 'name.last': 'Ghost' }, 'name occupation', function (err, person) {
if (err) return handleError(err);
console.log('%s %s is a %s.', person.name.first, person.name.last, person.occupation) // Space Ghost is a talk show host.
})
答案 0 :(得分:7)
试试这个:
pd.crosstab(df.Event, df.Status)
Status FAILED SUCCESS
Event
Run 0 2
Walk 1 1
len("df.groupby('Event').Status.value_counts().unstack().fillna(0)")
61
len("df.pivot_table(index='Event', columns='Status', aggfunc=len, fill_value=0)")
74
len("pd.crosstab(df.Event, df.Status)")
32
答案 1 :(得分:7)
另一种解决方案,使用pivot_table()方法:
In [5]: df.pivot_table(index='Event', columns='Status', aggfunc=len, fill_value=0)
Out[5]:
Status FAILED SUCCESS
Event
Run 0 2
Walk 1 1
针对700K DF的时间:
In [74]: df.shape
Out[74]: (700000, 2)
In [75]: # (c) Merlin
In [76]: %%timeit
....: pd.crosstab(df.Event, df.Status)
....:
1 loop, best of 3: 333 ms per loop
In [77]: # (c) piRSquared
In [78]: %%timeit
....: df.groupby('Event').Status.value_counts().unstack().fillna(0)
....:
1 loop, best of 3: 325 ms per loop
In [79]: # (c) MaxU
In [80]: %%timeit
....: df.pivot_table(index='Event', columns='Status',
....: aggfunc=len, fill_value=0)
....:
1 loop, best of 3: 367 ms per loop
In [81]: # (c) ayhan
In [82]: %%timeit
....: (df.assign(ones = np.ones(len(df)))
....: .pivot_table(index='Event', columns='Status',
....: aggfunc=np.sum, values = 'ones')
....: )
....:
1 loop, best of 3: 264 ms per loop
In [83]: # (c) Divakar
In [84]: %%timeit
....: unq1,ID1 = np.unique(df['Event'],return_inverse=True)
....: unq2,ID2 = np.unique(df['Status'],return_inverse=True)
....: # Get linear indices/tags corresponding to grouped headers
....: tag = ID1*(ID2.max()+1) + ID2
....: # Setup 2D Numpy array equivalent of expected Dataframe
....: out = np.zeros((len(unq1),len(unq2)),dtype=int)
....: unqID, count = np.unique(tag,return_counts=True)
....: np.put(out,unqID,count)
....: # Finally convert to Dataframe
....: df_out = pd.DataFrame(out,columns=unq2)
....: df_out.index = unq1
....:
1 loop, best of 3: 2.25 s per loop
结论:@ ayhan的解决方案目前胜出:
(df.assign(ones = np.ones(len(df)))
.pivot_table(index='Event', columns='Status', values = 'ones',
aggfunc=np.sum, fill_value=0)
)
答案 2 :(得分:4)
答案 3 :(得分:4)
这是一种基于NumPy的方法 -
# Get unique header strings for input dataframes
unq1,ID1 = np.unique(df['Event'],return_inverse=True)
unq2,ID2 = np.unique(df['Status'],return_inverse=True)
# Get linear indices/tags corresponding to grouped headers
tag = ID1*(ID2.max()+1) + ID2
# Setup 2D Numpy array equivalent of expected Dataframe
out = np.zeros((len(unq1),len(unq2)),dtype=int)
unqID, count = np.unique(tag,return_counts=True)
np.put(out,unqID,count)
# Finally convert to Dataframe
df_out = pd.DataFrame(out,columns=unq2)
df_out.index = unq1
示例输入,输出更通用的情况 -
In [179]: df
Out[179]:
Event Status
0 Sit PASS
1 Run SUCCESS
2 Walk SUCCESS
3 Run PASS
4 Run SUCCESS
5 Walk FAILED
6 Walk PASS
In [180]: df_out
Out[180]:
FAILED PASS SUCCESS
Run 0 1 2
Sit 0 1 0
Walk 1 1 1