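When groupby and an aggregation are called on a pandas.Series, it is faster than calling them on a pandas.DataFrame with one value column (which is itself a pandas.Series instance). Here is an example: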
import random
import pandas as pd
import time

# 1000 rows: a group key in {1, 2, 3} and a random float to sum
column1 = [random.randint(1, 3) for i in range(1000)]
column2 = [random.random() for i in range(1000)]
df = pd.DataFrame(zip(column1, column2), columns=["group", "number"])

# Aggregate on the DataFrameGroupBy: the result is a DataFrame
t1 = time.time()
grouped_1 = df.groupby("group").sum()
t2 = time.time()
print(t2 - t1)

# Select the column first and aggregate on the SeriesGroupBy: the result is a Series
t1 = time.time()
grouped_2 = df.groupby("group")["number"].sum()
t2 = time.time()
print(t2 - t1)

print("First type %s" % type(grouped_1))
print("Second type %s" % type(grouped_2))
Output:
0.0062596797943115234
0.0024614334106445312
First type <class 'pandas.core.frame.DataFrame'>
Second type <class 'pandas.core.series.Series'>
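A single time.time() difference over one call can be noisy, so the same comparison could be repeated with timeit to get a steadier number. The following is only a minimal sketch under that assumption: the helper names frame_sum and series_sum and the loop counts are hypothetical choices for illustration, and the absolute timings will vary with the machine and the pandas version.

```python
import random
import timeit

import pandas as pd

random.seed(0)  # fixed seed so the data is the same on every run
n = 1000
df = pd.DataFrame({
    "group": [random.randint(1, 3) for _ in range(n)],
    "number": [random.random() for _ in range(n)],
})

def frame_sum():
    # Aggregate on the DataFrameGroupBy; returns a DataFrame
    return df.groupby("group").sum()

def series_sum():
    # Select the column first, then aggregate; returns a Series
    return df.groupby("group")["number"].sum()

loops = 100   # calls per timing run (illustrative value)
repeats = 5   # number of timing runs; the fastest one is reported

frame_best = min(timeit.repeat(frame_sum, number=loops, repeat=repeats)) / loops
series_best = min(timeit.repeat(series_sum, number=loops, repeat=repeats)) / loops

print("DataFrame path: %.6f s per call" % frame_best)
print("Series path:    %.6f s per call" % series_best)

# Both paths should produce the same sums; only the container type differs
same_values = bool((frame_sum()["number"] - series_sum()).abs().max() < 1e-12)
print("Same values: %s" % same_values)
```

Taking the minimum over several runs reduces the influence of one-off delays (caching, garbage collection) that a single measurement would include.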
What is the bottleneck when aggregating a DataFrame with one column, compared with aggregating a Series?