Question

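After splitting the values on the colon, how do I group the DataFrame by the first part? In this example, I need to group on the hour taken from the time in the last column.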

from io import StringIO  # Python 3; on Python 2 this was `from StringIO import StringIO`

myst="""india, 905034 , 19:44   
USA, 905094  , 19:33
Russia,  905154 ,   21:56

"""
u_cols=['country', 'index', 'current_tm']

myf = StringIO(myst)
import pandas as pd
df = pd.read_csv(StringIO(myst), sep=',', names = u_cols)

This query does not return the expected result:

df[df['index'] > 900000].groupby([df.current_tm]).size()

current_tm
   21:56     1
 19:33       1
 19:44       1
dtype: int64

It should be:

21 1
19 2

The time is in hh:mm format, but pandas treats it as a string.

Is there any utility that converts SQL queries into the equivalent pandas code? (Something like querymongo.com, which helps MongoDB users.)

Answer 1

Create a new column:

# strip() removes the stray whitespace around each time string before taking the hour
df['hour'] = [current_time.strip().split(':')[0] for current_time in df['current_tm']]

Then apply your method:

df[df['index'] > 900000].groupby([df['hour']]).size()

hour
19    2
21    1
dtype: int64
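
The same column can probably also be built without a Python-level loop, using pandas' vectorized string methods. This is only a minimal sketch: it assumes the three rows from the question, built directly as a DataFrame rather than parsed from the CSV text, and the variable name `counts` is made up for illustration.

```python
import pandas as pd

# Sample rows from the question, built directly (an assumption, not the
# original read_csv call). The stray whitespace is kept on purpose.
df = pd.DataFrame({
    "country": ["india", "USA", "Russia"],
    "index": [905034, 905094, 905154],
    "current_tm": [" 19:44   ", " 19:33", "   21:56"],
})

# Strip surrounding whitespace, split on ':' and keep the first piece (the hour).
df["hour"] = df["current_tm"].str.strip().str.split(":").str[0]

# Filter first, then group the filtered frame by its own 'hour' column.
counts = df[df["index"] > 900000].groupby("hour").size()
print(counts)
```

Note that the hour is still a string here ('19', '21'), so it sorts lexically rather than numerically.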

Answer 2

You can add the hour to the DataFrame as follows and then use it for grouping:

df['hour'] = df.current_tm.str.strip().apply(lambda x: x.split(':')[0] if isinstance(x, str) 
                                                       else None)

>>> df[df['index'] > 900000].groupby('hour').size()
hour
19      2
21      1
dtype: int64
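
Since the question notes that pandas treats the time as a string, another option is to parse it into a datetime and group on the numeric hour. The sketch below assumes every value matches `%H:%M` once stripped, and it builds the sample rows directly instead of reading the CSV; the names `parsed` and `hourly` are illustrative only.

```python
import pandas as pd

# Sample rows from the question, built directly (an assumption).
df = pd.DataFrame({
    "country": ["india", "USA", "Russia"],
    "index": [905034, 905094, 905154],
    "current_tm": [" 19:44   ", " 19:33", "   21:56"],
})

# Parse the stripped strings as times. With only a time format given,
# pandas fills in a placeholder date, which is ignored here.
parsed = pd.to_datetime(df["current_tm"].str.strip(), format="%H:%M")

# .dt.hour yields integers, so the groups sort numerically.
df["hour"] = parsed.dt.hour

hourly = df[df["index"] > 900000].groupby("hour").size()
print(hourly)
```

In SQL terms this corresponds roughly to a WHERE filter followed by GROUP BY on the hour with COUNT(*): the boolean mask plays the role of WHERE, and `groupby(...).size()` plays the role of GROUP BY with a count.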

Group by on a function
