Pandas: groupby with some data

Time: 2017-03-29 13:41:36

Tags: python pandas

I have a dataframe:

datetime    city    state   country shape   duration (seconds)  duration (hours/min)    comments    date posted latitude    longitude
10/10/1949 20:30    san marcos  tx  us  cylinder    2700    45 minutes  This event took place in early fall around 1949-50. It occurred after a Boy Scout meeting in the Baptist Church. The Baptist Church sit 4/27/2004   29.8830556  -97.9411111
10/10/1949 21:00    lackland afb    tx      light   7200    1-2 hrs 1949 Lackland AFB&#44 TX. Lights racing across the sky & making 90 degree turns on a dime.  12/16/2005  29.38421    -98.581082
10/10/1955 17:00    chester (uk/england)        gb  circle  20  20 seconds  Green/Orange circular disc over Chester&#44 England 1/21/2008   53.2    -2.916667
10/10/1956 21:00    edna    tx  us  circle  20  1/2 hour    My older brother and twin sister were leaving the only Edna theater at about 9 PM&#44...we had our bikes and I took a different route home  1/17/2004   28.9783333  -96.6458333
10/10/1960 20:00    kaneohe hi  us  light   900 15 minutes  AS a Marine 1st Lt. flying an FJ4B fighter/attack aircraft on a solo night exercise&#44 I was at 50&#44000&#39 in a "clean" aircraft (no ordinan  1/22/2004   21.4180556  -157.8036111

I am trying to group it by state. I use:

result = df.groupby("state").\
    agg({"state": pd.Series.nunique, "duration (seconds)": np.sum}).\
    rename(columns={"state": "frequency", "duration (seconds)": "whole time"}).\
    reset_index()

But it raises TypeError: must be str, not float. I tried converting duration (seconds), but it returns duration (seconds). How can I track down this problem?

1 answer:

Answer 0: (score: 0)

Do something like this:

# Group df by state, then apply a sum lambda to the "duration (seconds)" column of each group
df.groupby('state')['duration (seconds)'].apply(lambda x: x.sum())

Or, if you want a rolling sum:

df.groupby('state')['duration (seconds)'].apply(lambda x:x.rolling(center=False,window=2).sum())
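
Note that neither snippet will get past the TypeError if the column itself is the problem. One likely cause (an assumption, since the full data is not shown) is that duration (seconds) was read as object dtype because a few cells hold text that is not a clean number, so summing mixes str and float. A minimal sketch of how one might check for that and then aggregate, using a small made-up frame in which the "2`" value is a hypothetical malformed entry and "frequency" is assumed to mean the number of rows per state:

```python
import pandas as pd

# Hypothetical sample data; the "2`" entry stands in for a malformed value.
df = pd.DataFrame({
    "state": ["tx", "tx", "hi", "tx", None],
    "duration (seconds)": ["2700", "7200", "900", "2`", "20"],
})

# Coerce to numbers; anything that cannot be parsed becomes NaN.
seconds = pd.to_numeric(df["duration (seconds)"], errors="coerce")

# Rows whose original value was present but failed to parse.
bad_rows = df[seconds.isna() & df["duration (seconds)"].notna()]
print(bad_rows)

# Replace the column with its numeric version before aggregating.
df["duration (seconds)"] = seconds

# Count rows and sum durations per state (rows with a missing state are dropped).
result = (
    df.groupby("state")["duration (seconds)"]
      .agg(frequency="size", whole_time="sum")
      .rename(columns={"whole_time": "whole time"})
      .reset_index()
)
print(result)
```

Inspecting bad_rows first shows which cells need cleaning; with errors="coerce" they are treated as missing and skipped by the sum rather than raising.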