I have a DataFrame in the following format:
user accuracy latitude longitude timestamp
0 5573502c150000c10136e51b 29.942 -8.658122 -45.700106 1434127670836
1 5573502c150000c10136e51b 30.000 -8.658068 -45.700127 1434127730889
2 5573502c150000c10136e51b 30.000 -8.658068 -45.700127 1434127790911
3 5573502c150000c10136e51b 30.000 -8.658057 -45.700123 1434127858915
4 5573502c150000c10136e51b 39.000 -8.658072 -45.700108 1434127918948
5 5573502c150000c10136e51b 31.876 -8.658100 -45.700107 1434128021062
6 5573502c150000c10136e51b 30.048 -8.658116 -45.700140 1434128151467
7 5573502c150000c10136e51b 30.473 -8.658118 -45.700097 1434128277097
8 5573502c150000c10136e51b 55.500 -6.658087 -45.700138 1434140105618
9 5573502c150000c10136e51b 55.500 -6.658087 -45.700138 1434140165685
10 5573502c150000c10136e51b 30.000 -6.658057 -45.700130 1434140225898
11 5573502c150000c10136e51b 30.000 -6.658057 -45.700130 1434140285952
12 5573502c150000c10136e51b 30.000 -7.658084 -45.700113 1434140346166
13 5573502c150000c10136e51b 36.000 -7.658051 -45.700138 1434140406214
14 5573502c150000c10136e51b 36.000 -5.658051 -45.700138 1434140466240
15 5573502c150000c10136e51b 32.908 -5.658091 -45.700097 1434140526278
16 5573502c150000c10136e51b 32.908 -5.658091 -45.700097 1434140586325
17 5573502c150000c10136e51b 34.009 -5.658075 -45.700119 1434140646363
18 5573502c150000c10136e51b 30.000 -5.658058 -45.700118 1434140706409
19 5573502c150000c10136e51b 30.000 -5.658058 -45.700118 1434140766455
I want to group the DataFrame by day and then collect each day's records into a separate list entry.
So far, I have:
DFList = [group[1] for group in df.groupby(df.index.day)]
print(DFList)
But I get an error:
AttributeError: 'RangeIndex' object has no attribute 'day'
Does anyone know how to fix this?
Answer 0 (score: 3)
I think you first need to_datetime with unit='ms', and then Series.dt.day to extract the day:
df['day'] = pd.to_datetime(df['timestamp'], unit='ms').dt.day
dfs = [x for i, x in df.groupby('day')]
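For reference, here is a minimal self-contained sketch of this approach. The three-row frame is a hypothetical sample, not the asker's full data: the first two timestamps come from the question, and the third is shifted by one day (an assumption made only so that more than one group appears).

```python
import pandas as pd

# Hypothetical sample: two timestamps from the question plus one shifted by a day
df = pd.DataFrame({
    "user": ["5573502c150000c10136e51b"] * 3,
    "latitude": [-8.658122, -8.658068, -6.658087],
    "timestamp": [
        1434127670836,
        1434127730889,
        1434127670836 + 86_400_000,  # assumed: exactly 24 hours later
    ],
})

# Interpret the integers as milliseconds since the Unix epoch,
# then extract the day of the month from each datetime
df["day"] = pd.to_datetime(df["timestamp"], unit="ms").dt.day

# One DataFrame per distinct day value
dfs = [group for _, group in df.groupby("day")]

for part in dfs:
    print(part, end="\n\n")
```

The original error occurs because the default RangeIndex holds plain integers and has no datetime accessors; the conversion with to_datetime is what makes .dt.day available.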
Or, if you need a DatetimeIndex:
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')
df = df.set_index('timestamp')
dfs = [x for i, x in df.groupby(df.index.day)]
print(dfs)
If you need to keep the timestamp column in its original format:
day = pd.to_datetime(df['timestamp'], unit='ms').dt.day
dfs = [x for i, x in df.groupby(day)]
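One caveat worth keeping in mind: .dt.day is the day of the month (1 to 31), so records from, say, 12 June and 12 July would land in the same group. If the data may span more than one month, grouping by the full calendar date is probably closer to what "per day" means. The sketch below is one way to do that, using Series.dt.normalize() to truncate each datetime to midnight; the sample timestamps are hypothetical and chosen only to show the difference.

```python
import pandas as pd

# Hypothetical timestamps (ms since epoch) spanning two months
df = pd.DataFrame({
    "timestamp": [
        1434127670836,                    # 2015-06-12 (UTC)
        1434127670836 + 86_400_000,       # one day later: 2015-06-13
        1434127670836 + 30 * 86_400_000,  # 30 days later: 2015-07-12
    ],
})

ts = pd.to_datetime(df["timestamp"], unit="ms")

# Day of month: 12 June and 12 July share the key 12
by_day_of_month = [g for _, g in df.groupby(ts.dt.day)]

# Full calendar date: each date gets its own group
by_date = [g for _, g in df.groupby(ts.dt.normalize())]

print(len(by_day_of_month), len(by_date))
```

As in the last snippet of the answer, the grouping key is a separate Series, so the timestamp column itself stays in its original integer format.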