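Can I split a dataframe based on the values at the start of the data?
I have a dataframe with a column of times. I want to group the rows by time, so that one dataframe holds the times between 12 and 3, another the times between 3 and 6, and so on. Is there a way to do this?
I tried using .groupby(), but got a KeyError when I passed in the values.
Here is my input: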
ACC_DATE ACC_TIME DAY_OF_WEEK COUNTY_NAME INJURY COLLISION_WITH_1
978 2012-01-21 0:01 SATURDAY Harford NO FIXED OBJ
952 2012-01-21 0:01 SATURDAY Anne Arundel NO VEH
995 2012-01-21 0:01 SATURDAY Prince Georges NO VEH
1059 2012-01-22 0:01 SUNDAY Carroll YES FIXED OBJ
941 2012-01-21 0:01 SATURDAY Prince Georges NO FIXED OBJ
... ... ... ... ... ... ...
17535 2012-12-10 9:12 MONDAY Frederick NO FIXED OBJ
17536 2012-12-10 9:12 MONDAY Frederick NO FIXED OBJ
17251 2012-12-07 9:12 FRIDAY Anne Arundel NO VEH
17507 2012-12-10 9:12 MONDAY Dorchester NO FIXED OBJ
18636 2012-12-31 9:12 MONDAY Frederick YES NON-COLLISION
This is the refined data I am working with:
ACC_TIME COUNTY_NAME
ACC_TIME
0:08 0:08 Allegany
0:09 0:09 Allegany
0:09 0:09 Allegany
0:10 0:10 Allegany
0:10 0:10 Allegany
... ... ...
9:09 9:09 Allegany
9:10 9:10 Allegany
9:10 9:10 Allegany
9:11 9:11 Allegany
9:12 9:12 Allegany
Here is my code:
#--> First, how can I organize my data for only county & times?
sp = df.drop(['ACC_DATE','DAY_OF_WEEK','INJURY','COLLISION_WITH_1'],axis=1)
#Next, how can I organize the data by county and time of accidents?
# With inplace=True, sort_values returns None, so sort sp itself rather than assigning the result.
sp.sort_values(['COUNTY_NAME', 'ACC_TIME'], inplace=True)
# sp
#Now, I want to split sp by county.
sp.set_index(keys=['COUNTY_NAME','ACC_TIME'], drop=False,inplace=True)
names = sp['COUNTY_NAME'].unique().tolist()
times = sp['ACC_TIME'].unique().tolist()
allegany = sp.loc['Allegany']
allegany
# allegany.groupby(['9','10','11','12'])
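My expected output is a list of smaller dataframes. I will then use the entries of that list as the x values in a potential scatter plot or bar chart. The chart measures the number of accidents per time period (12-3, 3-6, etc.).
Answer 0 (score: 0)
I believe this is what you are looking for. In this example, I build a list of dataframes from a single dataframe, split on the values in column "a".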
import pandas as pd
df = pd.DataFrame({"a": [1, 1, 1, 2, 2, 2], "b": range(6), "c": range(6, 12)})
==>
a b c
0 1 0 6
1 1 1 7
2 1 2 8
3 2 3 9
4 2 4 10
5 2 5 11
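Now build the list of dataframes:
df_list = []
def to_list(df):
    # Called once per group: store a copy of that group's rows.
    df_list.append(df.copy())
    # The return value is a placeholder; only the side effect on df_list matters.
    return pd.Series(range(3))
df.groupby("a", as_index = False).apply(to_list)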
Output:
print(df_list[0])
# a b c
# 0 1 0 6
# 1 1 1 7
# 2 1 2 8
print(df_list[1])
# a b c
# 3 2 3 9
# 4 2 4 10
# 5 2 5 11
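As a possible alternative to the apply-based helper above, the same list can probably be built by iterating over the GroupBy object directly, since it yields one (key, sub-frame) pair per group. This is only a sketch on the same toy data; the name `df_by_key` is introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 1, 2, 2, 2], "b": range(6), "c": range(6, 12)})

# Iterating over a GroupBy yields (key, sub-frame) pairs, one per group.
df_list = [group.copy() for _, group in df.groupby("a")]

# Keeping the keys next to the pieces can help when labelling a plot later.
# df_by_key is a hypothetical name used only in this sketch.
df_by_key = {key: group.copy() for key, group in df.groupby("a")}
```

This avoids the placeholder return value and the module-level list that the helper function mutates.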
If the column you want to group by is of type datetime, you can do the same thing:
dates = pd.date_range("2020-01-01 00:00", periods=15, freq = "19min")
df = pd.DataFrame({"a": dates, "b": range(len(dates)), "c": range(10, 10+len(dates))})
print(df.head())
==>
a b c
0 2020-01-01 00:00:00 0 10
1 2020-01-01 00:19:00 1 11
2 2020-01-01 00:38:00 2 12
3 2020-01-01 00:57:00 3 13
4 2020-01-01 01:16:00 4 14
df_list = []
df.groupby(df.a.dt.hour, as_index = False).apply(to_list)
print(df_list[1])
==>
a b c
4 2020-01-01 01:16:00 4 14
5 2020-01-01 01:35:00 5 15
6 2020-01-01 01:54:00 6 16
print(df_list[2])
==>
a b c
7 2020-01-01 02:13:00 7 17
8 2020-01-01 02:32:00 8 18
9 2020-01-01 02:51:00 9 19
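For the 3-hour windows asked about in the question, one option might be to bin the hour with pd.cut before grouping. The sketch below assumes ACC_TIME is stored as text in "H:MM" form on a 24-hour clock, as the sample rows suggest. The data in `sp` is made up for illustration, and the column name `TIME_BIN` and the labels "0-3", "3-6", and so on are assumptions, with "0-3" standing for the 12-3 window.

```python
import pandas as pd

# Hypothetical sample in the same "H:MM" text format as the ACC_TIME column.
sp = pd.DataFrame({
    "ACC_TIME": ["0:01", "2:59", "3:00", "9:12", "13:45", "23:30"],
    "COUNTY_NAME": ["Harford", "Carroll", "Carroll", "Frederick", "Allegany", "Allegany"],
})

# Take the text before the colon and convert it to an integer hour.
hour = sp["ACC_TIME"].str.split(":").str[0].astype(int)

# Left-closed 3-hour bins: [0, 3), [3, 6), ..., [21, 24).
edges = list(range(0, 25, 3))
labels = [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]
sp["TIME_BIN"] = pd.cut(hour, bins=edges, right=False, labels=labels)

# One smaller DataFrame per time window; observed=True skips empty windows.
df_list = [group for _, group in sp.groupby("TIME_BIN", observed=True)]

# Number of accidents per window, empty windows included, in bin order.
counts = sp.groupby("TIME_BIN", observed=False).size()
```

The index of `counts` holds the window labels and its values hold the accident counts, so it could be passed to a bar chart directly. To split a single county, filtering on COUNTY_NAME before grouping should work the same way.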