Suppose I have a large DataFrame like this:
A B C
27/6/2017 4:00:00 928.04 4.83
27/6/2017 4:20:00 927.71 4.61
27/6/2017 4:40:00 928.22 4.49
27/6/2017 5:00:00 898.74 3.81
27/6/2017 5:20:00 895.16 3.55
27/6/2017 5:40:00 895.05 3.4
27/6/2017 6:00:00 895.68 3.3
27/6/2017 16:20:00 662.45 1.52
27/6/2017 16:40:00 639.98 1.48
27/6/2017 17:40:00 732.02 1.79
27/6/2017 18:00:00 722.63 1.98
27/6/2017 18:20:00 713.26 1.79
27/6/2017 18:40:00 705.8 1.54
27/6/2017 19:00:00 652.1 1.51
27/6/2017 19:20:00 638.58 1.68
27/6/2017 19:40:00 633.14 1.66
27/6/2017 20:00:00 654.66 1.45
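I want to split the DataFrame based on the time difference in hours: if the gap between two consecutive timestamps exceeds 4 hours, the DataFrame should be split at that point. Then I want to split each of the resulting DataFrames into subgroups based on ranges of B values. Finally, I want to store all of these groups and subgroups in separate CSV files.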
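A minimal sketch of just the time-based split, assuming column A holds day-first timestamp strings in the format shown above. The five-row frame is a subset of the sample, kept short so the break at 16:20 is easy to see.

```python
import pandas as pd

# A few rows from the sample above; column A holds day-first timestamps
df = pd.DataFrame({
    "A": ["27/6/2017 4:00:00", "27/6/2017 5:00:00", "27/6/2017 6:00:00",
          "27/6/2017 16:20:00", "27/6/2017 16:40:00"],
    "B": [928.04, 898.74, 895.68, 662.45, 639.98],
    "C": [4.83, 3.81, 3.30, 1.52, 1.48],
})
df["A"] = pd.to_datetime(df["A"], format="%d/%m/%Y %H:%M:%S")

# True wherever the gap to the previous row exceeds 4 hours (the first row's
# diff is NaT, which compares as False); the running sum turns those break
# points into consecutive group ids
group_id = (df["A"].diff() > pd.Timedelta(hours=4)).cumsum()
print(group_id.tolist())  # [0, 0, 0, 1, 1]
```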
Desired output:
Group1:
A B C
27/6/2017 4:00:00 928.04 4.83
27/6/2017 4:20:00 927.71 4.61
27/6/2017 4:40:00 928.22 4.49
27/6/2017 5:00:00 898.74 3.81
27/6/2017 5:20:00 895.16 3.55
27/6/2017 5:40:00 895.05 3.4
27/6/2017 6:00:00 895.68 3.3
Group2:
A B C
27/6/2017 16:20:00 662.45 1.52
27/6/2017 16:40:00 639.98 1.48
27/6/2017 17:40:00 732.02 1.79
27/6/2017 18:00:00 722.63 1.98
27/6/2017 18:20:00 713.26 1.79
27/6/2017 18:40:00 705.8 1.54
27/6/2017 19:00:00 652.1 1.51
27/6/2017 19:20:00 638.58 1.68
27/6/2017 19:40:00 633.14 1.66
27/6/2017 20:00:00 654.66 1.45
Zones:
Group1 Zone1:
A B C
27/6/2017 4:00:00 928.04 4.83
27/6/2017 4:20:00 927.71 4.61
27/6/2017 4:40:00 928.22 4.49
Group1 Zone2:
A B C
27/6/2017 5:00:00 898.74 3.81
27/6/2017 5:20:00 895.16 3.55
27/6/2017 5:40:00 895.05 3.4
27/6/2017 6:00:00 895.68 3.3
Something like this.
I tried some logic to achieve this, but I couldn't get it to work.
Code:
import pandas as pd
from scipy.stats import linregress

# Parse the timestamps so that diff() returns Timedeltas
df["Time"] = pd.to_datetime(df["Time"], dayfirst=True)
time_diff = df["Time"].diff()
zones = []
# Start a new zone whenever the gap is at least 12 sampling intervals
# (4 hours if rows are 20 minutes apart)
zone = (df["Time"] >= (df["Time"].shift() + time_diff.iloc[1] * 12)).cumsum()
zone_grp = df.groupby(zone)
xyz = []
for k, g in zone_grp:
    # Keep only zones with at least 30 rows
    if len(g) >= 30:
        zones.append(g)
for m, zone_df in enumerate(zones):
    x = range(len(zone_df))
    y = zone_df["T401FN1VT4000"]
    # Fit a straight line to each zone and keep slope and intercept
    res = linregress(x, y)
    xyz.append(pd.DataFrame({"Slope": [res.slope], "Intercept": [res.intercept]}))
    zone_df.to_csv("Zone_%s.csv" % m, index=False)
xyz = pd.concat(xyz, ignore_index=True)
xyz.index.name = "Zone"
xyz.to_csv("Coefficients.csv", index=True)
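Please help me find a better way to split the DataFrame by time difference, and help me store the groups and subgroups in CSV files with different names.
Any help would be appreciated.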
Answer:
You can use diff and pd.Timedelta to build the first-level groups, and df.B // x * x to split B into range groups of width x (x = 100 below).
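To see what the binning expression does on its own, here is a small sketch on a few B values from the sample. The bin width of 100 matches the answer's code and is otherwise an arbitrary choice.

```python
import pandas as pd

# B values from the first group of the sample
b = pd.Series([928.04, 927.71, 928.22, 898.74, 895.16])
width = 100  # bin width; pick whatever range size you need
# Floor division followed by multiplication snaps each value to the lower
# edge of its bin: 928.04 -> 900.0, 898.74 -> 800.0
bins = b // width * width
print(bins.tolist())  # [900.0, 900.0, 900.0, 800.0, 800.0]
```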
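import pandas as pd
df['A'] = pd.to_datetime(df['A'], dayfirst=True)  # diff() needs datetimes
grps = [(df.A.diff() > pd.Timedelta(hours=4)).cumsum(), df.B // 100 * 100]
for i, g in df.groupby(grps):
    # i is a (group_id, B_bin) tuple, e.g. (0, 900.0) -> file "0_900.0.csv"
    g.to_csv('{}_{}.csv'.format(*i))
    print(g)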
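If you want the files named the way the question asks (Group1_Zone1.csv and so on) rather than by the raw group keys, one possible variation is to number the groups, and the zones within each group, explicitly. This is a sketch under assumptions: the zones output directory, the 100-wide B ranges, and numbering zones by first appearance in time are choices made here, not part of the answer above.

```python
from pathlib import Path

import pandas as pd

# Sample data from the question; column A holds day-first timestamps
rows = [
    ("27/6/2017 4:00:00", 928.04, 4.83),
    ("27/6/2017 4:20:00", 927.71, 4.61),
    ("27/6/2017 4:40:00", 928.22, 4.49),
    ("27/6/2017 5:00:00", 898.74, 3.81),
    ("27/6/2017 5:20:00", 895.16, 3.55),
    ("27/6/2017 5:40:00", 895.05, 3.4),
    ("27/6/2017 6:00:00", 895.68, 3.3),
    ("27/6/2017 16:20:00", 662.45, 1.52),
    ("27/6/2017 16:40:00", 639.98, 1.48),
    ("27/6/2017 17:40:00", 732.02, 1.79),
    ("27/6/2017 18:00:00", 722.63, 1.98),
    ("27/6/2017 18:20:00", 713.26, 1.79),
    ("27/6/2017 18:40:00", 705.8, 1.54),
    ("27/6/2017 19:00:00", 652.1, 1.51),
    ("27/6/2017 19:20:00", 638.58, 1.68),
    ("27/6/2017 19:40:00", 633.14, 1.66),
    ("27/6/2017 20:00:00", 654.66, 1.45),
]
df = pd.DataFrame(rows, columns=["A", "B", "C"])
df["A"] = pd.to_datetime(df["A"], format="%d/%m/%Y %H:%M:%S")

# Hypothetical output directory; change as needed
out_dir = Path("zones")
out_dir.mkdir(exist_ok=True)

# Level 1: a new group starts after any gap of more than 4 hours
group_id = (df["A"].diff() > pd.Timedelta(hours=4)).cumsum()

written = []
for g_num, (_, grp) in enumerate(df.groupby(group_id), start=1):
    grp.to_csv(out_dir / f"Group{g_num}.csv", index=False)
    # Level 2: 100-wide ranges of B; sort=False numbers the zones
    # in order of first appearance within the group
    b_range = grp["B"] // 100 * 100
    for z_num, (_, zone) in enumerate(grp.groupby(b_range, sort=False), start=1):
        name = f"Group{g_num}_Zone{z_num}.csv"
        zone.to_csv(out_dir / name, index=False)
        written.append((name, len(zone)))

print(written)
```

With sort=False, Group1's first zone is the 900-range block (4:00 to 4:40), which matches the question's Group1 Zone1; with the default sort=True the 800-range block would be numbered first instead.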
A B C
3 2017-06-27 05:00:00 898.74 3.81
4 2017-06-27 05:20:00 895.16 3.55
5 2017-06-27 05:40:00 895.05 3.40
6 2017-06-27 06:00:00 895.68 3.30
A B C
0 2017-06-27 04:00:00 928.04 4.83
1 2017-06-27 04:20:00 927.71 4.61
2 2017-06-27 04:40:00 928.22 4.49
A B C
7 2017-06-27 16:20:00 662.45 1.52
8 2017-06-27 16:40:00 639.98 1.48
13 2017-06-27 19:00:00 652.10 1.51
14 2017-06-27 19:20:00 638.58 1.68
15 2017-06-27 19:40:00 633.14 1.66
16 2017-06-27 20:00:00 654.66 1.45
A B C
9 2017-06-27 17:40:00 732.02 1.79
10 2017-06-27 18:00:00 722.63 1.98
11 2017-06-27 18:20:00 713.26 1.79
12 2017-06-27 18:40:00 705.80 1.54