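I have a dataset containing five signals measured over time. Within a given data file, each timestamp corresponds to a unique measurement position. The positions repeat from file to file, but the time intervals between them are irregular. I want to compute a linear regression of each signal against time at each position.
So far, I have imported each data file as a pandas DataFrame and then assembled them into a 3-D DataFrame (one outer index level per scan), as follows: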
import pandas as pd

# Every scan file is read with the same options
peak1 = pd.read_csv('peak/scan1.txt', skiprows=[0, 2], index_col=False)
peak2 = pd.read_csv('peak/scan2.txt', skiprows=[0, 2], index_col=False)
peak3 = pd.read_csv('peak/scan3.txt', skiprows=[0, 2], index_col=False)
peak4 = pd.read_csv('peak/scan4.txt', skiprows=[0, 2], index_col=False)
peak5 = pd.read_csv('peak/scan5.txt', skiprows=[0, 2], index_col=False)
peak6 = pd.read_csv('peak/scan6.txt', skiprows=[0, 2], index_col=False)
# Stack the scans; the keys become the outer level of the row index
peaks = pd.concat([peak1, peak2, peak3, peak4, peak5, peak6], keys=('Scan1', 'Scan2', 'Scan3', 'Scan4', 'Scan5', 'Scan6'))
peaks['Start'] = pd.to_datetime(peaks['Start'], format='%H:%M:%S.%f')
peaks['End'] = pd.to_datetime(peaks['End'], format='%H:%M:%S.%f')
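The same import can probably be written as a loop, so that adding a seventh scan does not need another copy of the read_csv line. This is only a sketch: load_scans, demo_text and the index level names 'Scan' and 'Row' are names made up for illustration, and the in-memory text stands in for the real files, with a guessed layout for the two skipped rows.

```python
import io
import pandas as pd

def load_scans(sources, names):
    """Read each scan with the same options and stack them under an outer 'Scan' level."""
    frames = [pd.read_csv(src, skiprows=[0, 2], index_col=False) for src in sources]
    peaks = pd.concat(frames, keys=names, names=['Scan', 'Row'])
    for col in ('Start', 'End'):
        peaks[col] = pd.to_datetime(peaks[col], format='%H:%M:%S.%f')
    return peaks

# Hypothetical stand-in for files on disk: row 0 and row 2 are skipped,
# row 1 is the header, and data rows end with a trailing comma.
demo_text = (
    "title line (skipped)\n"
    "IDTag,Index,Position,LastInteg,Start,End,Sig1\n"
    "units line (skipped)\n"
    "1,1,37.8450,False,02:13:59.893,02:14:00.106,5,\n"
    "2,1,37.8448,False,02:14:00.174,02:14:00.387,0,\n"
)
names = ['Scan1', 'Scan2']
peaks = load_scans([io.StringIO(demo_text) for _ in names], names)
```

With the real files, the call would presumably be load_scans([f'peak/scan{i}.txt' for i in range(1, 7)], [f'Scan{i}' for i in range(1, 7)]).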
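Is there a simple way to create an array containing the regression slope against start time for each measurement position? I could do this by pulling each position out of each file, building a set of 2-D signal-versus-time arrays for each signal, computing the regressions, then reassembling the results into a new DataFrame and joining it onto the original one, but it seems like there should be a more efficient way.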
Edit: example data
Scan1
IDTag,Index,Position,LastInteg,Start,End,Sig1,Sig2,Sig3,Sig4,Sig5,Sig2B,Sig3B,Sig4B,Sig5B
1,1,37.8450,False,02:13:59.893,02:14:00.106, 5, -0.0000183, 0.0000225, -0.0000168, 0.0000605, -0.0000183, 0.0000225, -0.0000168, 0.0000605,
2,1,37.8448,False,02:14:00.174,02:14:00.387, 0, -0.0000124, 0.0000081, 0.0000095, -0.0000370, -0.0000124, 0.0000081, 0.0000095, -0.0000370,
3,1,37.8446,False,02:14:00.439,02:14:00.652, 0, -0.0000079, 0.0000163, 0.0000214, -0.0000670, -0.0000079, 0.0000163, 0.0000214, -0.0000670,
4,1,37.8444,False,02:14:00.704,02:14:00.918, 0, -0.0000313, -0.0000238, 0.0000211, 0.0000086, -0.0000313, -0.0000238, 0.0000211, 0.0000086,
5,1,37.8442,False,02:14:00.969,02:14:01.182, 0, 0.0000376, -0.0000149, -0.0000246, -0.0000273, 0.0000376, -0.0000149, -0.0000246, -0.0000273,
6,1,37.8440,False,02:14:01.234,02:14:01.448, 0, -0.0000171, 0.0000318, -0.0000517, -0.0000144, -0.0000171, 0.0000318, -0.0000517, -0.0000144,
7,1,37.8438,False,02:14:01.500,02:14:01.713, 0, 0.0000494, -0.0000132, 0.0000169, 0.0000398, 0.0000494, -0.0000132, 0.0000169, 0.0000398,
8,1,37.8436,False,02:14:01.765,02:14:01.978, 0, -0.0000162, 0.0000721, 0.0000450, -0.0000324, -0.0000162, 0.0000721, 0.0000450, -0.0000324,
9,1,37.8434,False,02:14:02.030,02:14:02.242, 0, 0.0000210, 0.0000141, -0.0000450, -0.0000436, 0.0000210, 0.0000141, -0.0000450, -0.0000436,
10,1,37.8432,False,02:14:02.295,02:14:02.508, 0, -0.0000420, -0.0000070, -0.0000197, -0.0000195, -0.0000420, -0.0000070, -0.0000197, -0.0000195,
Scan2
IDTag,Index,Position,LastInteg,Start,End,Sig1,Sig2,Sig3,Sig4,Sig5,Sig2B,Sig3B,Sig4B,Sig5B
1,1,37.6950,False,02:19:25.980,02:19:26.192, 0, -0.0000127, 0.0000533, -0.0000101, -0.0000177, -0.0000127, 0.0000533, -0.0000101, -0.0000177,
2,1,37.6952,False,02:19:26.245,02:19:26.460, 0, -0.0000500, -0.0000029, 0.0000109, -0.0000493, -0.0000500, -0.0000029, 0.0000109, -0.0000493,
3,1,37.6954,False,02:19:26.511,02:19:26.723, 0, -0.0000545, -0.0000235, -0.0000488, 0.0000353, -0.0000545, -0.0000235, -0.0000488, 0.0000353,
4,1,37.6956,False,02:19:26.776,02:19:26.989, 0, 0.0000221, -0.0000147, 0.0000139, 0.0000607, 0.0000221, -0.0000147, 0.0000139, 0.0000607,
5,1,37.6958,False,02:19:27.041,02:19:27.254, 5, 0.0000016, -0.0000153, -0.0000305, 0.0000076, 0.0000016, -0.0000153, -0.0000305, 0.0000076,
6,1,37.6960,False,02:19:27.306,02:19:27.518, 0, 0.0000076, 0.0000069, 0.0000244, 0.0000302, 0.0000076, 0.0000069, 0.0000244, 0.0000302,
7,1,37.6962,False,02:19:27.571,02:19:27.784, 5, 0.0000141, 0.0000519, 0.0000095, -0.0000292, 0.0000141, 0.0000519, 0.0000095, -0.0000292,
8,1,37.6964,False,02:19:27.837,02:19:28.051, 0, -0.0000167, -0.0000878, -0.0000292, 0.0000934, -0.0000167, -0.0000878, -0.0000292, 0.0000934,
9,1,37.6966,False,02:19:28.102,02:19:28.316, 0, 0.0000353, 0.0000206, 0.0000289, -0.0000510, 0.0000353, 0.0000206, 0.0000289, -0.0000510,
10,1,37.6968,False,02:19:28.367,02:19:28.581, 5, 0.0000103, 0.0000374, -0.0000351, -0.0000124, 0.0000103, 0.0000374, -0.0000351, -0.0000124,
Scan3
IDTag,Index,Position,LastInteg,Start,End,Sig1,Sig2,Sig3,Sig4,Sig5,Sig2B,Sig3B,Sig4B,Sig5B
1,1,37.8450,False,02:23:06.767,02:23:06.979, 5, -0.0000075, 0.0000574, -0.0000014, 0.0000523, -0.0000075, 0.0000574, -0.0000014, 0.0000523,
2,1,37.8448,False,02:23:07.048,02:23:07.261, 0, -0.0000019, 0.0000010, -0.0000090, -0.0000107, -0.0000019, 0.0000010, -0.0000090, -0.0000107,
3,1,37.8446,False,02:23:07.313,02:23:07.526, 5, 0.0000316, 0.0000154, 0.0000086, -0.0000582, 0.0000316, 0.0000154, 0.0000086, -0.0000582,
4,1,37.8444,False,02:23:07.579,02:23:07.791, 5, -0.0000320, 0.0000014, -0.0000194, 0.0000081, -0.0000320, 0.0000014, -0.0000194, 0.0000081,
5,1,37.8442,False,02:23:07.844,02:23:08.057, 0, 0.0000227, -0.0000326, 0.0000124, -0.0000078, 0.0000227, -0.0000326, 0.0000124, -0.0000078,
6,1,37.8440,False,02:23:08.109,02:23:08.321, 0, -0.0000037, -0.0000201, -0.0000247, -0.0000361, -0.0000037, -0.0000201, -0.0000247, -0.0000361,
7,1,37.8438,False,02:23:08.374,02:23:08.587, 10, 0.0000048, -0.0000790, -0.0000260, 0.0000352, 0.0000048, -0.0000790, -0.0000260, 0.0000352,
8,1,37.8436,False,02:23:08.639,02:23:08.853, 0, 0.0000499, 0.0000047, -0.0000064, -0.0000554, 0.0000499, 0.0000047, -0.0000064, -0.0000554,
9,1,37.8434,False,02:23:08.905,02:23:09.117, 0, -0.0000475, -0.0000130, -0.0000116, 0.0000996, -0.0000475, -0.0000130, -0.0000116, 0.0000996,
10,1,37.8432,False,02:23:09.170,02:23:09.384, 0, 0.0000206, -0.0000171, 0.0000280, 0.0000349, 0.0000206, -0.0000171, 0.0000280, 0.0000349,
Edit 2: The following does what I want, but it seems like this could be done more easily by taking advantage of the DataFrame structure.
from sklearn.linear_model import LinearRegression

unique_positions = peaks['Position'].unique()
signal_list = ['Sig1', 'Sig2', 'Sig3', 'Sig4', 'Sig5', 'Sig2B', 'Sig3B', 'Sig4B', 'Sig5B']
regs = pd.DataFrame(columns=['Position'] + signal_list)
regs.set_index('Position', inplace=True)
for pos in unique_positions:
    # All rows, across scans, that were measured at this position
    time_series_at_pos = peaks[peaks['Position'] == pos]
    for sig in signal_list:
        linear_regressor = LinearRegression()
        linear_regressor.fit(time_series_at_pos['Start'].values.reshape(-1, 1), time_series_at_pos[sig].values.reshape(-1, 1))
        # .ix has been removed from pandas; .loc performs the label-based assignment
        regs.loc[pos, sig] = linear_regressor.coef_[0][0]
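One way the DataFrame structure might be used here is a groupby on Position with a single least-squares fit per group. The sketch below rests on a few assumptions: slopes_by_position and the elapsed_s column are names invented for illustration, np.polyfit is swapped in for LinearRegression, time is converted to seconds since the earliest Start (so slopes come out in signal units per second), and positions with fewer than two measurements get NaN instead of a fit.

```python
import numpy as np
import pandas as pd

def slopes_by_position(peaks, signal_cols, time_col='Start'):
    """Least-squares slope of each signal against elapsed seconds, per Position."""
    # Assumption: measure time in seconds from the earliest timestamp
    t0 = peaks[time_col].min()
    work = peaks.assign(elapsed_s=(peaks[time_col] - t0).dt.total_seconds())

    def fit(group):
        if len(group) < 2:
            # A single point does not determine a slope
            return pd.Series(np.nan, index=signal_cols)
        # With a 2-D y, polyfit fits every column at once;
        # row 0 of the result holds the slopes, row 1 the intercepts
        coefs = np.polyfit(group['elapsed_s'].to_numpy(),
                           group[signal_cols].to_numpy(dtype=float), 1)
        return pd.Series(coefs[0], index=signal_cols)

    # One row per Position, one column per signal
    return work.groupby('Position')[['elapsed_s'] + signal_cols].apply(fit)

# Usage with the objects defined above:
# regs = slopes_by_position(peaks, signal_list)
```

The result is indexed by Position, so it could be joined back onto peaks with peaks.join(regs, on='Position', rsuffix='_slope') if the slopes are needed next to the raw rows.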