How can I speed up pandas?

Date: 2019-05-13 16:18:08

Tags: pandas

I have a pandas DataFrame that I want to transform as follows: it holds readings from sensors in a smart floor. The readings are in the "CAPACITANCE" column (comma-separated), and each row's readings come from the device named in the "DEVICE" column. I want one column per sensor: each device has 8 sensors, so I want (number of devices) × 8 columns, and each of those columns should hold that sensor's readings.

But my code seems super slow, since the DataFrame has about 90,000 rows! Does anyone have a suggestion for speeding it up?

Before:

                                    CAPACITANCE DEVICE            TIMESTAMP  \
0   0.00,-1.00,0.00,1.00,1.00,-2.00,13.00,1.00   01,07  2017/11/15 12:24:42   
1  0.00,0.00,-1.00,-1.00,-1.00,0.00,-1.00,0.00   01,07  2017/11/15 12:24:42   
2   0.00,-1.00,-2.00,0.00,0.00,1.00,0.00,-2.00   01,07  2017/11/15 12:24:43   
3   2.00,0.00,-2.00,-1.00,0.00,0.00,1.00,-2.00   01,07  2017/11/15 12:24:43   
4    1.00,0.00,-2.00,1.00,1.00,-3.00,5.00,1.00   01,07  2017/11/15 12:24:44   

After:

   01,01-0  01,01-1  01,01-2  01,01-3  01,01-4  01,01-5  01,01-6  01,01-7  \
0        0        0        0        0        0        0        0        0   
1        0        0        0        0        0        0        0        0   
2        0        0        0        0        0        0        0        0   
3        0        0        0        0        0        0        0        0   
4        0        0        0        0        0        0        0        0   

   01,02-0  01,02-1  ...  05,07-1  05,07-2  05,07-3  05,07-4  05,07-5  \
0        0        0  ...        0        0        0        0        0   
1        0        0  ...        0        0        0        0        0   
2        0        0  ...        0        0        0        0        0   
3        0        0  ...        0        0        0        0        0   
4        0        0  ...        0        0        0        0        0   

   05,07-6  05,07-7           TIMESTAMP    01,07-8  
0        0        0 2017-11-15 12:24:42       1.00   
1        0        0 2017-11-15 12:24:42       0.00   
2        0        0 2017-11-15 12:24:43      -2.00   
3        0        0 2017-11-15 12:24:43      -2.00   
4        0        0 2017-11-15 12:24:44       1.00   
# creating new dataframe based on the old one
floor_df_resampled = floor_df.copy()
floor_device = ["01,01", "01,02", "01,03", "01,04", "01,05", "01,06", "01,07", "01,08", "01,09", "01,10",
                "02,01", "02,02", "02,03", "02,04", "02,05", "02,06", "02,07", "02,08", "02,09", "02,10",
                "03,01", "03,02", "03,03", "03,04", "03,05", "03,06", "03,07", "03,08", "03,09",
                "04,01", "04,02", "04,03", "04,04", "04,05", "04,06", "04,07", "04,08", "04,09",
                "05,06", "05,07"]

# creating new columns
floor_objects = []
for device in floor_device:
    for sensor in range(8):
        floor_objects.append(device + "-" + str(sensor))

# merging new columns 
floor_df_resampled = pd.concat([floor_df_resampled, pd.DataFrame(columns=floor_objects)], ignore_index=True, sort=True)

# part that takes loads of time
for index, row in floor_df_resampled.iterrows():
    obj = row["DEVICE"]
    sensor_data = row["CAPACITANCE"].split(',')
    for idx, val in enumerate(sensor_data):
        # note: "idx + 1" targets columns "-1" through "-8" rather than the
        # "-0" through "-7" columns created above (hence the stray "01,07-8"
        # column visible in the output)
        col = obj + "-" + str(idx + 1)
        floor_df_resampled.loc[index, col] = val

floor_df_resampled.drop(["DEVICE", "CAPACITANCE"], axis=1, inplace=True)

1 answer:

Answer 0 (score: 0)

As noted in the comments, I'm not sure why you want that many columns, but the new columns can be created as follows:

def explode(x):
    # x holds all rows for one device
    dev_name = x.DEVICE.iloc[0]
    # split the comma-separated string into one float column per sensor
    ret_df = x.CAPACITANCE.str.split(',', expand=True).astype(float)
    ret_df.columns = [f'{dev_name}-{col}' for col in ret_df.columns]
    return ret_df

new_df = df.groupby('DEVICE').apply(explode).fillna(0)

which you can then join back onto the old DataFrame:

df = df.join(new_df)
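For reference, here is a minimal, self-contained sketch of this approach. The toy DataFrame stands in for the real 90,000-row sensor log; the device names and values are invented, and `group_keys=False` is added so each group keeps its original row index:

```python
import pandas as pd

# Toy stand-in for the real sensor log; device names and values are invented.
df = pd.DataFrame({
    "CAPACITANCE": ["0.00,-1.00,13.00", "2.00,0.00,-2.00", "1.00,1.00,1.00"],
    "DEVICE": ["01,07", "01,07", "02,03"],
})

def explode(x):
    # x is the sub-frame for one device
    dev_name = x.DEVICE.iloc[0]
    # split each CSV string into one float column per sensor
    ret_df = x.CAPACITANCE.str.split(",", expand=True).astype(float)
    ret_df.columns = [f"{dev_name}-{col}" for col in ret_df.columns]
    return ret_df

# Each group keeps its original row index, so the pieces line up with df again;
# columns belonging to other devices come back as NaN and are filled with 0.
new_df = df.groupby("DEVICE", group_keys=False).apply(explode).fillna(0)
full = df.join(new_df)
print(full.columns.tolist())
```

Because the whole split is done per group with the vectorized `str.split(..., expand=True)` instead of per cell with `.loc` inside `iterrows`, this should scale much better to the full 90,000 rows.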