How can I sort my data into different bins in a smart way?

Time: 2018-01-22 14:13:28

Tags: python pandas bins

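Hello, I am using pandas to import data from two Excel files; a sample of the data in one of the files is shown below. Basically, I am trying to find the timestamps that appear in both files and then sort the data in, for example, the "Power" column for those matching timestamps into bins. The bins in this example are 0-50, 50-100, and so on in steps of 50, up to e.g. 1000.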

Location    UnitName    Timestamp           Power        Windspeed   Yaw
Bull Creek  F10         01/11/2014 00:00:00 7,563641548  3,957911002 280,5478821
Bull Creek  F10         01/11/2014 00:20:00 60,73444748  4,24157236  280,4075012
Bull Creek  F10         01/11/2014 00:30:00 63,15441132  4,241089859 280,3903809
Bull Creek  F10         01/11/2014 00:40:00 59,09280396  4,38904965  280,4152527
Bull Creek  F10         01/11/2014 00:50:00 69,26197052  4,374599175 280,3750916
Bull Creek  F10         01/11/2014 01:00:00 101,0624237  5,343887005 280,5173035
Bull Creek  F10         01/11/2014 01:10:00 122,7936935  5,183885235 280,4681702
Bull Creek  F10         01/11/2014 01:20:00 86,57110596  5,046733923 280,3834534
Bull Creek  F10         01/11/2014 01:40:00 16,74042702  3,024427626 280,1408386
Bull Creek  F10         01/11/2014 01:50:00 12,5870142   2,931351769 280,1185913
Bull Creek  F10         01/11/2014 02:00:00 -1,029753685 3,116549245 279,9686279
Bull Creek  F10         01/11/2014 02:10:00 13,35998058  3,448055706 279,8687134
Bull Creek  F10         01/11/2014 02:20:00 17,42461395  2,943588415 280,1383057
Bull Creek  F10         01/11/2014 02:30:00 -9,614940643 2,744164819 280,6514893
Bull Creek  F10         01/11/2014 02:50:00 -11,01966286 3,554833538 283,1451416
Bull Creek  F10         01/11/2014 03:00:00 -4,383010387 4,279259377 283,3281555

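I would like to know whether there is a smarter way to do this than what I have done so far, because the bin size and the maximum value may change. Here is my code; it works, but it is not very smart.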

import pandas as pd

fileREF = 'FilterDataREF.xlsx'

dataREF = pd.read_excel(fileREF, sheet_name='Sheet1')

filePCU = 'FilterDataPCU.xlsx'

dataPCU = pd.read_excel(filePCU, sheet_name='Ark1')

dateREF = dataREF['Timestamp']
datePCU = dataPCU['Timestamp']


n = 50
PowerLim = 1500
nBins = PowerLim // n  # integer number of bins
bins = range(0, PowerLim+1, n)

# Compare every REF row with every PCU row and keep matching timestamps
for i in range(len(dataREF)):
    for j in range(len(dataPCU)):
        if (dataREF['Timestamp'][i] == dataPCU['Timestamp'][j]
                and dataREF['Power'][i] > 0 and dataPCU['Power'][j] > 0):
            data_common = [dataREF.loc[i], dataPCU.loc[j]]

            # Look up Power by column name rather than by position 3
            data_power = [data_common[0]['Power'], data_common[1]['Power']]
            power_dif = data_power[1] - data_power[0]

            power_REF = data_power[0]
            power_PCU = data_power[1]

            # One hard-coded line per bin; this is the part I want to generalise
            bin1 = power_REF[power_REF < 50]
            bin2 = power_REF[power_REF >= 50 and power_REF < 100]
            bin3 = power_REF[power_REF >= 100 and power_REF < 150]

1 answer:

Answer 0: (score: 0)

You can use the pd.cut function:

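# Assumes data_common is a DataFrame with a numeric 'power_REF' column
# Bin edges 0, 50, 100, ... up to the first edge above the maximum power
edges = list(range(0, int(data_common['power_REF'].max()) + 50 + 1, 50))
# Label each bin by its lower edge; there is one label fewer than there are edges
data_common['bin'] = pd.cut(data_common['power_REF'], bins=edges, labels=edges[:-1])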
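To show how pd.cut could fit into the whole workflow from the question, here is a minimal sketch that replaces the nested loops with a merge on Timestamp. The function name bin_common_power, its parameters, and the Power_REF / Power_PCU column names (produced by the suffixes chosen here) are assumptions made for illustration, not something from the question. The REF values are rounded from the sample table above; the PCU values are invented purely so the example runs.

```python
import pandas as pd


def bin_common_power(data_ref, data_pcu, bin_width=50, power_lim=1500):
    """Match rows on Timestamp, keep positive power, and bin the REF power.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    # Inner merge keeps only the timestamps present in both frames
    common = data_ref[["Timestamp", "Power"]].merge(
        data_pcu[["Timestamp", "Power"]],
        on="Timestamp",
        suffixes=("_REF", "_PCU"),
    )
    # Keep rows where both power readings are positive
    positive = (common["Power_REF"] > 0) & (common["Power_PCU"] > 0)
    common = common[positive].copy()
    common["power_dif"] = common["Power_PCU"] - common["Power_REF"]

    # Edges 0, bin_width, 2*bin_width, ... covering power_lim
    edges = list(range(0, power_lim + bin_width, bin_width))
    # Label each bin by its lower edge (one label fewer than edges)
    common["bin"] = pd.cut(common["Power_REF"], bins=edges, labels=edges[:-1])
    return common


# REF values rounded from the question's sample; PCU values are made up
data_ref = pd.DataFrame({
    "Timestamp": ["01/11/2014 00:00:00", "01/11/2014 00:20:00",
                  "01/11/2014 01:00:00", "01/11/2014 02:00:00"],
    "Power": [7.56, 60.73, 101.06, -1.03],
})
data_pcu = pd.DataFrame({
    "Timestamp": ["01/11/2014 00:00:00", "01/11/2014 00:20:00",
                  "01/11/2014 00:30:00", "01/11/2014 01:00:00",
                  "01/11/2014 02:00:00"],
    "Power": [8.0, 55.0, 63.0, 120.0, 5.0],
})

result = bin_common_power(data_ref, data_pcu, bin_width=50, power_lim=1500)
print(result)

# Per-bin summary, e.g. the mean power difference in each occupied bin
mean_dif = result.groupby("bin", observed=True)["power_dif"].mean()
print(mean_dif)
```

Because the bin width and the upper limit are plain arguments, changing them does not require touching the binning code. Note that pd.cut uses right-closed intervals by default, so a value of exactly 50 lands in the 0 bin, and values outside the edges become NaN.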