pandas get_indexer returns all -1 for a single time interval

Time: 2018-11-07 01:31:50

Tags: python pandas

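In the example below, I can't understand why this interval lookup doesn't meet my expectation that every one of my data points matches index 0: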

import pandas  

dfLbl  = pandas.DataFrame( { 'Started':[554235706.051] , 'Stopped':[554240454.867] , 'Label':['LblVal'] } )
dfData = pandas.DataFrame( {'Angle': [-89.460618, -90.053987, -89.735639, -179.248331, 90.405555, 0.541808, 1.257457, 0.16111] ,
                            'time_s':[554237043.713062, 554238249.989954, 554235853.912149, 554237638.876251, 554237007.218903, 554239665.777394, 554238786.764156, 554239549.519223] })

print( "dfData\n{}".format( dfData ))
print( "\ndfLbl\n{}".format( dfLbl ))

lbl_intervals = pandas.IntervalIndex.from_arrays( dfLbl['Started'] , dfLbl['Stopped'] , closed='neither' )
lbl_indexes   = lbl_intervals.get_indexer( dfData['time_s'] )

print( "\nlbl_intervals\n{}".format( lbl_intervals ))
print( "\nlbl_indexes\n{}".format( lbl_indexes ))

print( "\n{}".format( pandas.DataFrame( { 'a <= x':  dfLbl.loc[0,'Started'] <= dfData['time_s'] , 'x < b' : dfData['time_s'] < dfLbl.loc[0,'Stopped']} )  ))
print(  "\nIntervalWidth={}".format( dfLbl['Stopped'] - dfLbl['Started'] ))

This produces the following console output:

dfData
        Angle        time_s
0  -89.460618  5.542370e+08
1  -90.053987  5.542382e+08
2  -89.735639  5.542359e+08
3 -179.248331  5.542376e+08
4   90.405555  5.542370e+08
5    0.541808  5.542397e+08
6    1.257457  5.542388e+08
7    0.161110  5.542395e+08

dfLbl
    Label       Started       Stopped
0  LblVal  5.542357e+08  5.542405e+08

lbl_intervals
IntervalIndex([(554235706.051, 554240454.867)]
              closed='neither',
              dtype='interval[float64]')

lbl_indexes
[-1 -1 -1 -1 -1 -1 -1 -1]

   a <= x   x < b
0    True    True
1    True    True
2    True    True
3    True    True
4    True    True
5    True    True
6    True    True
7    True    True

IntervalWidth=0    4748.816
dtype: float64

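I'm completely baffled, because when I evaluate the inequalities by hand (the a <= x and x < b columns), they show that the time_s data falls inside the bounds. I also checked that the interval width is not some unreasonably small number. What else could be going wrong? Is there something wrong with doing this kind of lookup against a single interval? Is pandas coercing the values to some other type?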

1 answer:

Answer 0 (score: 0)

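It looks like this was a bug in the version of pandas I was using. I looked through the What's New page on GitHub and found several bug fixes related to indexing. So I upgraded...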

$ pip install --upgrade pandas
...
Installing collected packages: pandas
  Found existing installation: pandas 0.22.0
    Uninstalling pandas-0.22.0:
      Successfully uninstalled pandas-0.22.0
Successfully installed pandas-0.23.4

I then re-ran the script, and the console output now gives the expected result...

dfData
        Angle        time_s
0  -89.460618  5.542370e+08
1  -90.053987  5.542382e+08
2  -89.735639  5.542359e+08
3 -179.248331  5.542376e+08
4   90.405555  5.542370e+08
5    0.541808  5.542397e+08
6    1.257457  5.542388e+08
7    0.161110  5.542395e+08

dfLbl
    Label       Started       Stopped
0  LblVal  5.542357e+08  5.542405e+08

lbl_intervals
IntervalIndex([(554235706.051, 554240454.867)]
              closed='neither',
              dtype='interval[float64]')

lbl_indexes
[0 0 0 0 0 0 0 0] <--------- ALL ZEROS

   a <= x   x<b
0    True  True
1    True  True
2    True  True
3    True  True
4    True  True
5    True  True
6    True  True
7    True  True

IntervalWidth=0    4748.816
dtype: float64
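
For anyone who cannot upgraded right away, or who wants to confirm that an installed version is affected, here is a minimal sketch that cross-checks get_indexer against a plain NumPy comparison. It is not part of the original answer; the names started, stopped, times, inside, and manual are made up for illustration, and only three of the timestamps from the question are used to keep it short.

```python
import numpy as np
import pandas as pd

# Interval bounds and a few sample timestamps taken from the question.
started = np.array([554235706.051])
stopped = np.array([554240454.867])
times = np.array([554237043.713062, 554238249.989954, 554235853.912149])

# Lookup through the IntervalIndex, as in the question.
intervals = pd.IntervalIndex.from_arrays(started, stopped, closed="neither")
from_index = intervals.get_indexer(times)

# Manual lookup: rows are timestamps, columns are intervals.
# closed='neither' means both comparisons are strict.
inside = (started[None, :] < times[:, None]) & (times[:, None] < stopped[None, :])

# Position of the first containing interval per timestamp, or -1 if none.
manual = np.where(inside.any(axis=1), inside.argmax(axis=1), -1)

print("pandas version:", pd.__version__)
print("get_indexer   :", from_index)
print("manual lookup :", manual)
```

If the two arrays differ (for example, get_indexer prints all -1 while the manual lookup prints all 0), the installed pandas is probably hitting the bug described above. The manual lookup assumes the intervals do not overlap; with overlapping intervals it only reports the first match.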