In the example below, I can't understand why this interval lookup does not give the index-0 match I expect for every one of my data points:
import pandas
dfLbl = pandas.DataFrame( { 'Started':[554235706.051] , 'Stopped':[554240454.867] , 'Label':['LblVal'] } )
dfData = pandas.DataFrame( {'Angle': [-89.460618, -90.053987, -89.735639, -179.248331, 90.405555, 0.541808, 1.257457, 0.16111] ,
'time_s':[554237043.713062, 554238249.989954, 554235853.912149, 554237638.876251, 554237007.218903, 554239665.777394, 554238786.764156, 554239549.519223] })
print( "dfData\n{}".format( dfData ))
print( "\ndfLbl\n{}".format( dfLbl ))
lbl_intervals = pandas.IntervalIndex.from_arrays( dfLbl['Started'] , dfLbl['Stopped'] , closed='neither' )
lbl_indexes = lbl_intervals.get_indexer( dfData['time_s'] )
print( "\nlbl_intervals\n{}".format( lbl_intervals ))
print( "\nlbl_indexes\n{}".format( lbl_indexes ))
print( "\n{}".format( pandas.DataFrame( { 'a <= x': dfLbl.loc[0,'Started'] <= dfData['time_s'] , 'x < b' : dfData['time_s'] < dfLbl.loc[0,'Stopped']} ) ))
print( "\nIntervalWidth={}".format( dfLbl['Stopped'] - dfLbl['Started'] ))
which gives the following console output:
dfData
Angle time_s
0 -89.460618 5.542370e+08
1 -90.053987 5.542382e+08
2 -89.735639 5.542359e+08
3 -179.248331 5.542376e+08
4 90.405555 5.542370e+08
5 0.541808 5.542397e+08
6 1.257457 5.542388e+08
7 0.161110 5.542395e+08
dfLbl
Label Started Stopped
0 LblVal 5.542357e+08 5.542405e+08
lbl_intervals
IntervalIndex([(554235706.051, 554240454.867)]
closed='neither',
dtype='interval[float64]')
lbl_indexes
[-1 -1 -1 -1 -1 -1 -1 -1]
a <= x x < b
0 True True
1 True True
2 True True
3 True True
4 True True
5 True True
6 True True
7 True True
IntervalWidth=0 4748.816
dtype: float64
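I am completely baffled, because when I evaluate the inequalities by hand in the a <= x and x < b columns, they show that every time_s value lies inside the interval. I also checked that the width of the interval is not some unreasonably small number. What else could be going wrong? Is there some fallacy in doing this kind of lookup with a single interval? Is pandas coercing the values to some other type?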
Answer 0 (score: 0)
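It looks like this was a bug in the version of pandas I was using. I looked at the What's New page on GitHub and found several bug fixes related to indexing. So I upgraded...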
$ pip install --upgrade pandas
...
Installing collected packages: pandas
Found existing installation: pandas 0.22.0
Uninstalling pandas-0.22.0:
Successfully uninstalled pandas-0.22.0
Successfully installed pandas-0.23.4
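To confirm which pandas version the interpreter actually picks up after the upgrade, a quick check along these lines may help. This is only a sketch: the 0.23 threshold is an assumption taken from this answer's experience (0.22.0 misbehaved, 0.23.4 did not), not a confirmed fix version, and `is_at_least` is a hypothetical helper name.

```python
import pandas

def is_at_least(version_string, minimum):
    """Compare the leading major.minor of a version string with a (major, minor) tuple."""
    major, minor = version_string.split('.')[:2]
    return (int(major), int(minor)) >= minimum

print("pandas version:", pandas.__version__)
# Assumption: (0, 23) is based only on the versions mentioned in this answer.
print("at least 0.23:", is_at_least(pandas.__version__, (0, 23)))
```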
Then I re-ran the script and found that the console output now gives the expected result...
dfData
Angle time_s
0 -89.460618 5.542370e+08
1 -90.053987 5.542382e+08
2 -89.735639 5.542359e+08
3 -179.248331 5.542376e+08
4 90.405555 5.542370e+08
5 0.541808 5.542397e+08
6 1.257457 5.542388e+08
7 0.161110 5.542395e+08
dfLbl
Label Started Stopped
0 LblVal 5.542357e+08 5.542405e+08
lbl_intervals
IntervalIndex([(554235706.051, 554240454.867)]
closed='neither',
dtype='interval[float64]')
lbl_indexes
[0 0 0 0 0 0 0 0] <--------- ALL ZEROS
a <= x x<b
0 True True
1 True True
2 True True
3 True True
4 True True
5 True True
6 True True
7 True True
IntervalWidth=0 4748.816
dtype: float64
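For anyone who cannot upgrade, or who wants to cross-check what get_indexer returns, the same lookup can be computed by hand with NumPy broadcasting. The following is a minimal sketch, assuming non-overlapping intervals and reusing the data from the question; `inside` and `manual_indexes` are names introduced here for illustration only.

```python
import numpy as np
import pandas

dfLbl = pandas.DataFrame({'Started': [554235706.051],
                          'Stopped': [554240454.867],
                          'Label': ['LblVal']})
times = np.array([554237043.713062, 554238249.989954, 554235853.912149,
                  554237638.876251, 554237007.218903, 554239665.777394,
                  554238786.764156, 554239549.519223])

starts = dfLbl['Started'].values
stops = dfLbl['Stopped'].values

# inside[i, j] is True when times[j] lies strictly within interval i,
# which matches closed='neither'.
inside = (starts[:, None] < times[None, :]) & (times[None, :] < stops[:, None])

# Index of the first matching interval per point, or -1 if none matches.
manual_indexes = np.where(inside.any(axis=0), inside.argmax(axis=0), -1)

lbl_intervals = pandas.IntervalIndex.from_arrays(dfLbl['Started'], dfLbl['Stopped'],
                                                 closed='neither')
lbl_indexes = lbl_intervals.get_indexer(times)

print("manual     :", manual_indexes)
print("get_indexer:", lbl_indexes)
print("agree      :", np.array_equal(manual_indexes, lbl_indexes))
```

If the two arrays disagree, as they would with the all -1 result shown in the question, that points at the installed pandas version rather than at the data.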