Question

Consider this dataset:

cat > example_file.txt <<EOL
bins|1,1|2,2|3,4|5,6|7,9|10,12|13,18|19,24|25,36|37,54|55,81
1,1||431|501|651|861|1081|1461|1711|2071|2371|2531
2,2|||261|401|631|871|1291|1611|2001|2291|2431
3,4|||121|271|551|1011|1191|1511|1901|2211|2351
5,6||||101|361|901|1251|1301|1691|2011|2221
7,9|||||181|461|841|1151|1511|1821|2061
10,12||||||161|591|931|1291|1621|1821
13,18|||||||351|691|1091|1401|1571
19,24||||||||301|861|1451|1201
25,36|||||||||371|851|961
37,54||||||||||371|621
55,81|||||||||||351
EOL

which I import with:

import pandas
example = pandas.read_csv('example_file.txt', sep = '|', index_col = 0)

The row and column labels of this table are actually interval ranges. I can get pandas to recognize them as such:

col_bin = pandas.IntervalIndex.from_tuples([tuple(list(map(int, x.split(',')))) for x in example.columns])
row_bin = pandas.IntervalIndex.from_tuples([tuple(list(map(int, x.split(',')))) for x in example.index])
example.columns = col_bin
example.index   = row_bin

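Now I want to find the cell value that corresponds to a pair of values such as (11, 13).

For example, for (11, 13) the cell value is 591.

This is because the row value (11) falls in the interval (10,12), which is the sixth row, and the column value (13) falls in the interval (13,18), which is the seventh column. The value at (row, column) = (6, 7) in the table is 591.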
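To make the expected result concrete, here is a minimal plain-Python sketch of the lookup described above, without pandas. The `bins` list is copied from the header row of the file, and `bin_position` is a hypothetical helper name used only for illustration. It assumes both ends of every bin are inclusive.

```python
# Bin edges copied from the header row of example_file.txt.
# Assumption: both ends of each bin are inclusive.
bins = [(1, 1), (2, 2), (3, 4), (5, 6), (7, 9), (10, 12),
        (13, 18), (19, 24), (25, 36), (37, 54), (55, 81)]

def bin_position(value, bins):
    """Return the 1-based position of the bin that contains value."""
    for position, (low, high) in enumerate(bins, start=1):
        if low <= value <= high:
            return position
    raise KeyError(value)  # value falls outside every bin

row = bin_position(11, bins)  # 11 lies in (10, 12)
col = bin_position(13, bins)  # 13 lies in (13, 18)
print(row, col)
```

This gives the (row, column) pair that should then be used to read the cell from the table.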

Answer 1

First you need to pass closed='both' to IntervalIndex.from_tuples, because these integer intervals are closed on both sides:

col_bin = pandas.IntervalIndex.from_tuples([tuple(list(map(int, x.split(',')))) for x in example.columns], closed='both')
row_bin = pandas.IntervalIndex.from_tuples([tuple(list(map(int, x.split(',')))) for x in example.index], closed='both')
example.columns = col_bin
example.index   = row_bin
print (example)
         [1, 1]  [2, 2]  [3, 4]  [5, 6]  [7, 9]  [10, 12]  [13, 18]  \
[1, 1]       NaN   431.0   501.0   651.0   861.0    1081.0    1461.0   
[2, 2]       NaN     NaN   261.0   401.0   631.0     871.0    1291.0   
[3, 4]       NaN     NaN   121.0   271.0   551.0    1011.0    1191.0   
[5, 6]       NaN     NaN     NaN   101.0   361.0     901.0    1251.0   
[7, 9]       NaN     NaN     NaN     NaN   181.0     461.0     841.0   
[10, 12]     NaN     NaN     NaN     NaN     NaN     161.0     591.0   
[13, 18]     NaN     NaN     NaN     NaN     NaN       NaN     351.0   
[19, 24]     NaN     NaN     NaN     NaN     NaN       NaN       NaN   
[25, 36]     NaN     NaN     NaN     NaN     NaN       NaN       NaN   
[37, 54]     NaN     NaN     NaN     NaN     NaN       NaN       NaN   
[55, 81]     NaN     NaN     NaN     NaN     NaN       NaN       NaN   

          [19, 24]  [25, 36]  [37, 54]  [55, 81]  
[1, 1]      1711.0    2071.0    2371.0      2531  
[2, 2]      1611.0    2001.0    2291.0      2431  
[3, 4]      1511.0    1901.0    2211.0      2351  
[5, 6]      1301.0    1691.0    2011.0      2221  
[7, 9]      1151.0    1511.0    1821.0      2061  
[10, 12]     931.0    1291.0    1621.0      1821  
[13, 18]     691.0    1091.0    1401.0      1571  
[19, 24]     301.0     861.0    1451.0      1201  
[25, 36]       NaN     371.0     851.0       961  
[37, 54]       NaN       NaN     371.0       621  
[55, 81]       NaN       NaN       NaN       351
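The effect of closed='both' can be checked on a small case. The sketch below uses two of the bins from the table and is only an illustration under that assumption; the variable names are not from the original post. With the default closed='right', the left edge 13 belongs to no interval, so get_loc raises KeyError.

```python
import pandas as pd

bounds = [(10, 12), (13, 18)]

# Default is closed='right': the intervals are (10, 12] and (13, 18].
right_closed = pd.IntervalIndex.from_tuples(bounds)
# closed='both': the intervals are [10, 12] and [13, 18].
both_closed = pd.IntervalIndex.from_tuples(bounds, closed='both')

try:
    right_closed.get_loc(13)
    found_in_right_closed = True
except KeyError:
    # 13 is the open left edge of (13, 18], so no interval contains it.
    found_in_right_closed = False

pos_in_both_closed = both_closed.get_loc(13)  # position of [13, 18]
print(found_in_right_closed, pos_in_both_closed)
```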

Then get the positions with IntervalIndex.get_loc and select by position with DataFrame.iat:

tup = (11, 13)
pos1 = example.index.get_loc(tup[0])
pos2 = example.columns.get_loc(tup[1])

print (example.iat[pos1, pos2])
591.0
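If the lookup is needed more than once, the two steps can be wrapped in a small function. The following is a sketch only: `lookup` is a hypothetical helper name, and `table` is a 3x3 excerpt of the data above, rebuilt by hand so the block runs on its own. For many pairs at once, IntervalIndex.get_indexer returns all positions in one call.

```python
import numpy as np
import pandas as pd

# 3x3 excerpt of the table above (bins 10-12, 13-18, 19-24).
bins = pd.IntervalIndex.from_tuples([(10, 12), (13, 18), (19, 24)], closed='both')
table = pd.DataFrame(
    [[161.0, 591.0, 931.0],
     [np.nan, 351.0, 691.0],
     [np.nan, np.nan, 301.0]],
    index=bins, columns=bins)

def lookup(df, row_value, col_value):
    """Return the cell whose row and column intervals contain the given values."""
    row_pos = df.index.get_loc(row_value)    # raises KeyError if no interval matches
    col_pos = df.columns.get_loc(col_value)
    return df.iat[row_pos, col_pos]

single = lookup(table, 11, 13)

# Several pairs at once. get_indexer returns -1 for values outside every
# interval, so check for -1 before indexing if that can happen.
pairs = [(11, 13), (12, 20), (15, 24)]
rows = table.index.get_indexer([r for r, _ in pairs])
cols = table.columns.get_indexer([c for _, c in pairs])
values = table.to_numpy()[rows, cols]
print(single, values)
```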
