Question

I want to slice a DataFrame by rows or columns using iloc, while wrapping around indices that are out of bounds. Here is an example:

import pandas as pd
df = pd.DataFrame([[1,2,3], [4,5,6], [7,8,9]],columns=['a', 'b', 'c'])
# Slice rows 2 to 4, although the DataFrame only has 3 rows
print(df.iloc[2:4,:])

The DataFrame:

    a   b   c  
0   1   2   3  
1   4   5   6  
2   7   8   9

The output is:

    a   b   c
2   7   8   9

But I want the out-of-bounds indices to wrap around, like this:

    a   b   c
2   7   8   9
0   1   2   3

In NumPy, numpy.take with mode='wrap' slices with indices that wrap around the boundary. (The numpy take link)

import numpy as np
array = np.array([[1,2,3], [4,5,6], [7,8,9]])
print(array.take(range(2,4) , axis = 0, mode='wrap'))

The output is:

 [[7 8 9]
 [1 2 3]]

One possible solution I ended up with in pandas uses numpy.take:

import pandas as pd
import numpy as np
df = pd.DataFrame([[1,2,3], [4,5,6], [7,8,9]],columns=['a', 'b', 'c'])
# Get the integer indices of the dataframe
row_indices = np.arange(df.shape[0])
# Wrap the slice explicitly
wrap_slice = row_indices.take(range(2,4),axis = 0, mode='wrap')
print(df.iloc[wrap_slice, :])

This produces the output I want:

   a  b  c
2  7  8  9
0  1  2  3
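The question also mentions slicing by columns. A minimal sketch of the same numpy.take trick along the column axis, assuming the same 3x3 df; only the positions passed to iloc change (the names col_positions and wrapped_cols are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['a', 'b', 'c'])

# Integer positions of the columns: [0, 1, 2]
col_positions = np.arange(df.shape[1])
# Positions 2 and 3 wrap to column positions 2 and 0
wrapped_cols = col_positions.take(range(2, 4), mode='wrap')
# Select all rows and the wrapped columns by position
print(df.iloc[:, wrapped_cols])
```

This should select columns c and a, in that order.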

I looked at pandas.DataFrame.take, but it has no "wrap" mode. (The pandas take link). What is a simple way to solve this? Thank you very much!

Answer 1

Let's try np.roll:

df.reindex(np.roll(df.index, shift=-2)[0:2])

Output:

   a  b  c
2  7  8  9
0  1  2  3

And to make it more general:

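import numpy as np

startidx = 2
endidx = 4

# Roll the integer positions (not the index labels, which iloc would reject if they are not 0..n-1)
df.iloc[np.roll(np.arange(len(df)), shift=-startidx)[:endidx - startidx]]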
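One caveat, shown as a small sketch assuming the same 3-row df: the rolled array has only len(df) entries, so slicing it cannot return more than len(df) rows. Asking for more than one full cycle is silently truncated rather than wrapped again.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['a', 'b', 'c'])

startidx = 2
endidx = 10  # asks for 8 rows from a 3-row frame

# Rolling the positions [0, 1, 2] left by 2 gives [2, 0, 1]
rolled = np.roll(np.arange(len(df)), shift=-startidx)
# Slicing past the end of `rolled` just stops, so only 3 rows come back
result = df.iloc[rolled[:endidx - startidx]]
print(len(result))
```

The modulo approach avoids this limit because it computes one position per requested row.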

Answer 2

You can use the modulo (remainder) operator:

import numpy as np

start_id = 2
end_id = 4
idx = np.arange(start_id, end_id, 1)%len(df)

df.iloc[idx]
#   a  b  c
#2  7  8  9
#0  1  2  3

This approach even lets you wrap around multiple times:

start_id = 2
end_id = 10
idx = np.arange(start_id, end_id, 1)%len(df)

df.iloc[idx]
#   a  b  c
#2  7  8  9
#0  1  2  3
#1  4  5  6
#2  7  8  9
#0  1  2  3
#1  4  5  6
#2  7  8  9
#0  1  2  3

A good way to wrap indices when slicing a pandas DataFrame
