Question

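Consider two dataframes, df1 and df2, each with N columns and M rows.

I want to randomly sample the same positions in both dataframes.

To sample one position in df1, I use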

df1.sample(1,axis=1).sample(1,axis=0)

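I want to sample the same position in the other dataframe. The sampling will happen k times (think of it as generating k tuples, where each tuple holds the data from one specific column and one specific row), and each time it must be a new, unique position.

I tried the following: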

for i in xrange(k):
    a = df1.sample(1, axis=1).sample(1, axis=0)
    b = df2[a.index]

I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 2679, in __getitem__
    return self._getitem_array(key)
  File "/opt/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 2723, in _getitem_array
    indexer = self.loc._convert_to_indexer(key, axis=1)
  File "/opt/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.py", line 1327, in _convert_to_indexer
    .format(mask=objarr[mask]))
KeyError: "Int64Index([5], dtype='int64') not in index"

Should I use NumPy to generate unique position values and then index into those positions? Or is there a way to do this in pandas?

Answer 1

You can use numpy.random.choice together with .iloc to select by position.
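The answer's original code was not preserved here, so the following is only a minimal sketch of how that idea might look, not the answerer's exact code. It assumes df1 and df2 have the same shape; the function name `sample_same_positions` and the `seed` parameter are made up for illustration. Drawing flat cell numbers without replacement is what keeps the k positions unique.

```python
import numpy as np
import pandas as pd


def sample_same_positions(df1, df2, k, seed=None):
    """Return k (df1_value, df2_value) tuples taken from k unique shared positions.

    Hypothetical helper: assumes df1 and df2 have identical shapes.
    """
    if df1.shape != df2.shape:
        raise ValueError("df1 and df2 must have the same shape")
    n_rows, n_cols = df1.shape
    rng = np.random.RandomState(seed)
    # Draw k distinct flat cell numbers (replace=False guarantees uniqueness;
    # NumPy raises ValueError if k exceeds the number of cells).
    flat = rng.choice(n_rows * n_cols, size=k, replace=False)
    # Convert each flat cell number into a (row position, column position) pair.
    rows, cols = np.unravel_index(flat, (n_rows, n_cols))
    # .iloc selects by integer position, so labels do not matter here.
    return [(df1.iloc[r, c], df2.iloc[r, c]) for r, c in zip(rows, cols)]


# Example data (made up): df2 holds df1's values multiplied by 10.
df1 = pd.DataFrame(np.arange(12).reshape(3, 4))
df2 = df1 * 10
pairs = sample_same_positions(df1, df2, k=5, seed=0)
```

Because selection is purely positional, this works even when the two dataframes carry different index or column labels.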

Answer 2

A rough approach:

first_sample = df1.sample(1, axis=1).sample(1, axis=0)

# .loc selects by label, so this assumes df2 shares df1's index and column labels
second_sample = df2.loc[first_sample.index, first_sample.columns]
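The approach above picks a single position. One way it might be extended to k unique positions is to repeat the draw and skip positions already seen. This is a sketch under assumptions: df2 shares df1's index and column labels, those labels are unique, and the helper name `sample_pairs_by_label` and its `seed` parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def sample_pairs_by_label(df1, df2, k, seed=None):
    """Return k (df1_value, df2_value) tuples from k unique (row, column) labels.

    Hypothetical helper: assumes df2 has the same unique labels as df1.
    """
    if k > df1.shape[0] * df1.shape[1]:
        raise ValueError("k cannot exceed the number of cells in df1")
    rng = np.random.RandomState(seed)
    seen = set()
    pairs = []
    while len(pairs) < k:
        # Sample one column, then one row, leaving a 1x1 dataframe.
        cell = df1.sample(1, axis=1, random_state=rng).sample(
            1, axis=0, random_state=rng
        )
        key = (cell.index[0], cell.columns[0])
        if key in seen:
            continue  # position already used, draw again
        seen.add(key)
        row_label, col_label = key
        # Look up the same labels in df2.
        pairs.append((cell.iat[0, 0], df2.loc[row_label, col_label]))
    return pairs


# Example data (made up): df2 holds df1's values plus 100.
df1 = pd.DataFrame(
    np.arange(12).reshape(3, 4), index=list("abc"), columns=list("wxyz")
)
df2 = df1 + 100
pairs = sample_pairs_by_label(df1, df2, k=6, seed=1)
```

Redrawing on a repeat is simple but slows down as k approaches the total number of cells; drawing all positions at once without replacement avoids that.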

Randomly sampling the same position in two dataframes
