Question

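I am trying to draw random samples from two pandas DataFrames. If rows 2, 5 and 8 are (randomly) selected in frame A, the same rows 2, 5 and 8 must be selected from frame B. I first generate a random sample, and now I want to use that sample as an index into the rows of both frames. How can I do this? The code should look something like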

idx = list(random.sample(range(X_train.shape[0]),5000))

lgstc_reg[i].fit(X_train[idx,:], y_train[idx,:])

However, running this code raises an error.
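The question does not include the error message. Assuming X_train is a pandas DataFrame, a likely cause is that [] on a DataFrame selects columns by label, so the NumPy-style [idx, :] key is not read as row positions. A small sketch with made-up data (the frame X is hypothetical) contrasting the two:

```python
import random

import pandas as pd

# Hypothetical stand-in for X_train: 10 rows, 2 columns
X = pd.DataFrame({"a": range(10), "b": range(10, 20)})
idx = random.sample(range(X.shape[0]), 3)  # 3 random row positions

# A NumPy array accepts [row positions, column slice]
rows_from_array = X.to_numpy()[idx, :]

# On a DataFrame, [] treats the whole tuple as a column key and raises
# (the exact exception type differs between pandas versions)
try:
    X[idx, :]
    raised = False
except Exception:
    raised = True

print(rows_from_array.shape, raised)
```

Converting to NumPy works, but it drops the index and column labels, which is why the answers below stay in pandas.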

Answer 1

Use iloc:

indexes = [2,5,8]  # in your case this is the randomly generated list
A.iloc[indexes]
B.iloc[indexes]
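Applied to the question's setup, the positions returned by random.sample can be passed to .iloc on both objects. A minimal sketch, assuming X_train is a DataFrame and y_train a Series of the same length (the data are made up, and the question's 5000 is scaled down to 3):

```python
import random

import pandas as pd

# Hypothetical training data; row i of X_train pairs with row i of y_train
X_train = pd.DataFrame({"f1": range(10), "f2": range(100, 110)})
y_train = pd.Series(range(10), name="label")

# Random row positions, drawn without replacement
idx = random.sample(range(X_train.shape[0]), 3)

# .iloc selects by position, so both objects get exactly the same rows,
# even if their index labels are not 0..n-1
X_sample = X_train.iloc[idx]
y_sample = y_train.iloc[idx]

print(X_sample)
print(y_sample)
```

In the question's code this becomes lgstc_reg[i].fit(X_train.iloc[idx], y_train.iloc[idx]); since .iloc[idx] needs no column slice, it also works when y_train is one-dimensional.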

Another way to sample both frames consistently is to fix the random seed and then sample:

random_seed = 42
A.sample(3, random_state=random_seed)
B.sample(3, random_state=random_seed)

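As long as A and B have the same number of rows and the same index, the sampled DataFrames will have the same index.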
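This works because DataFrame.sample with the same integer random_state draws the same row positions from frames of equal length; the resulting index labels match only if A and B also share the same index. A quick check with hypothetical frames:

```python
import pandas as pd

# Hypothetical frames with the same length and the same default index
A = pd.DataFrame({"x": range(10)})
B = pd.DataFrame({"y": range(10, 20)})

random_seed = 42
A_s = A.sample(3, random_state=random_seed)
B_s = B.sample(3, random_state=random_seed)

# Same seed + same length -> same positions -> same index labels
print(A_s.index.equals(B_s.index))
```

If the frames differ in length, the same seed generally picks different positions, so the .iloc approach is the safer choice there.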

Answer 2

Hope this helps!

>>> df1
   value  ID
0      3   2
1      4   2
2      7   8
3      8   8
4     11   8
>>> df2
   value  distance
0      3         0
1      4         0
2      7         1
3      8         0
4     11         0

I have two DataFrames. I want to select random rows of df1 together with the corresponding rows of df2.

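First, I use the pandas built-in method sample to draw random rows from df1 and keep their index as selection_index. Then I pass this index to another built-in, loc, to locate the same rows in both df1 and df2.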

>>> selection_index = df1.sample(2).index
>>> selection_index
Int64Index([3, 1], dtype='int64')
>>> df1.loc[selection_index]
   value  ID
3      8   8
1      4   2
>>> df2.loc[selection_index]
   value  distance
3      8         0
1      4         0
>>>
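Because loc selects by label, this relies on df1 and df2 having identical indexes. A hedged sketch that rebuilds the example above and checks that assumption before selecting:

```python
import pandas as pd

df1 = pd.DataFrame({"value": [3, 4, 7, 8, 11], "ID": [2, 2, 8, 8, 8]})
df2 = pd.DataFrame({"value": [3, 4, 7, 8, 11], "distance": [0, 0, 1, 0, 0]})

# loc matches rows by label, so both frames must share the same index
assert df1.index.equals(df2.index), "df1 and df2 must share the same index"

selection_index = df1.sample(2).index
picked1 = df1.loc[selection_index]
picked2 = df2.loc[selection_index]
print(picked1)
print(picked2)
```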

In your case, this would look something like

idx = X_train.sample(5000).index

lgstc_reg[i].fit(X_train.loc[idx], y_train.loc[idx])
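The lgstc_reg[i] in the question suggests several models, each fitted on its own sample. A sketch of that loop, assuming X_train and y_train share the same index; to keep it self-contained, the hypothetical subsets list stands in for the fit calls, and n_models and n_samples are made-up values:

```python
import pandas as pd

# Hypothetical data sharing one index (deliberately not 0..n-1)
X_train = pd.DataFrame({"f1": range(20)}, index=range(100, 120))
y_train = pd.Series(range(20), index=range(100, 120))

n_models = 3   # hypothetical number of models
n_samples = 5  # stands in for the question's 5000

subsets = []
for i in range(n_models):
    # A different, reproducible sample for each model
    idx = X_train.sample(n_samples, random_state=i).index
    # In the question: lgstc_reg[i].fit(X_train.loc[idx], y_train.loc[idx])
    subsets.append((X_train.loc[idx], y_train.loc[idx]))
```

Using random_state=i gives each model a different sample while keeping every run reproducible.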

How do I use random numbers as an index into a pandas DataFrame?
