Question

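I have a pandas DataFrame with the columns ['text', 'label'], where the label value is either 'pos' or 'neg'.

The problem is that I have far more rows with the 'neg' label than with the 'pos' label.

My question: is it possible to randomly select as many 'neg' sentences as there are 'pos' sentences, so that I get a new DataFrame with a 50:50 ratio of the two labels?

Do I have to count the 'pos' sentences, put them all in a new DataFrame, then run neg_df = dataframe.sample(n=pos_count) and append the result to the all-positive DataFrame created earlier, or is there a faster way?

Thanks for your help.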

Answer 1

import pandas as pd

# Sample data.
df = pd.DataFrame({'text': ['a', 'b', 'c', 'd', 'e'],
                   'label': ['pos'] * 2 + ['neg'] * 3})
>>> df
  label text
0   pos    a
1   pos    b
2   neg    c
3   neg    d
4   neg    e

# Select the 'pos' and 'neg' text as separate Series.
neg_text = df.loc[df.label == 'neg', 'text']
pos_text = df.loc[df.label == 'pos', 'text']

# Equally sample 'pos' and 'neg' with replacement and concatenate into a dataframe.
result = pd.concat([neg_text.sample(n=5, replace=True).reset_index(drop=True), 
                    pos_text.sample(n=5, replace=True).reset_index(drop=True)], axis=1)

result.columns = ['neg', 'pos']

>>> result
  neg pos
0   c   b
1   d   a
2   c   b
3   d   a
4   e   a
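
The snippet above samples with replacement and places the two classes side by side in separate columns. If the goal is instead the 50:50 downsampled DataFrame described in the question, a minimal sketch could look like the following. It assumes 'neg' is the larger class, reuses the same sample data, and fixes random_state only so the result is reproducible; the names pos_df, neg_df and balanced are illustrative.

```python
import pandas as pd

# Same sample data as above: 2 'pos' rows and 3 'neg' rows.
df = pd.DataFrame({'text': ['a', 'b', 'c', 'd', 'e'],
                   'label': ['pos'] * 2 + ['neg'] * 3})

# Keep every 'pos' row (assumed here to be the smaller class).
pos_df = df[df['label'] == 'pos']

# Draw as many 'neg' rows as there are 'pos' rows, without replacement.
neg_df = df[df['label'] == 'neg'].sample(n=len(pos_df), random_state=0)

# Combine both classes, shuffle the rows, and renumber the index.
balanced = (pd.concat([pos_df, neg_df])
              .sample(frac=1, random_state=0)
              .reset_index(drop=True))

print(balanced['label'].value_counts())
```

With pandas 1.1 or later, df.groupby('label').sample(n=len(pos_df)) should give an equivalent balanced sample in a single call.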

Pandas random sampling with a 1:1 ratio for a specific column
