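I want to create a dataset with 300 features and a number of instances, where every value is 0 or 1 (boolean). I have to use a set of ids to specify which features are 1. How can I do this in Python? For example, one instance should have columns 4, 45, 213, 6 and 48 set to 1, i.e. it is the combination of those ids.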
Answer 0 (score: 0)
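Hope it's not too late, and that I understood your question correctly.
There are two main things you are asking for:
1. Generate a two-dimensional boolean sample set of size n × 300 (n instances, 300 features)
2. Generate a dependent variable that lists, for each observation (row), the features that are 1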
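To make the two items concrete on a toy scale, here is a minimal sketch that uses a hand-written 3 × 5 matrix instead of a random n × 300 one, so the result is predictable. It uses `np.flatnonzero` to list the positions of the 1s in each row; that is a swapped-in NumPy shortcut, not the loop-based helper in the approach itself.

```python
import numpy as np

# Item 1 (toy scale): a hand-written stand-in for the random sample,
# 3 instances (rows) by 5 features (columns), each cell 0 or 1.
sample = np.array([
    [0, 1, 1, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
])

# Item 2: for every row, list the column ids whose value is 1.
dependent = [np.flatnonzero(row).tolist() for row in sample]
print(dependent)  # [[1, 2, 4], [0], [2, 3]]
```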
Here is my approach:
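```python
#%% Imports
# Data manipulation
import numpy as np
import pandas as pd

#%% List columns
def list_true_columns(x):
    """Return the positions (column ids) of the entries of row x that equal 1."""
    result = []
    for i in range(len(x)):
        if x.iloc[i] == 1:
            result.append(i)
    return result

column_amount = 300  # number of features
row_amount = 1000    # number of instances (rows)

#%% Sample dataset
# Every cell is drawn independently: 1 with probability 0.5, otherwise 0
dataset = pd.DataFrame(np.random.binomial(n=1, p=0.5, size=(row_amount, column_amount)))

# Based on the sample, calculate the dependent variable:
# for every row, the list of column ids whose value is 1
dataset['dependent'] = dataset.apply(list_true_columns, axis=1)

# Print the feature columns and the dependent variable separately
print(dataset.drop(columns='dependent'))
print(dataset[['dependent']])
```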
Here is the sample (pandas abbreviates the display to the first and last rows and columns):
```
0 1 2 3 4 5 6 7 8 9 ... 291 292 293 294 295 296 297 298 299
0 0 1 1 0 1 1 1 0 1 0 ... 1 1 0 0 0 0 0 1 1
1 1 1 0 0 0 1 0 1 1 0 ... 0 1 1 1 0 1 1 0 1
2 0 1 0 0 1 1 0 1 0 0 ... 0 1 0 1 0 0 1 1 0
3 0 1 0 1 0 0 1 1 1 0 ... 0 0 0 0 0 1 1 0 0
4 1 0 1 1 0 0 0 0 1 0 ... 1 1 1 0 0 0 1 0 1
5 0 0 1 1 1 1 0 1 0 0 ... 1 1 0 1 0 1 1 1 0
.. .. .. .. .. .. .. .. .. .. .. ... ... ... ... ... ... ... ... ... ...
994 1 1 0 1 1 0 1 1 0 1 ... 0 0 0 1 0 0 1 0 0
995 1 0 1 0 0 0 0 1 0 0 ... 1 1 0 0 0 0 1 0 1
996 1 0 1 0 1 0 0 0 0 1 ... 1 1 0 0 0 1 1 0 1
997 0 0 0 1 0 1 1 0 0 0 ... 1 0 1 1 0 0 0 1 0
998 0 0 0 0 0 1 1 1 1 0 ... 1 0 0 0 1 1 1 1 0
999 0 0 1 0 0 0 1 1 1 1 ... 1 0 0 1 1 1 1 1 1
```
And here is the dependent variable:
```
dependent
0 [1, 2, 4, 5, 6, 8, 11, 15, 17, 18, 19, 20, 21,...
1 [0, 1, 5, 7, 8, 12, 15, 16, 17, 18, 19, 20, 24...
2 [1, 4, 5, 7, 11, 12, 15, 16, 18, 26, 27, 28, 2...
3 [1, 3, 6, 7, 8, 11, 12, 15, 16, 23, 25, 27, 28...
4 [0, 2, 3, 8, 13, 16, 18, 19, 20, 21, 22, 28, 2...
5 [2, 3, 4, 5, 7, 10, 11, 12, 13, 14, 15, 21, 24...
.. ...
994 [0, 1, 3, 4, 6, 7, 9, 10, 11, 15, 17, 20, 21, ...
995 [0, 2, 7, 12, 13, 14, 15, 16, 17, 19, 22, 23, ...
996 [0, 2, 4, 9, 11, 13, 16, 17, 18, 20, 21, 23, 2...
997 [3, 5, 6, 11, 14, 20, 21, 22, 24, 28, 30, 35, ...
998 [5, 6, 7, 8, 13, 17, 19, 20, 22, 23, 24, 28, 3...
999 [2, 6, 7, 8, 9, 14, 17, 18, 19, 20, 21, 22, 23...
```
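The question's example goes the other way: you start from ids such as 4, 45, 213, 6, 48 and want the matching 0/1 row. A minimal sketch of that direction, assuming the ids for each instance are already available as a Python list (`id_lists` below is hypothetical input), preallocates a zero matrix and sets the listed columns to 1 with NumPy fancy indexing:

```python
import numpy as np
import pandas as pd

column_amount = 300  # number of features, as in the answer

# Hypothetical input: one list of "on" column ids per instance.
# The first list is the example from the question.
id_lists = [
    [4, 45, 213, 6, 48],
    [0, 299],
    [7],
]

# Start from an all-zero matrix and set the listed ids to 1, row by row.
matrix = np.zeros((len(id_lists), column_amount), dtype=np.int8)
for row, ids in enumerate(id_lists):
    matrix[row, ids] = 1

dataset = pd.DataFrame(matrix)

# Sanity check: recover the id lists from the matrix (they come back sorted).
recovered = [np.flatnonzero(r).tolist() for r in matrix]
print(recovered)  # [[4, 6, 45, 48, 213], [0, 299], [7]]
```

The order of ids within a list does not matter, and a repeated id just sets the same cell to 1 again, which is why the recovered lists come back sorted rather than in input order.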