Question

Not sure how to phrase this correctly, but here goes:

What is the simplest way in Python to create a single-column DataFrame of 1s and 0s whose length is determined by some input?

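For example, say my sample size is 1000, of which 100 are successes (1). The number of zeros would then be the sample size (i.e., 1000) minus the successes. So the output would be a DataFrame of length 1000, with 100 rows containing a one and 900 rows containing a zero.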

Answer 1

Based on your description, a simple list would do the trick. Otherwise, you can use a numpy.array or a pandas.Series (or a pandas.DataFrame, which is more table-like):

import numpy as np
import pandas as pd

input_length = 1000

# List approach:
my_list = [0] * input_length

# NumPy array (integer zeros):
my_array = np.zeros(input_length, dtype=int)

# With pandas:
my_table = pd.Series(0, index=range(input_length))

All of these create a vector of zeros; you then assign the successes (ones) as needed. If the values follow some known distribution, NumPy also has methods to generate random vectors that follow it (see here).

If you really are looking for a pandas approach, it can also be combined with the previous ones: you can assign a list or numpy.array as the values of a Series or DataFrame. For example, you might want to draw 1000 random samples from a binomial distribution with p = 0.5.
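A minimal sketch of that binomial example, assuming np.random.binomial with n=1 (a single trial per draw, so every value is 0 or 1). The seed and the column name "value" are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

input_length = 1000
p = 0.5

# Seeded so repeated runs give the same sample; remove the seed for fresh draws.
np.random.seed(0)

# n=1 makes each binomial draw a single trial, so each sample is 0 or 1.
samples = np.random.binomial(n=1, p=p, size=input_length)

# Assign the NumPy array as the values of a one-column DataFrame.
my_table = pd.DataFrame({"value": samples})
```

The number of ones here is random (about half of input_length), not a fixed count.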


Answer 2

In addition to N.P.'s answer, you can do this:

import pandas as pd
import numpy as np

def generate_df(df_len):

    values = np.random.binomial(n=1, p=0.1, size=df_len)
    return pd.DataFrame({'value': values})

df = generate_df(1000)
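If the sample needs to be reproducible, the same idea can be written with NumPy's Generator interface (np.random.default_rng), which accepts a seed. This swaps in the newer API for the legacy np.random.binomial call; the function name generate_df_rng and its seed parameter are hypothetical. Either way, the share of ones is only approximately p, so 1000 rows will usually not contain exactly 100 ones.

```python
import numpy as np
import pandas as pd

def generate_df_rng(df_len, p_success=0.1, seed=None):
    """Hypothetical variant: one column of Bernoulli(p_success) draws."""
    # A seeded Generator gives the same draws on every call with that seed.
    rng = np.random.default_rng(seed)
    # n=1 turns each binomial draw into a single 0/1 trial.
    values = rng.binomial(n=1, p=p_success, size=df_len)
    return pd.DataFrame({"value": values})

# Same seed, same DataFrame.
df_a = generate_df_rng(1000, seed=42)
df_b = generate_df_rng(1000, seed=42)
```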

Edit:

A more complete function:

def generate_df(df_len, option, p_success=0.1):
    '''
    Generate a pandas DataFrame with a single column filled with
    1s and 0s, with a p_success proportion of 1s and length df_len.
    Input:
        - df_len: int, length of the 1st dimension of the DataFrame
        - option: str, determines how the sample will be generated
            * random: according to a Bernoulli distribution with p=p_success
            * fixed: failures first, then a fixed proportion p_success of successes
            * fixed_shuffled: fixed proportion p_success of successes, in random order
        - p_success: proportion of successes among the total
    Output:
        - df: pandas DataFrame
    '''

    if option == 'random':
        values = np.random.binomial(n=1, p=p_success, size=df_len)

    elif option in ('fixed_shuffled', 'fixed'):

        n_success = int(df_len*p_success)
        n_fail = df_len - n_success

        values = [0]*n_fail + [1]*n_success

        if option == 'fixed_shuffled':
            np.random.shuffle(values)

    else:
        raise ValueError('Unknown option: {}'.format(option))

    df = pd.DataFrame({'value': values})

    return df
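When the number of successes must be exact, an alternative sketch of the fixed_shuffled case builds the column with np.repeat and shuffles it with a seeded Generator. The helper name fixed_shuffled_df is hypothetical. It takes the success count directly rather than a proportion, which sidesteps the truncation in int(df_len*p_success) (for example, int(100 * 0.29) evaluates to 28 because of floating-point rounding).

```python
import numpy as np
import pandas as pd

def fixed_shuffled_df(df_len, n_success, seed=None):
    """Hypothetical helper: exactly n_success ones, the rest zeros, in random order."""
    if not 0 <= n_success <= df_len:
        raise ValueError("n_success must be between 0 and df_len")
    # Equivalent to [0]*(df_len - n_success) + [1]*n_success, as an array.
    values = np.repeat([0, 1], [df_len - n_success, n_success])
    # permutation returns a shuffled copy, so the counts are unchanged.
    rng = np.random.default_rng(seed)
    return pd.DataFrame({"value": rng.permutation(values)})

# The question's example: 1000 rows, 100 ones, 900 zeros.
df = fixed_shuffled_df(1000, 100, seed=0)
```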
