Question

I have several DataFrames named step1, step2, step5, step7, and so on.

I wrote the following function:

def statistics(df):
    plus_one = df['BacksGas_Flow_sccm'][df['y_ocsvm'] == 1].describe()
    negative_one = df['BacksGas_Flow_sccm'][df['y_ocsvm'] == -1].describe()
    return plus_one, negative_one

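where BacksGas_Flow_sccm and y_ocsvm are column names present in all of the DataFrames.

After that, I tried to build a new DataFrame holding the statistics returned by describe(), which I did as follows: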

stats = pd.DataFrame(statistics(step1))
stats = stats.append(pd.DataFrame(statistics(step2)))

That gave me:

                    count          mean               std   min 25% 50% 75% max
BacksGas_Flow_sccm  1622.0  0.4370119194410199  0.11346778078574718 0.33333333333333304 0.33333333333333304 0.5 0.5 0.6666666666666665
BacksGas_Flow_sccm  426.0   0.19444444444444436 0.1873737774126198  0.0 0.16666666666666652 0.16666666666666652 0.16666666666666652 1.0
BacksGas_Flow_sccm  1285.0  0.5418071768266265  0.1998356616378414  0.2222222222222221  0.2222222222222221  0.6666666666666667  0.6666666666666667  0.6666666666666667
BacksGas_Flow_sccm  8028.0  0.4678901622100473  0.10157692912484724 0.0 0.4444444444444444  0.4444444444444444  0.5555555555555556  0.9999999999999998

I just want to change the index labels from BacksGas_Flow_sccm to the name of the DataFrame each row came from.

Expected output:

         count         mean               std   min 25% 50% 75% max
Step1   1622.0  0.4370119194410199  0.11346778078574718 0.33333333333333304 0.33333333333333304 0.5 0.5 0.6666666666666665
Step1   426.0   0.19444444444444436 0.1873737774126198  0.0 0.16666666666666652 0.16666666666666652 0.16666666666666652 1.0
Step2   1285.0  0.5418071768266265  0.1998356616378414  0.2222222222222221  0.2222222222222221  0.6666666666666667  0.6666666666666667  0.6666666666666667
Step2   8028.0  0.4678901622100473  0.10157692912484724 0.0 0.4444444444444444  0.4444444444444444  0.5555555555555556  0.9999999999999998

I would like to know how to do this.

Thanks

Answer 1

You can do this inside the statistics function by passing the name in, which is then stored in a source column:

def statistics(df, name):
    plus_one = df['BacksGas_Flow_sccm'][df['y_ocsvm'] == 1].describe()
    negative_one = df['BacksGas_Flow_sccm'][df['y_ocsvm'] == -1].describe()
    ret_df = pd.DataFrame((plus_one, negative_one))
    ret_df['source'] = name

    return ret_df

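# statistics() already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed.
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
stats = pd.concat([statistics(step1, 'step1'), statistics(step2, 'step2')])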
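
If the step name should be the row label itself, as in the expected output, rather than a separate source column, one option is to overwrite the index inside the function. The following is only a minimal sketch: the function name statistics_indexed and the toy step1/step2 data are hypothetical stand-ins, and it uses pd.concat because DataFrame.append is no longer available in pandas 2.0 and later.

```python
import pandas as pd

def statistics_indexed(df, name):
    """Describe BacksGas_Flow_sccm for y_ocsvm == 1 and == -1, labelling both rows with `name`."""
    col = df['BacksGas_Flow_sccm']
    plus_one = col[df['y_ocsvm'] == 1].describe()
    negative_one = col[df['y_ocsvm'] == -1].describe()
    # Each Series becomes one row; the statistic names become the columns.
    out = pd.DataFrame([plus_one, negative_one])
    # Replace the default 'BacksGas_Flow_sccm' row labels with the step name.
    out.index = [name, name]
    return out

# Toy data (hypothetical) standing in for the real step1/step2 frames.
step1 = pd.DataFrame({'BacksGas_Flow_sccm': [0.1, 0.2, 0.3, 0.4],
                      'y_ocsvm': [1, 1, -1, -1]})
step2 = pd.DataFrame({'BacksGas_Flow_sccm': [0.5, 0.6, 0.7, 0.8],
                      'y_ocsvm': [1, -1, 1, -1]})

stats = pd.concat([statistics_indexed(step1, 'Step1'),
                   statistics_indexed(step2, 'Step2')])
print(stats)
```

Note that this reproduces the duplicated labels from the expected output, so the first row of each pair is the y_ocsvm == 1 group and the second is the y_ocsvm == -1 group; that ordering is implicit and easy to lose after sorting.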

Answer 2

This is ugly, but it should give you what you want without duplicate index values:

stats = pd.DataFrame(statistics(step1))
stats['step'] = 'Step1'
temp = pd.DataFrame(statistics(step2))
temp['step'] = 'Step2'
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
stats = pd.concat([stats, temp])
stats = stats.reset_index()
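
A less repetitive variant of the same idea, offered as a sketch rather than as either answerer's method: group each frame by y_ocsvm, then let pd.concat build a two-level index from a dict of frames. The helper name describe_by_label, the frames dict, and the toy data below are assumptions made for illustration.

```python
import pandas as pd

def describe_by_label(df):
    """Return one row of describe() statistics per distinct y_ocsvm value."""
    return df.groupby('y_ocsvm')['BacksGas_Flow_sccm'].describe()

# Toy data (hypothetical) standing in for the real step1/step2 frames.
step1 = pd.DataFrame({'BacksGas_Flow_sccm': [0.1, 0.2, 0.3, 0.4],
                      'y_ocsvm': [1, 1, -1, -1]})
step2 = pd.DataFrame({'BacksGas_Flow_sccm': [0.5, 0.6, 0.7, 0.8],
                      'y_ocsvm': [1, -1, 1, -1]})

# Map each label to its frame; add step5, step7, ... in the same way.
frames = {'Step1': step1, 'Step2': step2}

# The dict keys become the outer index level, y_ocsvm the inner one.
stats = pd.concat({name: describe_by_label(df) for name, df in frames.items()},
                  names=['step', 'y_ocsvm'])
print(stats)
```

With this layout every row is uniquely identified by (step, y_ocsvm), so a single group can be selected with, for example, stats.loc[('Step1', -1)], and stats.reset_index() turns both levels into ordinary columns if a flat table is preferred.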

How do I change the index of a DataFrame after applying a function in Python?
