Iterating conditionally and outputting a dataframe and a chart

Time: 2019-03-02 22:28:00

Tags: python pandas for-loop seaborn

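I am trying to output a dataframe and a chart for each "value". I'm struggling to piece together some Pythonic basics.

Process: I take the dataframe, group it, and compute each group's percentage of the total, then output a table and a chart. However, I want to loop over this process: first with the dataframe filtered on Reviewed? == 'Yes', then on 'No'.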

data = {'Region': ["US", "US", "US","US"],
        'Gender': ["M","F","F","M"],
        'Reviewed?': ["Yes","Yes","No","No"]}
df = pd.DataFrame(data, columns=['Region','Gender','Reviewed?'])

def func(df):
    vals = ['Yes','No']
    for i in range(len(vals)):
        for x in vals:
            gb[i] = df[df['Reviewed?']==x].groupby(['Gender'])['Region'].count().reset_index()
            total[i] = gb[i]['Region'].sum()
            gb[i]['Percentage'] = (gb[i]['Region'] / total[i])
            gb[i] = gb[i].sort_values(by='Percentage', ascending=False)
            sns.barplot(data=gb[i], x='Region', y='Percentage')
    plt.show()
    return gb[i]

Some of the error messages:

ValueError: could not broadcast input array from shape (0,2) into shape (0)

ValueError: cannot copy sequence with size 2 to array axis with dimension 0

ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series

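Update: here is a brute-force version of what I want. I just want a more efficient, more dynamic way to do it.

Note that I did not originally make it clear that I want to keep the counts in the final dataframe...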

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = {'Region': ["US", "US", "US","US"],
        'Gender': ["M","F","F","M"],
        'Reviewed?': ["Yes","Yes","No","No"]}
df = pd.DataFrame(data, columns=['Region','Gender','Reviewed?'])

def func(df):
    gb = df[df['Reviewed?']=='No'].groupby(['Gender'])['Region'].count().reset_index()
    total = gb['Region'].sum()
    gb['Percentage'] = (gb['Region'] / total)
    notyetreviewed = gb.sort_values(by='Percentage', ascending=False)
    sns.barplot(data=notyetreviewed, x='Gender', y='Percentage')
    bottom, top = plt.ylim(0,1) 
    plt.show()

    gb = df[df['Reviewed?']=='Yes'].groupby(['Gender'])['Region'].count().reset_index()
    total = gb['Region'].sum()
    gb['Percentage'] = (gb['Region'] / total)
    reviewed = gb.sort_values(by='Percentage', ascending=False)
    sns.barplot(data=reviewed, x='Gender', y='Percentage')
    bottom, top = plt.ylim(0,1)
    plt.show()

    return notyetreviewed, reviewed
func(df)

2 Answers:

Answer 0: (score: 0)

You could try the following:

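import pandas as pd
import matplotlib.pyplot as plt

data = {'Region': ["US", "US", "US","US"],
        'Gender': ["M","F","F","M"],
        'Reviewed?': ["Yes","Yes","No","No"]}
df = pd.DataFrame(data, columns=['Region','Gender','Reviewed?'])

for outcome in ['Yes', 'No']:
    # Share of each gender among the rows with this Reviewed? value
    filtered = df[df['Reviewed?'].eq(outcome)]['Gender'].value_counts(normalize=True)
    filtered.plot.bar(title=f"Reviewed? == {outcome}")
    plt.show()  # one figure per outcome, so the bars do not overlap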

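Here I filter the DataFrame by the Reviewed? outcome on each pass through the loop and then get the proportions of males and females. Your question poses a binary choice, but I suppose this could be extended with for outcome in df['Reviewed?'].unique():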
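The extension suggested above, combined with the question's wish to keep the counts, could look roughly like the sketch below. The function names summarize_by_outcome and plot_tables, the 'Count' column name, and the choice to return a dict keyed by outcome are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

data = {'Region': ["US", "US", "US", "US"],
        'Gender': ["M", "F", "F", "M"],
        'Reviewed?': ["Yes", "Yes", "No", "No"]}
df = pd.DataFrame(data, columns=['Region', 'Gender', 'Reviewed?'])

def summarize_by_outcome(df, filter_col='Reviewed?', group_col='Gender'):
    """Return {outcome: DataFrame with group, Count and Percentage columns}."""
    tables = {}
    # unique() picks up every value in the column, so nothing is hard-coded
    for outcome in df[filter_col].unique():
        subset = df.loc[df[filter_col].eq(outcome), group_col]
        counts = subset.value_counts()
        table = pd.DataFrame({
            group_col: counts.index,
            'Count': counts.to_numpy(),
            'Percentage': counts.to_numpy() / counts.sum(),
        })
        tables[outcome] = (table.sort_values('Percentage', ascending=False)
                                .reset_index(drop=True))
    return tables

def plot_tables(tables, group_col='Gender'):
    """Draw one bar chart per outcome (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported here so the tables work without it
    for outcome, table in tables.items():
        ax = table.plot.bar(x=group_col, y='Percentage',
                            legend=False, title=str(outcome))
        ax.set_ylim(0, 1)
        plt.show()

tables = summarize_by_outcome(df)
for outcome, table in tables.items():
    print(outcome)
    print(table)
```

Calling plot_tables(tables) afterwards should draw one chart per outcome; keeping the tables in a dict means each one stays available by its Reviewed? value after the loop finishes.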

Answer 1: (score: 0)

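This is only a marginal improvement. It would be nice to see a more Pythonic solution, one that does not require me to hard-code 'Reviewed?' into the function call...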

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = {'Region': ["US", "US", "US","US"],
        'Gender': ["M","F","F","M"],
        'Reviewed?': ["Yes","Yes","No","No"]}
df = pd.DataFrame(data, columns=['Region','Gender','Reviewed?'])

def func(df,group,reviewed):
    df = df[df['Reviewed?'].isin(reviewed)].groupby([group])['Region'].count().reset_index()
    df['Percentage'] = df['Region'] / df['Region'].sum()
    sns.barplot(data=df, x=group, y='Percentage')
    bottom, top = plt.ylim(0,1)
    plt.show()
    return df

df1 = func(df,'Gender',['Yes'])
df2 = func(df,'Gender',['No'])
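One possible way to avoid hard-coding 'Reviewed?' is to pass the filter column as a parameter and let a single groupby handle every value at once. This is a minimal sketch under that assumption; the name percentage_table and the 'Count' column are illustrative and do not come from the answer above.

```python
import pandas as pd

data = {'Region': ["US", "US", "US", "US"],
        'Gender': ["M", "F", "F", "M"],
        'Reviewed?': ["Yes", "Yes", "No", "No"]}
df = pd.DataFrame(data, columns=['Region', 'Gender', 'Reviewed?'])

def percentage_table(df, filter_col, group_col):
    """Count rows per (filter_col, group_col) pair and add the share
    of each group within its filter_col value."""
    table = (df.groupby([filter_col, group_col])
               .size()
               .reset_index(name='Count'))
    # transform('sum') broadcasts each filter value's total back to its rows
    totals = table.groupby(filter_col)['Count'].transform('sum')
    table['Percentage'] = table['Count'] / totals
    return table

table = percentage_table(df, 'Reviewed?', 'Gender')
print(table)
```

The result is one long table that keeps the counts. A single outcome can be pulled out with table[table['Reviewed?'] == 'Yes'], and for plotting the whole table could probably be handed to sns.catplot(data=table, x='Gender', y='Percentage', col='Reviewed?', kind='bar') to get one panel per value without a loop.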