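I am trying to output a dataframe and a plot for each "value". I'm struggling to piece together some Pythonic basics.
Process: I take a dataframe, group it, compute the percent of total, then output a table and a plot. However, I want to loop over this process, first filtering the dataframe on Reviewed?=='Yes' and then on 'No'.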

data = {'Region': ["US", "US", "US","US"],
        'Gender': ["M","F","F","M"],
        'Reviewed?': ["Yes","Yes","No","No"]}
df = pd.DataFrame(data, columns=['Region','Gender','Reviewed?'])

def func(df):
    vals = ['Yes','No']
    for i in range(len(vals)):
        for x in vals:
            gb[i] = df[df['Reviewed?']==x].groupby(['Gender'])['Region'].count().reset_index()
            total[i] = gb[i]['Region'].sum()
            gb[i]['Percentage'] = (gb[i]['Region'] / total[i])
            gb[i] = gb[i].sort_values(by='Percentage', ascending=False)
            sns.barplot(data=gb[i], x='Region', y='Percentage')
            plt.show()
    return gb[i]

Some of the error messages:
ValueError: could not broadcast input array from shape (0,2) into shape (0)
ValueError: cannot copy sequence with size 2 to array axis with dimension 0
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series
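
Update: here is a brute-force version of what I want. I would just like a more efficient, more dynamic way to do it.
Note that I did not originally make it clear that I want to keep the counts in the final dataframe...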

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt  # needed for plt.ylim and plt.show

data = {'Region': ["US", "US", "US","US"],
        'Gender': ["M","F","F","M"],
        'Reviewed?': ["Yes","Yes","No","No"]}
df = pd.DataFrame(data, columns=['Region','Gender','Reviewed?'])

def func(df):
    # Rows not yet reviewed: count per gender, then share of the total
    gb = df[df['Reviewed?']=='No'].groupby(['Gender'])['Region'].count().reset_index()
    total = gb['Region'].sum()
    gb['Percentage'] = (gb['Region'] / total)
    notyetreviewed = gb.sort_values(by='Percentage', ascending=False)
    sns.barplot(data=notyetreviewed, x='Gender', y='Percentage')
    bottom, top = plt.ylim(0,1)
    plt.show()

    # Same steps repeated for the reviewed rows
    gb = df[df['Reviewed?']=='Yes'].groupby(['Gender'])['Region'].count().reset_index()
    total = gb['Region'].sum()
    gb['Percentage'] = (gb['Region'] / total)
    reviewed = gb.sort_values(by='Percentage', ascending=False)
    sns.barplot(data=reviewed, x='Gender', y='Percentage')
    bottom, top = plt.ylim(0,1)
    plt.show()
    return notyetreviewed, reviewed

func(df)

Answer 0: (score: 0)
You can try the following:
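
import pandas as pd
import matplotlib.pyplot as plt

data = {'Region': ["US", "US", "US","US"],
        'Gender': ["M","F","F","M"],
        'Reviewed?': ["Yes","Yes","No","No"]}
df = pd.DataFrame(data, columns=['Region','Gender','Reviewed?'])

for outcome in ['Yes', 'No']:
    # Proportion of each gender among rows with this outcome
    filtered = df[df['Reviewed?'].eq(outcome)]['Gender'].value_counts(normalize=True)
    filtered.plot.bar()
    plt.show()  # draw each outcome in its own figure instead of overlaying the bars
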
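Here I filter the DF by the Reviewed? outcome on each pass of the loop, then get the proportion of males and females. Your question poses a binary choice, but I imagine this could be extended with for outcome in df['Reviewed?'].unique():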
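
The .unique() extension mentioned above could look roughly like the sketch below, which also keeps the counts that the question's update asks for. The function name summarize_by_outcome, its parameter names, and the Count column are made up for illustration and are not part of the original answer; plotting is left out so the sketch runs without a display.

```python
import pandas as pd

data = {'Region': ["US", "US", "US", "US"],
        'Gender': ["M", "F", "F", "M"],
        'Reviewed?': ["Yes", "Yes", "No", "No"]}
df = pd.DataFrame(data, columns=['Region', 'Gender', 'Reviewed?'])

def summarize_by_outcome(frame, outcome_col='Reviewed?', group_col='Gender'):
    """Hypothetical helper: one table of counts and shares per outcome value."""
    tables = {}
    for outcome in frame[outcome_col].unique():
        subset = frame[frame[outcome_col].eq(outcome)]
        # value_counts returns counts sorted from largest to smallest
        counts = subset[group_col].value_counts()
        tables[outcome] = pd.DataFrame({
            group_col: counts.index,
            'Count': counts.to_numpy(),
            'Percentage': (counts / counts.sum()).to_numpy(),
        })
    return tables

tables = summarize_by_outcome(df)
for outcome, table in tables.items():
    print(outcome)
    print(table)
```

Each table in the returned dict can then be passed to sns.barplot(data=table, x='Gender', y='Percentage') in the same way as in the question.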
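Answer 1: (score: 0)
Here is a marginal improvement. I would love to see a more Pythonic solution, one that does not require me to hard-code 'Reviewed?' into the function call...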

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt  # needed for plt.ylim and plt.show

data = {'Region': ["US", "US", "US","US"],
        'Gender': ["M","F","F","M"],
        'Reviewed?': ["Yes","Yes","No","No"]}
df = pd.DataFrame(data, columns=['Region','Gender','Reviewed?'])

def func(df, group, reviewed):
    # Keep only rows whose 'Reviewed?' value is in the given list, then count per group
    out = df[df['Reviewed?'].isin(reviewed)].groupby([group])['Region'].count().reset_index()
    out['Percentage'] = out['Region'] / out['Region'].sum()
    sns.barplot(data=out, x=group, y='Percentage')  # use the group argument, not a fixed 'Gender'
    bottom, top = plt.ylim(0,1)
    plt.show()
    return out

df_yes = func(df, 'Gender', ['Yes'])
df_no = func(df, 'Gender', ['No'])  # separate name so the first result is not overwritten
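
One possible way to avoid hard-coding 'Reviewed?' is to pass the splitting column in as an argument and let a single groupby handle every outcome at once. The sketch below is only an assumption about what a more dynamic version might look like; percent_within, split_col, group_col, and the Count column are illustrative names, not something from the answers above.

```python
import pandas as pd

data = {'Region': ["US", "US", "US", "US"],
        'Gender': ["M", "F", "F", "M"],
        'Reviewed?': ["Yes", "Yes", "No", "No"]}
df = pd.DataFrame(data, columns=['Region', 'Gender', 'Reviewed?'])

def percent_within(frame, split_col, group_col):
    """Hypothetical helper: counts per (split, group) pair plus the share within each split."""
    counts = (frame.groupby([split_col, group_col])
                   .size()
                   .reset_index(name='Count'))
    # transform('sum') repeats each split's total on every row of that split
    totals = counts.groupby(split_col)['Count'].transform('sum')
    counts['Percentage'] = counts['Count'] / totals
    return counts

summary = percent_within(df, 'Reviewed?', 'Gender')
print(summary)
```

This returns one long table rather than one dataframe per outcome. If separate plots are needed, summary[summary['Reviewed?'] == 'Yes'] recovers a single outcome, and sns.barplot also accepts a hue argument (for example hue='Reviewed?') to draw both outcomes on one chart.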