I want to count multiple values (held as a list in each cell) on a GroupBy object.
I have the following DataFrame:
| | Record the respondent’s sex | 7. What do you use the phone for? |
|---|-----------------------------|---------------------------------------------|
| 0 | Male | sending texts;calls;receiving sending texts |
| 1 | Female | sending texts;calls;WhatsApp;Facebook |
| 2 | Male | sending texts;calls;receiving texts |
| 3 | Female | sending texts;calls |
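I would like to count each value in the 7. What do you use the phone for? column after grouping by Record the respondent’s sex.
When each cell holds only one value, I can do this without any problem: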
grouped = df.groupby(['Record the respondent’s sex'], sort=True)
question_counts = grouped['2. Are you a teacher, caregiver, or young adult ?'].value_counts(normalize=False, sort=True)
question_data = [
    {'2. Are you a teacher, caregiver, or young adult ?': question,
     'Record the respondent’s sex': group,
     'count': count*100}
    for (group, question), count in dict(question_counts).items()]
df_question = pd.DataFrame(question_data)
This gives me a table that looks exactly like this:
| 7. What do you use the phone for? | Record the respondent’s sex | count |
|-----------------------------------|-----------------------------|-------|
| sending texts | Male | 2 |
| calls | Male | 2 |
| receiving texts | Male | 2 |
| sending texts | Female | 2 |
| calls | Female | 2 |
| WhatsApp | Female | 1 |
| Facebook | Female | 1 |
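If only I could get this to work with multiple values!
value_counts() cannot handle cells that hold lists of values; it raises TypeError: unhashable type: 'list'. The question "Counting occurrence of values in a Panda series?" shows various ways to deal with this, but I can't seem to get any of them to work on a GroupBy object.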
Answer 0 (score: 1)
import pandas as pd

# Initialize sample data.
df = pd.DataFrame({
    'Record the respondent’s sex': ['Male', 'Female'] * 2,
    '7. What do you use the phone for?': [
        "sending texts;calls;receiving sending texts",
        "sending texts;calls;WhatsApp;Facebook",
        "sending texts;calls;receiving texts",
        "sending texts;calls",
    ]})

# Split each cell on ';', spread the pieces across columns, then melt them
# into a single 'value' column next to the respondent's sex.
df2 = pd.melt(
    pd.concat([df['Record the respondent’s sex'],
               df.loc[:, "7. What do you use the phone for?"].apply(
                   lambda cell: cell.split(';')).apply(pd.Series)], axis=1),
    id_vars='Record the respondent’s sex')[['Record the respondent’s sex', 'value']]

# Group on gender, count the values, and rename columns.
# Naming the counts explicitly keeps reset_index() from clashing with the
# 'value' index level if the counts Series happens to share that name.
result = df2.groupby('Record the respondent’s sex')['value'].value_counts().reset_index(name='count')
result.columns = ['Record the respondent’s sex', '7. What do you use the phone for?', 'count']

# Reorder columns.
>>> result[['7. What do you use the phone for?', 'Record the respondent’s sex', 'count']]
7. What do you use the phone for? Record the respondent’s sex count
0 calls Female 2
1 sending texts Female 2
2 Facebook Female 1
3 WhatsApp Female 1
4 calls Male 2
5 sending texts Male 2
6 receiving sending texts Male 1
7 receiving texts Male 1
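As a possible alternative, the same counts can be sketched with Series.str.split followed by DataFrame.explode, assuming pandas 0.25 or later (where explode is available). This is a swapped-in technique, not the method used in the answer above; the names sex_col, use_col, long_df and counts are illustrative only.

```python
import pandas as pd

sex_col = "Record the respondent’s sex"
use_col = "7. What do you use the phone for?"

# Same sample data as in the answer above.
df = pd.DataFrame({
    sex_col: ["Male", "Female", "Male", "Female"],
    use_col: [
        "sending texts;calls;receiving sending texts",
        "sending texts;calls;WhatsApp;Facebook",
        "sending texts;calls;receiving texts",
        "sending texts;calls",
    ],
})

# Turn each cell into a list, then give every list item its own row.
long_df = df.copy()
long_df[use_col] = long_df[use_col].str.split(";")
long_df = long_df.explode(use_col)

# Count each (sex, use) pair and name the resulting column 'count'.
counts = long_df.groupby([sex_col, use_col]).size().reset_index(name="count")

print(counts[[use_col, sex_col, "count"]])
```

Because explode yields exactly one row per list item, no NaN padding is created for respondents who gave fewer answers, which is the main difference from the melt-based version. Rows come out sorted by the group keys rather than by count.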