I am working with two DataFrame objects built from survey data, but I can't get them to merge correctly. The structures look like this:
In [93]: numeric_answers
Out[93]:
   ANSWER_COUNT  RESPONSE
1            50         1
2            21         2
4             3         4
In [94]: readable_values
Out[94]:
                                                    MEANING
RESPONSE
1                                                      male
2                                                    female
3                                               transgender
5         non-binary, genderqueer, or gender non-conforming
6                     a different identity (please specify)
4                                    prefer not to disclose
-9                                             Not answered
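My goal is a result that is joined on RESPONSE, has the columns ['RESPONSE', 'MEANING', 'ANSWER_COUNT'], and has N/A for missing values (although 0 would be fine too). An example of the desired output: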
RESPONSE                                            MEANING  ANSWER_COUNT
       1                                               male            50
       2                                             female            21
       3                                        transgender           NaN
       5  non-binary, genderqueer, or gender non-conforming           NaN
       6              a different identity (please specify)           NaN
       4                             prefer not to disclose             3
      -9                                       Not answered           NaN
After reading the documentation for merge, I concluded that what I need is pd.merge(readable_values, numeric_answers), but this produces an empty result:
Empty DataFrame
Columns: [RESPONSE, MEANING, ANSWER_COUNT]
Index: []
After trying various arguments, merge(readable_values, numeric_answers, on='RESPONSE', how='outer') gave a promising result:
(Pdb) pd.merge(readable_values, numeric_answers, on='RESPONSE', how='outer')
   RESPONSE                                            MEANING  ANSWER_COUNT
0       1.0                                               male           NaN
1       2.0                                             female           NaN
2       3.0                                        transgender           NaN
3       5.0  non-binary, genderqueer, or gender non-conforming           NaN
4       6.0              a different identity (please specify)           NaN
5       4.0                             prefer not to disclose           NaN
6      -9.0                                       Not answered           NaN
7       1.0                                                NaN          50.0
8       2.0                                                NaN          21.0
9       4.0                                                NaN           3.0
However, it merges by appending rows, whereas I need the entries matched up on the RESPONSE column. What is the idiomatic Pandas way to achieve this?
Answer 0 (score: 3)
readable_values has RESPONSE as its index, not as a column. You can do the merge as:
In [11]: numeric_answers.merge(readable_values, left_on='RESPONSE', right_index=True, how='outer')
Out[11]:
   ANSWER_COUNT  RESPONSE                                            MEANING
1          50.0         1                                               male
2          21.0         2                                             female
4           3.0         4                             prefer not to disclose
4           NaN         3                                        transgender
4           NaN         5  non-binary, genderqueer, or gender non-conforming
4           NaN         6              a different identity (please specify)
4           NaN        -9                                       Not answered
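For anyone who wants to try this without the survey data, here is a minimal sketch that rebuilds both frames by hand from the outputs shown in the question and runs the same merge. The constructor calls are my own reconstruction (an assumption about how the frames were built), and the row order or index labels you get may differ from the output above depending on your pandas version.

```python
import pandas as pd

# Hand-built copies of the two frames; values are taken from the question's output.
numeric_answers = pd.DataFrame(
    {"ANSWER_COUNT": [50, 21, 3], "RESPONSE": [1, 2, 4]},
    index=[1, 2, 4],
)
readable_values = pd.DataFrame(
    {
        "MEANING": [
            "male",
            "female",
            "transgender",
            "non-binary, genderqueer, or gender non-conforming",
            "a different identity (please specify)",
            "prefer not to disclose",
            "Not answered",
        ]
    },
    index=pd.Index([1, 2, 3, 5, 6, 4, -9], name="RESPONSE"),
)

# Match the RESPONSE column of the left frame against the index of the right frame.
merged = numeric_answers.merge(
    readable_values, left_on="RESPONSE", right_index=True, how="outer"
)
print(merged)
```

Every RESPONSE code ends up in exactly one row, and ANSWER_COUNT is NaN wherever the code has no count.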
Another option is to call reset_index on readable_values first:
In [12]: numeric_answers.merge(readable_values.reset_index(), on='RESPONSE', how='outer')
Out[12]:
   ANSWER_COUNT  RESPONSE                                            MEANING
0          50.0         1                                               male
1          21.0         2                                             female
2           3.0         4                             prefer not to disclose
3           NaN         3                                        transgender
4           NaN         5  non-binary, genderqueer, or gender non-conforming
5           NaN         6              a different identity (please specify)
6           NaN        -9                                       Not answered
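If you also want the exact column order and row order from the question's desired output, one possible variation is sketched below. Note that it swaps in how='left' with readable_values on the left, which is not what the merge above uses; it gives the same set of rows here only because every RESPONSE in numeric_answers also appears in readable_values. The frames are hand-built from the question's output, and the names result and result_zero are just illustrative.

```python
import pandas as pd

# Hand-built copies of the two frames; values are taken from the question's output.
numeric_answers = pd.DataFrame(
    {"ANSWER_COUNT": [50, 21, 3], "RESPONSE": [1, 2, 4]},
    index=[1, 2, 4],
)
readable_values = pd.DataFrame(
    {
        "MEANING": [
            "male",
            "female",
            "transgender",
            "non-binary, genderqueer, or gender non-conforming",
            "a different identity (please specify)",
            "prefer not to disclose",
            "Not answered",
        ]
    },
    index=pd.Index([1, 2, 3, 5, 6, 4, -9], name="RESPONSE"),
)

# reset_index turns RESPONSE into an ordinary column, so on='RESPONSE' can match it.
# how='left' keeps every label and preserves the order of readable_values.
result = readable_values.reset_index().merge(
    numeric_answers, on="RESPONSE", how="left"
)
# Put the columns in the order asked for in the question.
result = result[["RESPONSE", "MEANING", "ANSWER_COUNT"]]

# Optional: the question says 0 is acceptable in place of NaN.
result_zero = result.fillna({"ANSWER_COUNT": 0}).astype({"ANSWER_COUNT": int})
print(result)
print(result_zero)
```

The fillna step is only needed if you prefer zeros; without it ANSWER_COUNT stays a float column because of the NaN entries.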
Note the difference you can see in how the two are rendered:
In [21]: readable_values
Out[21]:
                                                    MEANING
RESPONSE
1                                                      male
2                                                    female
3                                               transgender
5         non-binary, genderqueer, or gender non-conforming
6                     a different identity (please specify)
4                                    prefer not to disclose
-9                                             Not answered
In [22]: readable_values.reset_index()  # RESPONSE is now a column
Out[22]:
   RESPONSE                                            MEANING
0         1                                               male
1         2                                             female
2         3                                        transgender
3         5  non-binary, genderqueer, or gender non-conforming
4         6              a different identity (please specify)
5         4                             prefer not to disclose
6        -9                                       Not answered
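Besides reading the printed output, you can check this programmatically. A small sketch, using a shortened two-row stand-in for readable_values (the stand-in data and the name flat are mine, for illustration only):

```python
import pandas as pd

# Shortened stand-in for readable_values: RESPONSE is the index name, not a column.
readable_values = pd.DataFrame(
    {"MEANING": ["male", "female"]},
    index=pd.Index([1, 2], name="RESPONSE"),
)

print("RESPONSE" in readable_values.columns)  # False: it is not a column
print(readable_values.index.name)             # RESPONSE: it is the index name

flat = readable_values.reset_index()
print("RESPONSE" in flat.columns)             # True: now an ordinary column
```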