Question

Why does Pandas coerce ASCII strings to unicode when converting a dictionary to a DataFrame? Is this a feature or a known bug?

I am using Python 2.7.3 and Pandas 0.20.2.

An MWE is included below.

import pandas as pd

sample_dict={}
sample_dict['A'] = {'Key_1': 'A1', 'Key-2': 'A2', 'Key_3': 'A3'}
sample_dict['B'] = {'Key_1': 'B1', 'Key-2': 'B2', 'Key_3': 'B3'}
sample_dict['C'] = {'Key_1': 'C1', 'Key-2': 'C2', 'Key_3': 'C3'}
print sample_dict['A'].keys()
sample_df = pd.DataFrame.from_dict(sample_dict, orient='index')
print sample_df.keys()

Output:

['Key-2', 'Key_1', 'Key_3']
Index([u'Key-2', u'Key_1', u'Key_3'], dtype='object')

Addendum: I came across this similar question, but it has been inactive for several years and does not discuss why this happens.

Answer 1

The docstring of the pandas DataFrame __repr__ says:

    """
    Return a string representation for a particular object.

    Yields Bytestring in Py2, Unicode String in py3.
    """

So I believe that in Python 3 you should not see any u prefix.
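As a rough illustration of that point, here is a minimal sketch using only the standard library (no pandas). It assumes the u prefix in the Index output is simply how Python 2 displays a unicode string in its repr, and the variable name label is made up for this example:

```python
# Sketch only: shows that the u'' prefix is part of repr(), not of the text.
from __future__ import print_function  # lets the same file run on Python 2

import sys

label = u'Key_1'  # a text (unicode) label like the ones shown in the Index repr

print(sys.version_info[0])   # major version of the running interpreter
print(repr(label))           # u'Key_1' on Python 2, 'Key_1' on Python 3
print(label == 'Key_1')      # True on both: ASCII text compares equal
print(str(label))            # Key_1, no prefix when printed as plain text
```

If this assumption holds for your case, a lookup such as sample_df['Key_1'] should behave the same whether or not the label is displayed with a u prefix.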
