如何在熊猫中阅读希腊文字?

时间:2019-06-08 16:37:51

标签: python-3.x pandas encoding utf

我正在处理带有希腊字符的数据框。它们看起来像这样:

enter image description here

数据在这里:

toy.to_json()

'{"a_a":{"0":49.0,"1":50.0,"2":52.0,"3":53.0,"4":54.0},"grade":{"0":3.0,"1":5.0,"2":4.0,"3":5.0,"4":4.0},"sex":{"0":"\\u00c1\\u00e3\\u00fc\\u00f1\\u00e9","1":"\\u00c1\\u00e3\\u00fc\\u00f1\\u00e9","2":"\\u00c1\\u00e3\\u00fc\\u00f1\\u00e9","3":"\\u00c1\\u00e3\\u00fc\\u00f1\\u00e9","4":"\\u00c1\\u00e3\\u00fc\\u00f1\\u00e9"},"age":{"0":122.0,"1":125.0,"2":119.0,"3":122.0,"4":127.0},"fath_job":{"0":2.0,"1":2.0,"2":2.0,"3":2.0,"4":2.0},"phscs":{"0":49.0,"1":73.0,"2":61.0,"3":75.0,"4":59.0},"pcc":{"0":10.0,"1":26.0,"2":19.0,"3":28.0,"4":23.0},"pcg":{"0":21.0,"1":28.0,"2":20.0,"3":25.0,"4":19.0},"tasc":{"0":17.0,"1":5.0,"2":17.0,"3":8.0,"4":11.0},"class":{"0":0.0,"1":0.0,"2":0.0,"3":0.0,"4":0.0},"grade3":{"0":"\\u00ec\\u00dd\\u00f4\\u00f1\\u00e9\\u00ef\\u00f2","1":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00fc\\u00f2","2":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00fc\\u00f2","3":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00fc\\u00f2","4":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00fc\\u00f2"},"pcc3":{"0":"\\u00f7\\u00e1\\u00ec\\u00e7\\u00eb\\u00de","1":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00de","2":"\\u00ec\\u00dd\\u00f4\\u00f1\\u00e9\\u00e1","3":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00de","4":"\\u00ec\\u00dd\\u00f4\\u00f1\\u00e9\\u00e1"},"tasc3":{"0":3.0,"1":1.0,"2":3.0,"3":2.0,"4":2.0},"pcg3":{"0":"\\u00ec\\u00dd\\u00f4\\u00f1\\u00e9\\u00e1","1":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00de","2":"\\u00ec\\u00dd\\u00f4\\u00f1\\u00e9\\u00e1","3":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00de","4":"\\u00ec\\u00dd\\u00f4\\u00f1\\u00e9\\u00e1"},"phscs3":{"0":"\\u00f7\\u00e1\\u00ec\\u00e7\\u00eb\\u00de","1":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00de","2":"\\u00ec\\u00dd\\u00f4\\u00f1\\u00e9\\u00e1","3":"\\u00f5\\u00f8\\u00e7\\u00eb\\u00de","4":"\\u00ec\\u00dd\\u00f4\\u00f1\\u00e9\\u00e1"}}'

我尝试导入的编码为'utf_8'的文件,但无法正常工作。

这是我尝试过的其他方法:

toy.to_csv('toy.csv', index = False)

import chardet

rawdata = open('toy.csv', 'rb').read()

result = chardet.detect(rawdata)

charenc = result['encoding']

pd.read_csv('toy.csv', encoding = charenc)

enter image description here

pd.read_csv('toy.csv', encoding = 'cp737')

enter image description here

1 个答案:

答案 0 :(得分:0)

在Mac(macOS Catalina 10.15.7)上,此方法有效:

gdf = gpd.read_file("./../data/processed/landmarks.shp", encoding="mac_greek")

显然,您必须为您的系统找到正确的编码。可以在documentation中找到python编码列表。但是,它仍然不完美。我还有一些随机字符。我不知道这是由于数据源还是由于列表引用了python 2。

Greek encoded geo pandas data