I am trying to read a CSV file with pandas that contains the following data (excerpt from https://data.worldbank.org/indicator/NY.GDP.MKTP.CD):
"Afghanistan","AFG","GDP (current US$)","NY.GDP.MKTP.CD","537777811.111111"
"Burundi","BDI","GDP (current US$)","NY.GDP.MKTP.CD","195999990"
I read it with this command:
GDP = pd.read_csv('world_bank.csv')
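In the resulting DataFrame, "537777811.111111" is converted to NaN, while "195999990" is converted correctly.
There seems to be a problem with the float conversion. How can I prevent this?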
Answer 0 (score: 0)
You can explicitly tell pandas to use a specific data type for specific columns.
import numpy as np
import pandas as pd
# ...your code...
# Let's say you name your columns:
COL_NAMES = ['Country', 'CountryCode', 'GDP_Type', 'WhateverField', 'GDP']
# You can specify the dtype for a single column and let pandas infer the rest:
COL_TYPES = {'GDP': np.float64}
GDP = pd.read_csv('world_bank.csv', names=COL_NAMES, dtype=COL_TYPES)
A similar construct did the trick for me.
See also: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
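To try the approach without the real file, here is a minimal sketch that feeds the two rows quoted in the question through `io.StringIO`. It assumes the data has no header row and reuses the column names from the answer, which are themselves just illustrative labels.

```python
import io

import numpy as np
import pandas as pd

# The two rows quoted in the question; no header row is assumed here.
SAMPLE = (
    '"Afghanistan","AFG","GDP (current US$)","NY.GDP.MKTP.CD","537777811.111111"\n'
    '"Burundi","BDI","GDP (current US$)","NY.GDP.MKTP.CD","195999990"\n'
)

# Illustrative column names taken from the answer, not from the World Bank file.
COL_NAMES = ['Country', 'CountryCode', 'GDP_Type', 'WhateverField', 'GDP']
COL_TYPES = {'GDP': np.float64}

# Force the GDP column to float64 and let pandas infer the other columns.
gdp = pd.read_csv(io.StringIO(SAMPLE), names=COL_NAMES, dtype=COL_TYPES)
print(gdp['GDP'])
```

If both values parse here but not from `world_bank.csv`, the cause is probably something else in the file (extra header lines, a different separator, stray characters) rather than the float conversion itself.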
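If forcing the dtype raises an error or NaN values still appear, it can help to find out which raw strings fail to convert. The sketch below is one possible way to do that: it reads the column as text and uses `pd.to_numeric` with `errors='coerce'`. The third row (`Exampleland`, with the value `..`) is a made-up bad record added purely for illustration; it is not from the question's data.

```python
import io

import pandas as pd

SAMPLE = (
    '"Afghanistan","AFG","GDP (current US$)","NY.GDP.MKTP.CD","537777811.111111"\n'
    '"Burundi","BDI","GDP (current US$)","NY.GDP.MKTP.CD","195999990"\n'
    # Hypothetical malformed row, added only to demonstrate the check.
    '"Exampleland","XXX","GDP (current US$)","NY.GDP.MKTP.CD",".."\n'
)

COL_NAMES = ['Country', 'CountryCode', 'GDP_Type', 'WhateverField', 'GDP']

# Read the GDP column as text so nothing is silently converted.
raw = pd.read_csv(io.StringIO(SAMPLE), names=COL_NAMES, dtype={'GDP': str})

# Values that cannot be parsed as numbers become NaN instead of raising.
converted = pd.to_numeric(raw['GDP'], errors='coerce')

# Rows where text was present but the numeric conversion failed.
bad_rows = raw[converted.isna() & raw['GDP'].notna()]
print(bad_rows[['Country', 'GDP']])
```

Looking at the offending strings usually shows whether the problem is a placeholder value, a locale-specific decimal separator, or invisible characters in the file.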