I'm trying to check that the data types were read in correctly. This works fine in Jupyter, but in PyCharm I get the error above (2018.3.3 CE, build 183.5153.39, January 9, 2019):
import time

import pandas as pd
import pandas.api.types as ptypes

filename = "myfile.csv"
numericcols = ['YEAR', 'AMOUNT']
stringcols = ['ID', 'REFERENCE']
dfcols = ['YEAR', 'ID', 'REFERENCE', 'AMOUNT']
coltypesdict = {'YEAR': int, 'AMOUNT': float, 'ID': str, 'REFERENCE': str}

start_pd = time.time()
try:
    with open(filename, 'rb') as file:
        # Read the CSV in chunks, then combine the chunks into one DataFrame
        reader = pd.read_csv(filename, chunksize=10000, header=None, names=dfcols, dtype=coltypesdict)
        df = pd.concat([x for x in reader], ignore_index=True)
    # Check that the expected columns are present and have the expected dtypes
    assert set(df.columns) == set(dfcols)
    assert all(ptypes.is_numeric_dtype(df[col]) for col in numericcols)
    assert all(ptypes.is_string_dtype(df[col]) for col in stringcols)
    print("{} read successfully in {:.2f} secs".format(filename, time.time() - start_pd))
except IOError:
    print("could not read {}".format(filename))
I also tried splitting the checks out into separate lists, with the same result:
# Build one boolean per column, then assert that every check passed
numericcolslist = [ptypes.is_numeric_dtype(df[col]) for col in numericcols]
stringcolslist = [ptypes.is_string_dtype(df[col]) for col in stringcols]
assert all(numericcolslist)
assert all(stringcolslist)
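In case it helps to reproduce this without my file, here is a minimal self-contained sketch of the same dtype check. It assumes an in-memory CSV in place of myfile.csv; the two sample rows and the names `sample_csv`, `numeric_ok`, and `string_ok` are made up for illustration.

```python
import io

import pandas as pd
import pandas.api.types as ptypes

# Hypothetical sample rows standing in for myfile.csv (no header row)
sample_csv = "2018,A1,REF-1,10.5\n2019,B2,REF-2,20.0\n"

dfcols = ['YEAR', 'ID', 'REFERENCE', 'AMOUNT']
coltypesdict = {'YEAR': int, 'AMOUNT': float, 'ID': str, 'REFERENCE': str}

# Same read_csv arguments as above, minus chunking, on the in-memory data
df = pd.read_csv(io.StringIO(sample_csv), header=None, names=dfcols, dtype=coltypesdict)

# One boolean per group of columns: True only if every column passes its check
numeric_ok = all(ptypes.is_numeric_dtype(df[col]) for col in ['YEAR', 'AMOUNT'])
string_ok = all(ptypes.is_string_dtype(df[col]) for col in ['ID', 'REFERENCE'])

print(df.dtypes)
print(numeric_ok, string_ok)
```

Running this same snippet in both Jupyter and PyCharm should show whether the two environments behave differently on identical input.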
Thanks