Question

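I recently ran into some strange behavior with pandas' read_json method. I have a list of JSON files in the directory mypath that I want to read with pandas, append to one another, and combine into a final dataset. I do this in a loop, appending the data as follows: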

from os import listdir
from os.path import join

import pandas as pd

GroundTruthFiles = [file for file in listdir(mypath) if file.endswith(".json")]
dfGroundTruth = pd.read_json(join(mypath, GroundTruthFiles[0]), orient="index")

for file in GroundTruthFiles[1:]:
    nextfile = pd.read_json(join(mypath, file), orient="index")
    dfGroundTruth = dfGroundTruth.append(nextfile)
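DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the loop above fails on current versions. Below is a minimal sketch of the same loading step that collects the frames first and calls pd.concat once. The helper load_ground_truth, the sample files a.json/b.json, the trip ids t1/t2 and the distance_km field are all hypothetical stand-ins for the real data in mypath. orient="index" is kept, so any per-file dtype differences would still surface after the concat.

```python
import json
import tempfile
from os import listdir
from os.path import join

import pandas as pd


def load_ground_truth(mypath):
    """Read every .json file in mypath with orient="index" and stack the rows."""
    # sorted() gives a deterministic order; listdir() order is arbitrary
    files = sorted(f for f in listdir(mypath) if f.endswith(".json"))
    frames = [pd.read_json(join(mypath, f), orient="index") for f in files]
    # One concat at the end replaces the repeated DataFrame.append calls
    return pd.concat(frames)


# Hypothetical sample files: one record each, keyed by a trip id
samples = {
    "a.json": {"t1": {"driving_time": 12.5, "distance_km": 3.25}},
    "b.json": {"t2": {"driving_time": 30.75, "distance_km": 8.5}},
}

with tempfile.TemporaryDirectory() as tmp:
    for name, payload in samples.items():
        with open(join(tmp, name), "w") as fh:
            json.dump(payload, fh)
    dfGroundTruth = load_ground_truth(tmp)

print(dfGroundTruth)
```

Besides working on pandas 2.x, a single concat avoids copying the growing frame on every iteration.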

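All JSON files have the same data format and contain a target variable ('driving_time') stored as float. When I create the initial DataFrame, the 'driving_time' column keeps its float dtype. However, as I loop over the other files, the column gets converted to datetime64[ns]. In the final DataFrame, 'driving_time' ends up with dtype object.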
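A likely cause, offered as a hypothesis rather than a confirmed diagnosis: with its defaults convert_dates=True and keep_default_dates=True, read_json treats any column whose name ends in '_time' as a date candidate (the documented rule also covers names ending in '_at', names starting with 'timestamp', and the names 'modified' and 'date'). In the pandas versions I have looked at, a numeric candidate is only converted when all of its values are missing or large enough to pass as epoch timestamps, so whether 'driving_time' becomes datetime64 can differ from file to file. The sketch below reproduces this with two made-up JSON strings; the keys t1..t4 and the distance_km field are hypothetical.

```python
from io import StringIO

import pandas as pd

# Hypothetical records: same layout, different magnitudes of driving_time
small = ('{"t1": {"driving_time": 12.5, "distance_km": 3.25},'
         ' "t2": {"driving_time": 30.75, "distance_km": 8.5}}')
large = ('{"t3": {"driving_time": 1500000000.5, "distance_km": 4.75},'
         ' "t4": {"driving_time": 1600000000.5, "distance_km": 6.25}}')

# StringIO: newer pandas versions deprecate passing literal JSON strings
df_small = pd.read_json(StringIO(small), orient="index")
df_large = pd.read_json(StringIO(large), orient="index")

# Small values do not look like epoch seconds, so the column stays float64
print(df_small["driving_time"].dtype)
# Large values are parsed as seconds since 1970 and become a datetime64 column
print(df_large["driving_time"].dtype)

# float64 + datetime64 have no common numeric dtype, so concat falls back to object
combined = pd.concat([df_small, df_large])
print(combined["driving_time"].dtype)
```

The object dtype after concatenation matches what happens to the final DataFrame in the question.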

I tried the built-in dtype option of read_json:

for .... :
    pd.read_json(..., dtype={'driving_time': float})

but it did not solve my problem.
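One plausible reason dtype alone has no effect: in the pandas source I have checked, the '_time' date heuristic runs before the dtype mapping is applied, and a datetime64 column cannot be cast back to float, so the mapping is skipped for 'driving_time'. Disabling the heuristic with convert_dates=False lets the mapping take effect (keep_default_dates=False is a narrower alternative that only parses columns you list explicitly in convert_dates). The sketch below reads two hypothetical in-memory sources through a hypothetical helper read_trip_file; for real files, pass join(mypath, file) instead of StringIO(src).

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-ins for two of the JSON files in mypath
sources = [
    '{"t1": {"driving_time": 12.5, "distance_km": 3.25}}',
    '{"t2": {"driving_time": 1500000000.5, "distance_km": 4.75}}',
]


def read_trip_file(src):
    """Read one source with orient="index" and no automatic date parsing."""
    return pd.read_json(
        StringIO(src),
        orient="index",
        convert_dates=False,            # switch off the "_time" date heuristic
        dtype={"driving_time": float},  # then pin the target column to float64
    )


combined = pd.concat([read_trip_file(s) for s in sources])
print(combined.dtypes)
```

Columns not named in the dtype mapping are still inferred as before, so the other columns keep the dtypes read_json would normally give them.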

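I also tried iterating over the files like this:

for .... :
    nextfile = pd.read_json(join(mypath, file)).T

This works for the variable itself, but it changes the data types of all the other columns (which I want to avoid).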
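A possible explanation for the .T workaround: without orient="index", read_json uses its default orient="columns" for DataFrames, so each trip becomes a column and 'driving_time' becomes a row label. The '_time' name rule applies to column names, so the driving_time values are no longer parsed as dates, but each trip column now mixes the fields' types and is stored as object, and transposing keeps object for every field. The sketch uses a hypothetical text field, driver, and shows DataFrame.infer_objects() as one way to recover per-column dtypes after the transpose.

```python
from io import StringIO

import pandas as pd

# Hypothetical records with one text field next to the numeric one
raw = ('{"t1": {"driving_time": 1500000000.5, "driver": "x"},'
       ' "t2": {"driving_time": 1600000000.5, "driver": "y"}}')

# Default orient="columns": each trip is a column holding a float and a string
wide = pd.read_json(StringIO(raw))
# Transposing makes driving_time a column again, but it inherits object dtype
long = wide.T
print(long.dtypes)

# infer_objects() re-derives a dtype for each object column from its values
repaired = long.infer_objects()
print(repaired.dtypes)
```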

Is there an explanation for this strange behavior? And is there a good way to solve my problem?

Python/Pandas: orient='index' parameter in read_json changes column data type
