Question

I am using pandas to process a fairly large file. I used the get_chunk method, but the data is not loaded correctly.

What I tried

Gives:

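import pandas as pd

def load_data():
    # Open the file lazily so it can be read piece by piece.
    reader = pd.read_table('/Users/fiz/Desktop/xad', iterator=True, encoding='utf-8')
    loop = True
    chunkSize = 10000
    chunks = []
    while loop:
        try:
            # Read the next chunkSize rows; raises StopIteration at end of file.
            chunk = reader.get_chunk(chunkSize)
            chunks.append(chunk)
            print(chunk)

        except StopIteration:
            loop = False
            print("Iteration is stopped.")
    # Combine all chunks into one DataFrame and hand it back to the caller.
    df = pd.concat(chunks, ignore_index=True)
    return df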

Desired output

[image: desired output]

Answer 1

Your data file appears to be JSON.

Try the pandas.read_json method.

It also looks like the data is in the 'records' orientation, so for example:

pd.read_json('/Users/fiz/Desktop/xad', orient='records', encoding='utf-8')

might be a good start.
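To make the 'records' orientation concrete, here is a minimal sketch. The sample string is hypothetical and stands in for the real file, whose contents are not shown in the question; only the shape (a list of objects, one per row) matters.

```python
import io
import pandas as pd

# Hypothetical sample data; the real file's contents are not shown in the question.
sample = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'

# orient='records' expects a JSON list of {column: value} objects, one object per row.
df = pd.read_json(io.StringIO(sample), orient='records')
print(df)
```

Each key becomes a column, so the result has one column per field rather than a single column of raw JSON text.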

Sadly, the read_json method does not seem to support chunking.
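One caveat on chunking: recent pandas versions do accept a chunksize argument in read_json, but only together with lines=True, which means the file must be line-delimited JSON (one object per line). Whether the file in the question has that layout is an assumption; the sketch below uses a hypothetical in-memory sample to show the pattern.

```python
import io
import pandas as pd

# Hypothetical line-delimited sample: one JSON object per line.
sample = (
    '{"id": 1, "name": "a"}\n'
    '{"id": 2, "name": "b"}\n'
    '{"id": 3, "name": "c"}\n'
)

# With lines=True and chunksize set, read_json returns an iterator of DataFrames
# instead of a single DataFrame.
reader = pd.read_json(io.StringIO(sample), lines=True, chunksize=2)
chunks = [chunk for chunk in reader]

# Combine the pieces, as the question's loop does with pd.concat.
df = pd.concat(chunks, ignore_index=True)
print(df)
```

If the file is a single JSON array rather than line-delimited, this does not apply and the whole document has to be parsed at once.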

pandas read_table gives JSON data as a single column
