Why does writing a dataframe to a file and reading it back change the behavior of array assignment?

Time: 2019-06-01 09:00:09

Tags: python pandas dataframe

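My task involves appending two dataframes of the same type (representing different time periods) and applying a lambda function to modify a column of the appended dataframe.

This works as expected in a normal run, but fails if the appended dataframe is written to a CSV file and read back in.

Setup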

import pandas as pd
import os

os.chdir('/path/to/directory')

df = pd.read_csv('Data-May.csv')
df2 = pd.read_csv('Data-TillApr.csv')

def foo(item):
    return item.replace("0"*11,"")
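
To make the intent of foo concrete, here is a minimal sketch that runs it on a single value. The material code below is made up for illustration and is not taken from the actual CSV files.

```python
def foo(item):
    # Remove every run of eleven consecutive zeros from the string.
    return item.replace("0" * 11, "")

# Hypothetical material code, padded with eleven leading zeros.
sample = "00000000000" + "AB4711"
cleaned = foo(sample)
print(cleaned)
```

Note that foo relies on str.replace, so it only works when the value passed in is a string.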

Applying the lambda function to each dataframe separately: works

df['material'] = df.apply(lambda x: foo(x['material']), axis=1) #Works
df2['material'] = df2.apply(lambda x: foo(x['material']), axis=1) #Works 
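
For comparison, the same replacement can probably be done column-wise without a row-by-row apply. This is only a sketch on made-up data; it assumes a pandas version in which Series.str.replace accepts the regex keyword (0.23 or later, as far as I know), and the column values here are hypothetical.

```python
import pandas as pd

# Made-up data standing in for Data-May.csv.
df = pd.DataFrame({
    "material": ["00000000000" + "AB4711", "00000000000" + "CD0815"],
    "qty": [1, 2],
})

# Column-wise equivalent of df.apply(lambda x: foo(x['material']), axis=1).
# regex=False treats the pattern as a literal string of eleven zeros.
df["material"] = df["material"].str.replace("0" * 11, "", regex=False)
print(df)
```

This variant assigns a Series built directly from the column, so the result does not depend on what apply returns for each row.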

Applying the lambda function to the appended dataframe: works

df = pd.read_csv('Data-May.csv')
df2 = pd.read_csv('Data-TillApr.csv')
df3 = df2.append(df) 
df3['material'] = df3.apply(lambda x: foo(x['material']), axis=1) #Works

Applying the lambda function to the dataframe df3 after saving it and reading it back: fails

It works if a new column is created instead.

df = pd.read_csv('Data-May.csv')
df2 = pd.read_csv('Data-TillApr.csv')
df3 = df2.append(df)
df3.to_csv('Data-Appended.csv') #Writing to csv
df4 = pd.read_csv('Data-Appended.csv') #Reading it into a dataframe


df4['material'] = df4.apply(lambda x: foo(x['material']), axis=1) #Doesn't work

df4['new'] = df4.apply(lambda x: foo(x['material']), axis=1) #Works
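
One visible difference between df3 and df4 is what the CSV round trip does to the index. The sketch below reproduces that on made-up data, using an in-memory buffer instead of a file and pd.concat instead of append (which newer pandas releases no longer provide). Whether this difference is related to the ValueError is not established here.

```python
import io
import pandas as pd

# Made-up stand-ins for the two monthly files.
df = pd.DataFrame({"material": ["00000000000" + "AB4711"]})
df2 = pd.DataFrame({"material": ["00000000000" + "CD0815"]})

# Same result as df2.append(df): the original row labels are kept,
# so the combined index contains the label 0 twice.
df3 = pd.concat([df2, df])

# By default to_csv also writes the index; read_csv then returns it
# as an ordinary, unnamed column and builds a fresh 0..n-1 index.
buf = io.StringIO()
df3.to_csv(buf)
buf.seek(0)
df4 = pd.read_csv(buf)
print(list(df4.columns))

# Writing without the index keeps the column set unchanged.
buf = io.StringIO()
df3.to_csv(buf, index=False)
buf.seek(0)
df5 = pd.read_csv(buf)
print(list(df5.columns))
```

So df4 differs from df3 in two ways: it has an extra leading column, and its index no longer contains duplicate labels.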

The dtype of both df3['material'] and df4['material'] is object ('O').
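
A dtype of object does not by itself guarantee that every value is a string, so it may be worth checking what the column actually holds. The following is a small sketch on a made-up Series; the values are hypothetical, and this check is not claimed to explain the error.

```python
import pandas as pd

# Made-up column: two strings and one missing value, all stored as dtype object.
s = pd.Series(
    ["00000000000" + "AB4711", float("nan"), "XY"],
    dtype=object,
    name="material",
)
print(s.dtype)

# Count the Python types actually stored in the column.
type_counts = s.map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

If anything other than str shows up here, foo would raise on those rows, because a float has no replace method.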

Traceback output for the failing line:

Traceback (most recent call last):

  File "<ipython-input-42-096f7e61633e>", line 1, in <module>
    df4['material'] = df4.apply(lambda x: foo(x['material']), axis=1) #Doesn't work

  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2331, in __setitem__
    self._set_item(key, value)

  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2398, in _set_item
    NDFrame._set_item(self, key, value)

  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 1759, in _set_item
    self._data.set(key, value)

  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 3731, in set
    group=True):

  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 4684, in _get_blkno_placements
    for blkno, indexer in lib.get_blkno_indexers(blknos, group):

  File "pandas/_libs/lib.pyx", line 1488, in pandas._libs.lib.get_blkno_indexers

ValueError: Buffer has wrong number of dimensions (expected 1, got 0)

0 answers:

No answers yet