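My task involves appending two data frames of the same type (they cover different time periods) and then applying a lambda function to modify a column of the appended data frame.
This works as expected in a normal run, but it fails if the appended data frame is written to a CSV file and read back in.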
Setup
import pandas as pd
import os
os.chdir('/path//to/directory')
df = pd.read_csv('Data-May.csv')
df2 = pd.read_csv('Data-TillApr.csv')
def foo(item):
    return item.replace("0"*11,"")
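For readers who want to see what the helper does on its own, here is a minimal sketch. The sample value is made up for illustration and is not taken from the actual data files.

```python
def foo(item):
    # Same helper as in the setup: remove every run of eleven zeros.
    return item.replace("0" * 11, "")

# Hypothetical material code: eleven zeros followed by a short suffix.
sample = "0" * 11 + "123456"
cleaned = foo(sample)
print(cleaned)
```

Note that `str.replace` removes every occurrence of the eleven-zero run, not only a leading one.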
Applying the lambda function to each data frame separately: works
df['material'] = df.apply(lambda x: foo(x['material']), axis=1) #Works
df2['material'] = df2.apply(lambda x: foo(x['material']), axis=1) #Works
Applying the lambda function to the appended data frame: works
df = pd.read_csv('Data-May.csv')
df2 = pd.read_csv('Data-TillApr.csv')
df3 = df2.append(df)
df3['material'] = df3.apply(lambda x: foo(x['material']), axis=1) #Works
Applying the lambda function to df3 after it has been saved and read back: fails
It does work if the result is assigned to a new column instead.
df = pd.read_csv('Data-May.csv')
df2 = pd.read_csv('Data-TillApr.csv')
df3 = df2.append(df)
df3.to_csv('Data-Appended.csv') #Writing to csv
df4 = pd.read_csv('Data-Appended.csv') #Reading it into a dataframe
df4['material'] = df4.apply(lambda x: foo(x['material']), axis=1) #Doesn't work
df4['new'] = df4.apply(lambda x: foo(x['material']), axis=1) #Works
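One way to see what the CSV round trip changes is to compare the frame before and after it. The sketch below is only an illustration under stated assumptions: it uses small made-up frames instead of Data-May.csv and Data-TillApr.csv, an in-memory buffer instead of a file on disk, and `pd.concat` in place of `DataFrame.append` (which newer pandas releases no longer provide). It shows the default `to_csv`/`read_csv` behaviour and is not a confirmed diagnosis of the error.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two input files.
may = pd.DataFrame({"material": ["0" * 11 + "A123", "0" * 11 + "B456"]})
till_apr = pd.DataFrame({"material": ["0" * 11 + "C789"]})

# Equivalent of df2.append(df): the row labels of both frames are kept as-is.
appended = pd.concat([till_apr, may])

# Round trip through CSV text using the default arguments on both sides.
buffer = io.StringIO()
appended.to_csv(buffer)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(appended.index.tolist())    # row labels before the round trip
print(reloaded.columns.tolist())  # columns after the round trip
print(reloaded.index.tolist())    # row labels after the round trip
```

By default `to_csv` writes the row labels as an unnamed first column, so the reloaded frame gains an extra column and a fresh index. Passing `index=False` to `to_csv`, or `index_col=0` to `read_csv`, avoids the extra column; whether that has any bearing on the failure above has not been verified here.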
Both df3['material'] and df4['material'] have dtype O (object).
Traceback for the failing line:
Traceback (most recent call last):
  File "<ipython-input-42-096f7e61633e>", line 1, in <module>
    df4['material'] = df4.apply(lambda x: foo(x['material']), axis=1) #Doesn't work
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2331, in __setitem__
    self._set_item(key, value)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2398, in _set_item
    NDFrame._set_item(self, key, value)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 1759, in _set_item
    self._data.set(key, value)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 3731, in set
    group=True):
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 4684, in _get_blkno_placements
    for blkno, indexer in lib.get_blkno_indexers(blknos, group):
  File "pandas/_libs/lib.pyx", line 1488, in pandas._libs.lib.get_blkno_indexers
ValueError: Buffer has wrong number of dimensions (expected 1, got 0)