Using tolist() to fill a fixed number (x) of new columns when the list sometimes contains fewer items (<x)

Date: 2019-07-17 11:50:12

Tags: python pandas multiple-columns tolist

I'm using tolist() to split an 8-item list in one column ('modelGreeks') into 8 new columns in the same DataFrame:

pd.DataFrame(df['modelGreeks'].tolist(), index=df.index)
df[['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice']] = pd.DataFrame(df['modelGreeks'].tolist(), index=df.index)

The list I usually get in the 'modelGreeks' column:

(0.2953686167703842, -1.9317880628477724e-14, 1.4648640549124297e-15, 0.0, 6.240571011994176e-13, 1.1840837166645831e-15, -1.4648640549124297e-15, 10.444000244140625)

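Nine times out of ten this works perfectly. But sometimes the data I retrieve through the API is not perfect/complete: instead of the expected 8-item list in the 'modelGreeks' column, that field contains a None value, and executing the second line of code raises the following error (logically, since it tries to fill 8 columns with only 1 available value):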

ValueError: Columns must be same length as key
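A minimal reproduction of the "only 1 value" situation, using a made-up DataFrame in which every entry is a scalar None (COLS is just my shorthand for the 8 target column names): tolist() then returns a flat list of scalars, so the DataFrame built from it has a single column, and assigning one column to 8 keys raises the error above.

```python
import pandas as pd

# Shorthand for the 8 target column names from the question
COLS = ['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice']

# Made-up data: the API returned None instead of an 8-item tuple in every row
df = pd.DataFrame({'modelGreeks': [None, None, None]})

# A flat list of scalars becomes a single column, not eight
expanded = pd.DataFrame(df['modelGreeks'].tolist(), index=df.index)
print(expanded.shape)  # (3, 1)

try:
    df[COLS] = expanded
except ValueError as exc:
    print(exc)  # Columns must be same length as key
```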

I'm looking for a solution that creates and fills the 8 new columns no matter what, with 0, NaN, or None.

I hope someone can help. Thanks in advance for your effort.

The following code works:

df1 = pd.DataFrame(columns=['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice','modelGreeks'])
df1['modelGreeks'] = [[None, None, None, None, None, None, None, None], None, None, None, None]
df1[['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice']] = df1['modelGreeks'].apply(pd.Series)

It returns:

   IV_model  59  Price_model  61  62  63  64  undPrice  modelGreeks
0  NaN       NaN NaN          NaN NaN NaN NaN NaN       [None, None, None, None, None, None, None, None]
1  NaN       NaN NaN          NaN NaN NaN NaN NaN       None
2  NaN       NaN NaN          NaN NaN NaN NaN NaN       None
3  NaN       NaN NaN          NaN NaN NaN NaN NaN       None
4  NaN       NaN NaN          NaN NaN NaN NaN NaN       None

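That's fine. The only problem is that at some point the dataset I receive from Interactive Brokers through the API will contain only scalar None values in every row of the modelGreeks column. If I apply that to the test case, I get the error again ("ValueError: Columns must be same length as key"):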

df1 = pd.DataFrame(columns=['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice','modelGreeks'])
df1['modelGreeks'] = [None, None, None, None, None]
df1[['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice']] = df1['modelGreeks'].apply(pd.Series)

Traceback (most recent call last):
  File "/Users/floris/PycharmProjects/ib_insync/test1.py", line 9, in <module>
    df1[['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice']] = df1['modelGreeks'].apply(pd.Series)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py", line 3367, in __setitem__
    self._setitem_array(key, value)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py", line 3389, in _setitem_array
    raise ValueError('Columns must be same length as key')
ValueError: Columns must be same length as key

In this case I would also like to see just NaN values in the 8 columns.
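One way to get the 8 columns in every case, including the all-None dataset, is to normalize each entry before building the DataFrame. The sketch below is an illustration under my own assumptions, not a pandas feature: expand_fixed is a hypothetical helper name, anything that is not a list or tuple (such as a scalar None) is treated as an empty list, and each row is padded with NaN (pass fill=0 if you prefer zeros) or truncated to exactly 8 items. df1 mirrors the failing test case from the question.

```python
import numpy as np
import pandas as pd

COLS = ['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice']


def expand_fixed(s: pd.Series, columns: list, fill=np.nan) -> pd.DataFrame:
    """Hypothetical helper: split a column of sequences into len(columns) columns.

    Entries that are not lists/tuples (e.g. a scalar None) are treated as empty;
    every row is truncated or padded with `fill` to exactly len(columns) items.
    """
    n = len(columns)
    rows = []
    for value in s:
        items = list(value)[:n] if isinstance(value, (list, tuple)) else []
        items += [fill] * (n - len(items))  # pad short rows up to n items
        rows.append(items)
    # Passing columns= guarantees exactly n columns, even for an empty Series
    return pd.DataFrame(rows, index=s.index, columns=columns)


# The all-None case that fails in the question
df1 = pd.DataFrame({'modelGreeks': [None, None, None, None, None]})
df1[COLS] = expand_fixed(df1['modelGreeks'], COLS)
print(df1)
```

Because the row width is fixed before the DataFrame is built, the assignment never depends on what the longest list in the current batch happens to be.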

3 answers:

Answer 0 (score: 2)

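Since you are converting the values of a list, I suggest checking the length of each list first (treating a None entry as an empty list). If it is shorter than 8, you can append 0s to it. Something like this: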

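import numpy as np
import pandas as pd
# Treat a missing (None) entry as an empty list, then pad every row to 8 items with 0
lst = [list(x) if x is not None else [] for x in df['modelGreeks']]
lst = [row + [0]*(8 - len(row)) for row in lst]
df[['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice']] = pd.DataFrame(np.array(lst).reshape(len(df), 8), index=df.index)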

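I used np.array(...).reshape to make sure there is no shape-mismatch error. If I understood your question correctly, this may help. I'm sure there is a neater way to do this, and others may be able to help you with that, but this also gets the job done.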

Answer 1 (score: 2)

Instead of creating a new DataFrame from tolist(), expand the list column by applying pd.Series to each entry:

df[['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice']] = df['modelGreeks'].apply(pd.Series)

Test:

import pandas as pd

df = pd.DataFrame(columns=['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice','modelGreeks'])
df['modelGreeks'] = [[1,2,3,4,5,6,7,8], [1,2,None,4,5,6,7,8], [1,2,3,4,5,6,7], [None], None, [None,None,None,None,None]]
df[['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice']] = df['modelGreeks'].apply(pd.Series)

Output:

   IV_model   59  Price_model  ...   64  undPrice                     modelGreeks
0       1.0  2.0          3.0  ...  7.0       8.0        [1, 2, 3, 4, 5, 6, 7, 8]
1       1.0  2.0          NaN  ...  7.0       8.0     [1, 2, None, 4, 5, 6, 7, 8]
2       1.0  2.0          3.0  ...  7.0       NaN           [1, 2, 3, 4, 5, 6, 7]
3       NaN  NaN          NaN  ...  NaN       NaN                          [None]
4       NaN  NaN          NaN  ...  NaN       NaN                            None
5       NaN  NaN          NaN  ...  NaN       NaN  [None, None, None, None, None]
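As the traceback in the question shows, apply(pd.Series) only creates as many columns as the longest list actually present, so a column that contains nothing but scalar None values yields no columns and the assignment fails again. A sketch of one workaround, assuming the same 8 target names (COLS is my shorthand): reindex the expanded frame to exactly the positions 0 to 7, which adds any missing columns filled with NaN and drops any extras.

```python
import pandas as pd

COLS = ['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice']

# The all-None case from the question
df = pd.DataFrame({'modelGreeks': [None, None, None, None, None]})

# apply(pd.Series) yields one column per list position that actually occurs;
# reindex(columns=range(8)) forces positions 0..7, filling missing ones with NaN
expanded = df['modelGreeks'].apply(pd.Series).reindex(columns=range(len(COLS)))
df[COLS] = expanded
print(df)
```

When every row already has 8 items the reindex is a no-op, so the same line can be used for complete and incomplete batches.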

Answer 2 (score: 1)

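The first error you raised,

ValueError: Columns must be same length as key

is raised when the number of columns of values you supply does not match the number of keys you assign to.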

For example,

import pandas as pd
d1 = {'teams': [['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],
                ['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG']]}
df2 = pd.DataFrame(d1)
print (df2)

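# Three keys but only two columns of values, so this raises
# ValueError: Columns must be same length as key
df2[['team1','team2', 'team3']] = pd.DataFrame(df2.teams.values.tolist(), index= df2.index)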
print (df2)

In this case 'team3' has nothing to be filled from, hence the error. Just a small gotcha you may already know about.

Next, replace the last entry in the list with None:

d1 = {'teams': [['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],
                ['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],None]}
Produced error:
TypeError: object of type 'NoneType' has no len()

To drop the None entries, simply do:

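# None is already treated as missing, so dropna() alone removes these rows;
# replace(to_replace='None', ...) only matches the string 'None', not the None object
df3 = df2.replace(to_replace='None', value=np.nan).dropna()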

This shouldn't be a problem, since those rows don't carry any useful information.

A complete example:

import pandas as pd
import numpy as np

d1 = {'teams': [['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],
                ['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],None]}

df2 = pd.DataFrame(d1)
df3 = df2.replace(to_replace='None', value=np.nan).dropna()

df2[['team1','team2']] = pd.DataFrame(df3.teams.values.tolist(), index= df3.index)
print (df2)

This produces:

       teams team1 team2
0  [SF, NYG]    SF   NYG
1  [SF, NYG]    SF   NYG
2  [SF, NYG]    SF   NYG
3  [SF, NYG]    SF   NYG
4  [SF, NYG]    SF   NYG
5  [SF, NYG]    SF   NYG
6       None   NaN   NaN

I hope this works for you; let me know if you need help applying this example to your case.
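Applied to the question's column, the same idea (drop the missing rows, build the new columns from what is left, and let index alignment put NaN back into the dropped rows) might look like the sketch below. It assumes, as in the question, that every non-missing entry has exactly 8 items; the sample data are made up and COLS is my shorthand for the 8 target names. Passing columns=COLS keeps the width at 8 even when every row is None and nothing survives the dropna().

```python
import pandas as pd

COLS = ['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice']

# Made-up sample: one complete 8-item tuple and two missing entries
df = pd.DataFrame({'modelGreeks': [
    (0.29, -1.9e-14, 1.5e-15, 0.0, 6.2e-13, 1.2e-15, -1.5e-15, 10.44),
    None,
    None,
]})

# Keep only the rows that actually carry data (None counts as missing)
valid = df['modelGreeks'].dropna()

# Exactly 8 named columns on the surviving index, even if `valid` is empty
greeks = pd.DataFrame(valid.tolist(), index=valid.index, columns=COLS)

# Assignment aligns on the index, so the dropped rows come back as NaN
df[COLS] = greeks
print(df)
```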