Add a new column with variable length to a dataframe

Time: 2018-07-16 09:22:46

Tags: python python-3.x pandas dataframe series

I have a question about adding results to an existing dataframe.


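When I append a new list to the existing dataframe, I get the error message "ValueError: Length of values does not match length of index". How can I add the new list as a new column, fill every row that has no value with "None", and preserve the original row order?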

Original data before filtering (about 700 rows):

if relevant_item != 'None' and relevant_item != 'Not in dict':
    items = relevant_item
    len_item = len(items)

    if len_item == 1:
        item_result = items

    if len_item == 2:
        two = items
        item_result = some_method(two)

    if len_item == 3:
        threes = items
        item_result = some_method(threes)

hash_in_dict_shopping.append(item_result)  # new list of lists

shops = pd.Series(hash_in_dict_shopping)
# Fails with the ValueError: about 40 values for about 700 rows
df_final['hash_in_shop'] = shops.values
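
The length mismatch behind the error can be reproduced in a few lines. This is only a minimal sketch: the five-row `df_final`, its `item` column, and the `filtered` list are hypothetical stand-ins for the real data, which is not shown in full.

```python
import pandas as pd

# Hypothetical stand-in for df_final: five rows, only two hold a list.
df_final = pd.DataFrame(
    {"item": ["None", ["milk"], "Not in dict", ["pasta", "rice"], "None"]}
)

# Keeping only the relevant rows produces a list shorter than the dataframe.
filtered = [x for x in df_final["item"] if isinstance(x, list)]

try:
    # Two values cannot be assigned to a five-row column.
    df_final["hash_in_shop"] = pd.Series(filtered).values
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Any result that skips rows will hit the same error, so the new values need one entry per original row.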

After filtering out the relevant items (about 40 rows):

'None'
'Not in dict'
['apple','banana', 'grapes']
'None'
'Not in dict'
'Not in dict'
['pasta', 'rice', 'lentils']
'None'
'None'
['milk']

After applying some_method (which returns a value from a dictionary):

 ['apple','banana', 'grapes']
 ['pasta', 'rice', 'lentils']
 ['milk']

New column in the dataframe for all 700 rows:

['fruit','green groceries']
['dry food', 'staples', 'legumes']
['dairy']

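2 answers:

Answer 0 (score: 2)

There are 2 points to note:

  1. When iterating over the series, do not skip the 'None' / 'Not in dict' rows. The new series must have the same length as the original series.
  2. Use built-in Pandas functionality to apply a function row by row. Because the dataframe contains list objects, vectorized operations are not available, so you can use pd.Series.apply with a custom function.

Here is a minimal example: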

import pandas as pd

df = pd.DataFrame({'col': ['None', 'Not in dict', ['apple', 'banana', 'grapes'],
                           'None', ['milk'], 'Not in dict']})

def calculated(x):
    try:
        # Placeholder strings are hashable, so the set lookup succeeds.
        if x in {'Not in dict', 'None'}:
            return None
    except TypeError:
        # Lists are unhashable, so the lookup raises TypeError: x is a list.
        if len(x) == 1:
            return 2
        elif len(x) == 2:
            return 4
        else:
            return 6

df['calc'] = df['col'].apply(calculated)

print(df)

                       col  calc
0                     None   NaN
1              Not in dict   NaN
2  [apple, banana, grapes]   6.0
3                     None   NaN
4                   [milk]   2.0
5              Not in dict   NaN
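
The same pattern can return lists instead of numbers, which is closer to what the question asks for. The following is a sketch under stated assumptions: the `CATEGORY` dictionary, the body of `some_method`, and the `categorize` helper are all hypothetical, because the question does not show the real dictionary or method. It also treats every list the same way, whereas the question leaves one-element lists unchanged.

```python
import pandas as pd

# Hypothetical lookup table; the real dictionary is not shown in the question.
CATEGORY = {"apple": "fruit", "banana": "fruit", "grapes": "fruit", "milk": "dairy"}

def some_method(items):
    """Hypothetical stand-in: map items to their categories, without duplicates."""
    return sorted({CATEGORY[item] for item in items if item in CATEGORY})

def categorize(x):
    # Placeholder strings yield None, so the row is kept but left empty.
    if not isinstance(x, list):
        return None
    return some_method(x)

df = pd.DataFrame(
    {"col": ["None", "Not in dict", ["apple", "banana", "grapes"], "None", ["milk"]]}
)

# apply calls categorize once per row, so the result has one entry per row.
df["hash_in_shop"] = df["col"].apply(categorize)
print(df)
```

Because every row produces exactly one value, the new column always matches the index length and the original order is preserved.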

Answer 1 (score: 0)

Have you tried creating an empty array first and then filling in its values?

import numpy as np

# One slot per dataframe row; dtype=object so a slot can hold a list.
items = np.empty(len(df_final), dtype=object)
items[:] = np.nan

# Assumes i is the position of the current row and relevant_item its value.
if relevant_item != 'None' and relevant_item != 'Not in dict':
    len_item = len(relevant_item)

    if len_item == 1:
        items[i] = relevant_item

    if len_item == 2:
        items[i] = some_method(relevant_item)

    if len_item == 3:
        items[i] = some_method(relevant_item)

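This way, your items array has the same length as the dataframe, and you will not get that error. If an array of NaN is not suitable, why not try numpy.zeros?

Hope this helps!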
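
A runnable sketch of the preallocation idea follows. The four-row `df_final` and its `item` column are hypothetical stand-ins for the real data. One caveat the sketch relies on: the array is created with `dtype=object`, since a default float array (whether filled with NaN or built by numpy.zeros) cannot store a list in a single slot.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_final.
df_final = pd.DataFrame({"item": ["None", ["milk"], "Not in dict", ["pasta", "rice"]]})

# One slot per row; dtype=object lets a slot hold a whole list.
items = np.empty(len(df_final), dtype=object)
items[:] = None

for i, relevant_item in enumerate(df_final["item"]):
    # Rows with a placeholder string keep their None slot.
    if relevant_item not in ("None", "Not in dict"):
        items[i] = relevant_item

# The array has exactly len(df_final) entries, so the assignment succeeds.
df_final["hash_in_shop"] = items
print(df_final)
```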