Question

I was able to reproduce the error on synthetic data:

import pandas as pd
from datetime import datetime

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': [datetime.now(), datetime.now(), datetime.now(), datetime.now()],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A1', 'A2', 'A3', 'A4'],
                    'E': ['E1', 'E2', 'E3', 'E4']},
                   index=[0, 1, 2, 3])

# Left join on the shared column 'A'.
df = pd.merge(df1, df2, how='left', on='A')

def getList(row):
    # Collect one label per condition the row satisfies; the list may stay empty.
    r = []
    if row["A"] == "A1": r.append("test-01")
    if row["B"] == "B1": r.append("test-02")
    if row["B"] == "B2": r.append("test-03")
    return r

# This assignment is where the ValueError is raised.
df["NEW_COLUMN"] = df.apply(lambda row: getList(row), axis=1)

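Original post: I want to create a new column in a pandas DataFrame based on multiple conditions. The values in the new column should be lists. However, I get "ValueError: Empty data passed with indices specified." if the list is empty.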

def getList(p_row):
    r = []
    if p_row["field1"] > 0: r.append("x")
    ...
    return r

df["new_list_field"] = df.apply(lambda row: getList(row), axis=1)

Full error:

ValueError                                Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py in create_block_manager_from_arrays(arrays, names, axes)
   4636     try:
-> 4637         blocks = form_blocks(arrays, names, axes)
   4638         mgr = BlockManager(blocks, axes)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py in form_blocks(arrays, names, axes)
   4728     if len(object_items) > 0:
-> 4729         object_blocks = _simple_blockify(object_items, np.object_)
   4730         blocks.extend(object_blocks)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py in _simple_blockify(tuples, dtype)
   4758     """
-> 4759     values, placement = _stack_arrays(tuples, dtype)
   4760

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py in _stack_arrays(tuples, dtype)
   4822     for i, arr in enumerate(arrays):
-> 4823         stacked[i] = _asarray_compat(arr)
   4824

ValueError: could not broadcast input array from shape (2) into shape (195)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
 in <module>()
----> 1 df["new_list_field"] = df.apply(lambda row: getList(row), axis = 1);

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   4875                         f, axis,
   4876                         reduce=reduce,
-> 4877                         ignore_failures=ignore_failures)
   4878             else:
   4879                 return self._apply_broadcast(f, axis)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
   4988                 index = None
   4989
-> 4990             result = self._constructor(data=results, index=index)
   4991             result.columns = res_index
   4992

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    328                                  dtype=dtype, copy=copy)
    329         elif isinstance(data, dict):
--> 330             mgr = self._init_dict(data, index, columns, dtype=dtype)
    331         elif isinstance(data, ma.MaskedArray):
    332             import numpy.ma.mrecords as mrecords

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in _init_dict(self, data, index, columns, dtype)
    459             arrays = [data[k] for k in keys]
    460
--> 461         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    462
    463     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   6171     axes = [_ensure_index(columns), _ensure_index(index)]
   6172
-> 6173     return create_block_manager_from_arrays(arrays, arr_names, axes)
   6174
   6175

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py in create_block_manager_from_arrays(arrays, names, axes)
   4640         return mgr
   4641     except ValueError as e:
-> 4642         construction_error(len(arrays), arrays[0].shape, axes, e)
   4643
   4644

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py in construction_error(tot_items, block_shape, axes, e)
   4604         raise e
   4605     if block_shape[0] == 0:
-> 4606         raise ValueError("Empty data passed with indices specified.")
   4607     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
   4608         passed, implied))

ValueError: Empty data passed with indices specified.

Answer 1

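The output of this function varies in length from row to row, and df.apply cannot assign lists of unequal length to a new pandas column directly. You can verify the differing lengths with: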

for idx,row in df.iterrows():
    print(getList(row))
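
A self-contained version of this check, as a sketch: the three-row frame below is a made-up miniature of the question's data, and `lengths` is a name introduced here for illustration only.

```python
import pandas as pd

def getList(row):
    # Same conditions as in the question.
    r = []
    if row["A"] == "A1": r.append("test-01")
    if row["B"] == "B1": r.append("test-02")
    if row["B"] == "B2": r.append("test-03")
    return r

# Hypothetical miniature of the question's frame.
df = pd.DataFrame({"A": ["A0", "A1", "A2"],
                   "B": ["B0", "B1", "B2"]})

# Number of labels produced for each row.
lengths = [len(getList(row)) for _, row in df.iterrows()]
print(lengths)
```

The first row matches no condition, so its list is empty, while the other rows produce lists of different lengths.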

An alternative is to convert the output to a string:

def getListString(row):
    r = ''
    if row["A"] == "A1": r+="test-01"
    if row["B"] == "B1": r+="test-02"
    if row["B"] == "B2": r+="test-03"
    return r
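
The answer does not show the function being applied, so here is a minimal sketch of how it might be used. The small frame, the column names `NEW_COLUMN` and `NEW_JOINED`, and the variant `getListJoined` with its `sep` parameter are assumptions added for illustration; the variant only exists to show that a separator keeps the labels distinguishable.

```python
import pandas as pd

def getListString(row):
    # Concatenates the matching labels without a separator, as in the answer.
    r = ''
    if row["A"] == "A1": r += "test-01"
    if row["B"] == "B1": r += "test-02"
    if row["B"] == "B2": r += "test-03"
    return r

def getListJoined(row, sep=","):
    # Hypothetical variant: collect the labels first, then join them with sep.
    parts = []
    if row["A"] == "A1": parts.append("test-01")
    if row["B"] == "B1": parts.append("test-02")
    if row["B"] == "B2": parts.append("test-03")
    return sep.join(parts)

# Hypothetical miniature of the question's frame.
df = pd.DataFrame({"A": ["A0", "A1", "A2"],
                   "B": ["B0", "B1", "B2"]})

# Each call returns a single string, so apply yields one value per row.
df["NEW_COLUMN"] = df.apply(getListString, axis=1)
df["NEW_JOINED"] = df.apply(getListJoined, axis=1)
print(df)
```

A row that matches nothing gets an empty string instead of an empty list, which is what sidesteps the error.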

Answer 2

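I ended up building a list of lists, turning it into a pd.Series(), and assigning that to the new column. The dictionary key2list returns variable-length lists as values: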

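new_col_list = []

# Look up the variable-length list for each row's key.
for _, row in my_df.iterrows():
    new_col_list.append(key2list[row[u'key']])

# Reuse my_df's index so the lists stay aligned with their rows.
my_df[u'new_col'] = pd.Series(new_col_list, index=my_df.index)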
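
The same idea can be applied to the question's `getList`, sketched below under some assumptions: the three-row frame and its index values 10, 11, 12 are invented for illustration, and `rows_as_lists` is a name introduced here. The Series is given the frame's own index so the lists stay aligned with their rows even when the index is not 0..n-1.

```python
import pandas as pd
from datetime import datetime

def getList(row):
    # Same conditions as in the question.
    r = []
    if row["A"] == "A1": r.append("test-01")
    if row["B"] == "B1": r.append("test-02")
    if row["B"] == "B2": r.append("test-03")
    return r

# Hypothetical frame with a datetime column and a non-default index.
df = pd.DataFrame({"A": ["A0", "A1", "A2"],
                   "B": ["B0", "B1", "B2"],
                   "C": [datetime.now()] * 3},
                  index=[10, 11, 12])

# Build the lists in plain Python, then wrap them in a Series.
rows_as_lists = [getList(row) for _, row in df.iterrows()]
df["NEW_COLUMN"] = pd.Series(rows_as_lists, index=df.index)
print(df["NEW_COLUMN"].tolist())
```

Because the lists are built outside `df.apply`, pandas never tries to expand them into columns, so empty and unequal-length lists are stored as ordinary objects.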

Adding an empty list to a DataFrame column with a lambda in Python raises ValueError
