Question

I have a function that I would like to apply to a variable number of columns, depending on the input.

def split_and_combine(row, *args, delimiter=';'):
    combined = []
    for a in args:
        if not row[a]:
            combined.extend(row[a].split(delimiter))

    combined = list(set(combined))
    return combined

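But because of *args, I am not sure how to apply this function to a df. I am not very familiar with *args and **kwargs in Python. I tried using partial with axis=1 as shown below, but I get the following TypeError.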

df['combined'] = df.apply(partial(split_and_combine, ['col1','col2']),
                          axis=1)

TypeError: ('list indices must be integers or slices, not Series', 'occurred at index 0')

A dummy example for the code above. I would like to be able to pass a flexible number of columns to combine:

Index   col1        col2            combined
0      John;Mary    Sam;Bill;Eva    John;Mary;Sam;Bill;Eva
1      a;b;c        a;d;f           a;b;c;d;f

Thanks! If there is a better way to do this without df.apply, please feel free to comment!

Answer 1

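From the df.apply documentation:

args : tuple

Positional arguments to pass to func in addition to the array/series.

**kwds

Additional keyword arguments to pass as keyword arguments to func.

So the column names can be passed through args: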

df.apply(split_and_combine, args=('col1', 'col2'), axis=1)
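As for why the partial attempt fails: partial(split_and_combine, ['col1','col2']) binds the list to the first positional parameter, row, so the actual row Series lands in *args and row[a] becomes an attempt to index a list with a Series. The sketch below shows how args and extra keyword arguments are forwarded instead. The helper show_args is a hypothetical stand-in used only for illustration, and the DataFrame simply mirrors the dummy data from the question.

```python
import pandas as pd

# Hypothetical helper (not from the question): joins the raw cell values
# so it is easy to see which arguments the function received.
def show_args(row, *args, delimiter=';'):
    return delimiter.join(str(row[a]) for a in args)

df = pd.DataFrame({
    'col1': ['John;Mary', 'a;b;c'],
    'col2': ['Sam;Bill;Eva', 'a;d;f'],
})

# Each row is passed as the first argument; the tuple in args is
# unpacked into *args after it.
joined = df.apply(show_args, args=('col1', 'col2'), axis=1)

# Extra keyword arguments given to apply are forwarded to the function.
joined_pipe = df.apply(show_args, args=('col1', 'col2'), axis=1, delimiter='|')

# An equivalent spelling with a lambda, which keeps row in first position.
via_lambda = df.apply(lambda row: show_args(row, 'col1', 'col2'), axis=1)
```

The lambda form is sometimes easier to read when only one or two extra arguments are involved; args= scales better when the column list is built elsewhere and passed in as a tuple.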

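You may also have a few bugs in your function (the "if not row[a]" check is inverted, and the result is never joined back into a string):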

def split_and_combine(row, *args, delimiter=';'):
    combined = []
    for a in args:
        if row[a]:
            combined.extend(row[a].split(delimiter))
    combined = list(set(combined))
    return delimiter.join(combined)
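One thing to keep in mind is that set does not guarantee any particular order, so the joined string may not come out as John;Mary;Sam;Bill;Eva. If the first-seen order matters, a possible variant is sketched below. The name split_and_combine_ordered and the extra row with a missing value are assumptions added for illustration; they are not part of the original question.

```python
import pandas as pd

# Hypothetical variant of the function above: keeps first-seen order
# and skips cells that are missing or empty.
def split_and_combine_ordered(row, *args, delimiter=';'):
    parts = []
    for a in args:
        value = row[a]
        # Only non-empty strings are split; None/NaN cells are ignored.
        if isinstance(value, str) and value:
            parts.extend(value.split(delimiter))
    # dict.fromkeys drops duplicates while preserving insertion order.
    return delimiter.join(dict.fromkeys(parts))

df = pd.DataFrame({
    'col1': ['John;Mary', 'a;b;c', None],  # third row is made-up test data
    'col2': ['Sam;Bill;Eva', 'a;d;f', 'x;y'],
})

df['combined'] = df.apply(split_and_combine_ordered,
                          args=('col1', 'col2'), axis=1)
```

With this variant the combined column for the two rows from the question matches the desired output shown there.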
