Question

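I'm trying to use pandas DataFrame.combine to combine multiple DataFrames, but I can't figure out how to implement the func argument. The documentation isn't very clear to me. It specifies: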

DataFrame.combine(other, func, fill_value=None, overwrite=True)
other : DataFrame
func : function. Function that takes two series as inputs and return a Series or a scalar
fill_value : scalar value
overwrite : boolean, default True. If True then overwrite values for common keys in the calling frame

After some research, I found that the similar method DataFrame.combine_first can be used together with reduce to combine multiple DataFrames (link):

reduce(lambda left,right: pd.DataFrame.combine_first(left,right), [pd.read_csv(f) for f in files])

How can I use DataFrame.combine to combine multiple DataFrames?

Answer 1

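According to the documentation, DataFrame.combine can be used to add two DataFrame objects without propagating NaN values: if one frame is missing a value for a given (column, time), it defaults to the other frame's value (which may also be NaN).

func is a function in which you write the logic for choosing between values. It receives the two matching columns as Series and returns the Series (or scalar) to keep. I think the lambda expression is what confused you, so let me rewrite the example from the documentation without a lambda.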

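from pandas import DataFrame

def _fn(left, right):
    # left and right are the same column taken from each frame (as Series);
    # keep whichever one has the smaller sum.
    if left.sum() < right.sum():
        return left
    else:
        return right

df1 = DataFrame({'A': [0, 0], 'B': [4, 4]})
df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
df1.combine(df2, _fn)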

Output:

   A  B
0  0  3
1  0  3
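The fill_value parameter listed in the question's signature can be illustrated the same way. The following is only a small sketch: the helper name `take_smaller` and the sample frames are made up for illustration. fill_value replaces NaN in both frames before each column is passed to func.

```python
import numpy as np
import pandas as pd


def take_smaller(left, right):
    # Illustrative helper: keep the column (Series) with the smaller sum.
    return left if left.sum() < right.sum() else right


df1 = pd.DataFrame({"A": [0, 0], "B": [np.nan, 4.0]})
df2 = pd.DataFrame({"A": [1, 1], "B": [3.0, 3.0]})

# NaN in column B of df1 is replaced by -5 before take_smaller sees it,
# so B from df1 sums to -1 and wins over B from df2 (sum 6).
result = df1.combine(df2, take_smaller, fill_value=-5)
print(result)  # A -> [0, 0], B -> [-5.0, 4.0]
```

Without fill_value, the NaN would simply be skipped by sum(), and whichever column func returns would be kept as-is, NaN included.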

P.S.: Since the OP wants to replicate the behavior of DataFrame.combine_first using DataFrame.combine, here is the source code of DataFrame.combine_first from the pandas GitHub repository: https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py#L4153

def combine_first(self, other):
    import pandas.core.computation.expressions as expressions

    def combiner(x, y, needs_i8_conversion=False):
        x_values = x.values if hasattr(x, 'values') else x
        y_values = y.values if hasattr(y, 'values') else y
        if needs_i8_conversion:
            mask = isna(x)
            x_values = x_values.view('i8')
            y_values = y_values.view('i8')
        else:
            mask = isna(x_values)

        return expressions.where(mask, y_values, x_values)

    return self.combine(other, combiner, overwrite=False)
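Putting the pieces together, combining several frames with DataFrame.combine might look like the sketch below. It is an assumption-laden illustration, not the pandas implementation: `prefer_left` is a hypothetical helper that mimics the combiner above using Series.where, and the in-memory frames stand in for the `pd.read_csv(f)` calls from the question.

```python
from functools import reduce

import numpy as np
import pandas as pd


def prefer_left(left, right):
    # Hypothetical helper: keep the left value, and fall back to the
    # right value wherever the left one is NaN.
    return left.where(left.notna(), right)


# Stand-ins for [pd.read_csv(f) for f in files] in the question.
frames = [
    pd.DataFrame({"A": [1.0, np.nan], "B": [np.nan, np.nan]}),
    pd.DataFrame({"A": [10.0, 20.0], "B": [np.nan, 4.0]}),
    pd.DataFrame({"A": [100.0, 200.0], "B": [3.0, 40.0]}),
]

# Fold the list pairwise; earlier frames take priority over later ones.
combined = reduce(
    lambda left, right: left.combine(right, prefer_left, overwrite=False),
    frames,
)
print(combined)  # A -> [1.0, 20.0], B -> [3.0, 4.0]
```

Any other two-Series function can be swapped in for `prefer_left`, for example the `_fn` from earlier in this answer, to change how conflicting columns are resolved.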
