Question

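I have a DataFrame in pandas that contains several similar-looking columns (with different names). I am trying to write a function that compares the data in two columns and drops the second column if they are identical. I tried this: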

import numpy as np
import pandas as pd

def drop_if_ident(df, col1, col2):
    # Drops second column if columns contain identical data
    if (df.shape[0] == np.sum(pd.notnull(df.col1) == pd.notnull(df.col2)):
        df.drop(
            col2,
            axis=1,
            inplace=True
        )

# Usage
drop_if_ident(my_dataframe, my_first_column, my_second_column)

IPython raises the following error:

File "<ipython-input-109-e11b622181bb>", line 3
if (df.shape[0] == np.sum(pd.notnull(df.col1) == pd.notnull(df.col2)):
                                                                     ^
SyntaxError: invalid syntax

...but what is the correct syntax here? Apologies for the noob question :)
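
For what it's worth, the SyntaxError appears to come from unbalanced parentheses: the `if (` on that line is never closed, so the parser fails when it reaches the colon. Two further issues would likely surface once that is fixed: `df.col1` looks up a column literally named "col1" rather than the name passed in as the argument, and comparing the `pd.notnull` masks only checks that missing values sit in the same positions, not that the values themselves match. Below is a minimal sketch of one possible rewrite, assuming the column names are passed as strings and that "identical" means equal values with NaNs in the same positions (which is how `Series.equals` behaves). The example data and column names are hypothetical.

```python
import pandas as pd

def drop_if_ident(df, col1, col2):
    """Drop col2 from df in place if it holds the same data as col1."""
    # df[col1] uses the value of the argument; df.col1 would look for
    # a column literally named "col1".
    # Series.equals treats NaNs in matching positions as equal.
    if df[col1].equals(df[col2]):
        df.drop(columns=col2, inplace=True)
    return df

# Hypothetical example data (not from the original question)
my_dataframe = pd.DataFrame({
    "a": [1.0, 2.0, None],
    "b": [1.0, 2.0, None],
    "c": [1.0, 5.0, None],
})

drop_if_ident(my_dataframe, "a", "b")  # "b" duplicates "a", so it is dropped
drop_if_ident(my_dataframe, "a", "c")  # "c" differs from "a", so it is kept
print(list(my_dataframe.columns))
```

Note that `Series.equals` also requires matching dtypes, so an integer column and a float column with the same values would not count as identical under this sketch.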

Function for dropping duplicate columns in a pandas DataFrame

0 answers: