Question

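I have a dataset with many columns. I need to write a function that computes the mean of each column, subtracts that mean from every row in the column, and returns the dataset with the means subtracted. I found a similar question here and applied its answer, but I keep getting an error. Here is my code: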

from pandas import DataFrame

def exercise1(df):
    df1 = DataFrame(df)
    df2 = df1 - df1.mean()
    return df2

exercise1(data)

# where data is a CSV file of salaries in the San Francisco area

I get the following error:

TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')

I don't know what I'm doing wrong.

Answer 1

You can loop over the columns with try-except:

def exercise1(df):
    df1 = df.copy()
    for col in df1.columns:
        try:     # if we can compute the mean, subtract it
            df1[col] -= df1[col].mean()
        except (TypeError, ValueError):  # non-numeric column: leave it unchanged
            pass

    return df1
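
As an alternative to catching the exception, the same per-column loop can test each column's dtype up front. This is only a sketch of that swapped-in approach, not the answer's own method: the function name `exercise1_checked` and the sample data are made up for illustration, and `is_numeric_dtype` comes from `pandas.api.types`.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def exercise1_checked(df):
    """Subtract the column mean from every numeric column; leave others as-is."""
    df1 = df.copy()
    for col in df1.columns:
        # Only touch columns whose dtype is numeric
        if is_numeric_dtype(df1[col]):
            df1[col] = df1[col] - df1[col].mean()
    return df1

# Hypothetical sample data with one text column and one numeric column
sample = pd.DataFrame({"job": ["clerk", "nurse"], "pay": [10.0, 30.0]})
checked = exercise1_checked(sample)
print(checked)
```

The explicit check avoids hiding unrelated errors, at the cost of one extra import.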

Answer 2

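df.mean() returns a pandas Series. In older pandas versions it contained only the numeric columns of the original DataFrame; from pandas 2.0 on, pass numeric_only=True to get the same behavior.

means = df.mean(numeric_only=True)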

You can get that Series's index values with:

means.index

Use it to slice the original DataFrame and subtract the means:

df2 = df[means.index] - means
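
If the non-numeric columns should stay in the result, one possible variation is to select the numeric columns with `select_dtypes` and write the centered values back into a copy. This is a minimal sketch under that assumption; the function name `demean_numeric` and the sample columns are hypothetical.

```python
import pandas as pd

def demean_numeric(df):
    """Return a copy of df with each numeric column centered on its mean."""
    out = df.copy()
    # Pick the numeric columns explicitly instead of relying on df.mean()
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols] - out[numeric_cols].mean()
    return out

# Hypothetical sample data: one text column, two numeric columns
data = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "base_pay": [100.0, 200.0, 300.0],
    "overtime": [10.0, 20.0, 60.0],
})
result = demean_numeric(data)
print(result)
```

Subtracting a Series from a DataFrame aligns the Series index with the column labels, which is why the whole block of numeric columns can be centered in one step.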

Answer 3

You need to specify the column to subtract from:

import pandas as pd

df = {'values1': [1,2,3], 'values2': [4,5,6]}
def exercise1(df):
    df1 = pd.DataFrame(df)
    df2 = df1['values2'] - df1['values2'].mean()
    return df2

print(exercise1(df))
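
The function above centers a single hard-coded column. A sketch of how it might be generalized to several named columns follows; the `columns` parameter and the name `subtract_mean` are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

def subtract_mean(df, columns):
    """Return a copy of df with the mean subtracted from each named column."""
    out = df.copy()
    for col in columns:
        out[col] = out[col] - out[col].mean()
    return out

frame = pd.DataFrame({"values1": [1, 2, 3], "values2": [4, 5, 6]})
centered = subtract_mean(frame, ["values1", "values2"])
print(centered)
```

Because the function works on a copy, the input `frame` keeps its original values.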
