Question

import pandas as pd

df = pd.DataFrame([[1, 'li'], [2, 'la'], [3, 'lu']], columns=(['index', 'col']))


class Test:
    def __init__(self, data):
        self.data = data
        self.data.set_index('index', inplace = True)


test1 = Test(df)
test2 = Test(df)

print(test1.data)
print(test2.data)

This raises an error: KeyError: "None of ['index'] are in the columns"

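I realized that using inplace=True with the set_index() method inside __init__ does not seem to manipulate the self.data variable that belongs to the object instance. Instead, it appears to set data as a class variable shared by all instances.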

When I avoid inplace, I get no error, because the self.data variable of the object instance is what gets set.

import pandas as pd

df = pd.DataFrame([[1, 'li'], [2, 'la'], [3, 'lu']], columns=(['index', 'col']))


class Test:
    def __init__(self, data):
        self.data = data
        self.data = self.data.set_index('index', inplace=False)


test1 = Test(df)
test2 = Test(df)

print(test1.data)
print(test2.data)

Output:

       col
index    
1      li
2      la
3      lu
       col
index    
1      li
2      la
3      lu

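What is the reason for this behavior? To me it seems a bit counterintuitive that a class variable gets set when I call a function on a variable that starts with self.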

Is there a reason for, or an advantage to, using inplace=True?

Answer 1

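Do not create a second object of the Test class from the same DataFrame. Once the index has been set for the test1 object, the DataFrame passed to test2 no longer has an 'index' column. Simply modify the same code to: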

import pandas as pd

df = pd.DataFrame([[1, 'li'], [2, 'la'], [3, 'lu']], columns=(['index', 'col']))


class Test:
    def __init__(self, data):
        self.data = data
        print(self.data)
        self.data.set_index('index', inplace = True)


test1 = Test(df)
print(test1.data)

Answer 2

I think this has less to do with pandas and more to do with Python being a language that passes arguments by object reference (see explanations here).

Consider the following example, which behaves much like yours:

class Test2:
    def __init__(self, data):
        self.data = data
        self.data.append(2)

A=[0,1]
test1 = Test2(A)
print(A)

Output:

[0, 1, 2]

The modification to the underlying object A persists (because it is a list, and lists are mutable, just like pandas DataFrames).

In your example, when you use self.data.set_index('index', inplace = True), no new DataFrame is created. As in the example above, the change is made to the underlying object df itself.
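To make the difference concrete, here is a minimal sketch (the variable names are mine, not from the question) comparing the two call styles on a DataFrame like the one in the question. It assumes a pandas version in which set_index still accepts inplace.

```python
import pandas as pd

df = pd.DataFrame([[1, 'li'], [2, 'la'], [3, 'lu']], columns=['index', 'col'])

# Without inplace: a new DataFrame comes back and df keeps its 'index' column.
indexed = df.set_index('index')
new_object_returned = indexed is not df
column_kept = 'index' in df.columns

# With inplace=True: the call returns None and df itself is modified.
result = df.set_index('index', inplace=True)
column_after_inplace = 'index' in df.columns

print(new_object_returned, column_kept)
print(result, column_after_inplace)
```

So the second style does not hand back anything to assign; whatever name points at that DataFrame sees the change.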

Consider the following code:

import pandas as pd

df = pd.DataFrame([[1, 'li'], [2, 'la'], [3, 'lu']], columns=(['index', 'col']))

class Test:
    def __init__(self, data):
        self.data = data
        self.data.set_index('index', inplace = True)


print(df.columns)
test1 = Test(df)
print(df.columns)

Output:

Index(['index', 'col'], dtype='object')
Index(['col'], dtype='object')

df has been changed.
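If it helps, the following small check (my own addition, using only built-in introspection) suggests that data is not a class variable at all: it is stored on the instance, and it is simply another name for the same object as df.

```python
import pandas as pd

df = pd.DataFrame([[1, 'li'], [2, 'la'], [3, 'lu']], columns=['index', 'col'])


class Test:
    def __init__(self, data):
        self.data = data
        self.data.set_index('index', inplace=True)


test1 = Test(df)

same_object = test1.data is df        # the attribute refers to the very same DataFrame
on_instance = 'data' in vars(test1)   # the attribute lives on the instance...
on_class = 'data' in vars(Test)       # ...and not on the class

print(same_object, on_instance, on_class)
```

That is why the second Test(df) call fails: both instances would share one DataFrame through the reference, not through the class.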

Finally, the following approach will work:

import pandas as pd

df = pd.DataFrame([[1, 'li'], [2, 'la'], [3, 'lu']], columns=(['index', 'col']))

class Test:
    def __init__(self, data):
        self.data = data
        self.data.set_index('index', inplace = True)

test1 = Test(pd.DataFrame([[1, 'li'], [2, 'la'], [3, 'lu']], columns=(['index', 'col'])))
test2 = Test(pd.DataFrame([[1, 'li'], [2, 'la'], [3, 'lu']], columns=(['index', 'col'])))

print(test1.data)
print(test2.data)
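Another option, if you would rather keep passing the same df, might be to copy it inside __init__. This is only a sketch; the class name TestWithCopy is hypothetical and not from the question.

```python
import pandas as pd

df = pd.DataFrame([[1, 'li'], [2, 'la'], [3, 'lu']], columns=['index', 'col'])


class TestWithCopy:  # hypothetical name, for illustration only
    def __init__(self, data):
        # Work on a private copy so the caller's DataFrame is left untouched.
        self.data = data.copy()
        self.data.set_index('index', inplace=True)


test1 = TestWithCopy(df)
test2 = TestWithCopy(df)  # no KeyError: df still has its 'index' column

print(list(df.columns))
print(list(test2.data.columns))
```

The copy costs extra memory, so whether it is worth it depends on how large the DataFrame is and whether callers expect their object to be left alone.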
