pandas AttributeError: 'module' object has no attribute 'Factor'

Asked: 2014-02-10 22:42:28

Tags: python python-2.7 pandas

I'm trying to run the code that yhat provides in their article about random forests in Python, but I keep getting the following error message:

File "test_iris_with_rf.py", line 11, in <module>
    df['species'] = pd.Factor(iris.target, iris.target_names)
AttributeError: 'module' object has no attribute 'Factor'

Code:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
print df
print iris.target_names
df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75

df['species'] = pd.Factor(iris.target, iris.target_names)

df.head()

3 Answers:

Answer 0 (score: 46)

In newer versions of pandas, Factor is called Categorical instead. Change your line to:

df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
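As a minimal sketch of what that line does, here is the same call on a small hand-written stand-in for `iris.target` and `iris.target_names`, so it runs without scikit-learn. The arrays and the single feature column are illustrative assumptions, not the real dataset, and the `.cat` accessor assumes pandas 0.15 or later.

```python
import numpy as np
import pandas as pd

# Illustrative stand-ins for iris.target and iris.target_names (assumed values)
target = np.array([0, 0, 1, 2, 1])
target_names = np.array(['setosa', 'versicolor', 'virginica'])

df = pd.DataFrame({'sepal length (cm)': [5.1, 4.9, 7.0, 6.3, 6.4]})

# Each integer code is used as a position into target_names
df['species'] = pd.Categorical.from_codes(target, target_names)

species_labels = df['species'].tolist()
species_codes = df['species'].cat.codes.tolist()
print(species_labels)
print(species_codes)
```

The column stores the integer codes internally and displays the matching names, which is the behaviour the old `pd.Factor` call was relied on for.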

Answer 1 (score: 8)

Categorical variables seem to be one of the more active areas of development in pandas, so I believe this changed again in pandas 0.15.0:

df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
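To illustrate why `from_codes` matters here, the sketch below contrasts it with the plain `pd.Categorical` constructor. It assumes pandas 0.15 or later, where (to my understanding) the constructor treats its first argument as values rather than integer codes; the `codes` and `names` lists are made up for the example.

```python
import pandas as pd

# Made-up codes and category names for illustration
codes = [0, 0, 1, 2]
names = ['setosa', 'versicolor', 'virginica']

# The plain constructor treats its first argument as values, so integers
# that are not among the categories end up as missing entries.
as_values = pd.Categorical(codes, categories=names)

# from_codes treats the first argument as positions into `names`.
as_codes = pd.Categorical.from_codes(codes, names)

value_codes = as_values.codes.tolist()
code_labels = list(as_codes)
print(value_codes)
print(code_labels)
```

So passing `iris.target` straight to the constructor would probably not raise an error, but would likely produce a column of missing values.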

(I don't have enough reputation to add this as a comment on David Robinson's answer.)

Answer 2 (score: 0)

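def factor(series):
    # Input should be a pandas Series object.
    # Map each distinct value to an integer code. value_counts() sorts by
    # frequency, so the most common value gets code 0.
    dic = {}
    for i, val in enumerate(series.value_counts().index):
        dic[val] = i
    return [dic[val] for val in series.values]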
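
For comparison, pandas also has a built-in `pd.factorize` that maps values to integer codes. A small sketch follows, with a made-up `species` Series; note that it numbers values in order of first appearance rather than by frequency, so the codes can differ from those of the function above.

```python
import pandas as pd

# Made-up sample data for illustration
species = pd.Series(['virginica', 'setosa', 'virginica',
                     'versicolor', 'virginica', 'setosa'])

# factorize returns the integer codes and the distinct values they refer to
codes, uniques = pd.factorize(species)

codes_list = codes.tolist()
uniques_list = list(uniques)
print(codes_list)
print(uniques_list)
```

Unlike a plain list of integers, the returned `uniques` lets you map the codes back to the original labels.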