I'm trying to run the code that yhat provides in their article about random forests in Python, but I keep getting the following error message:
File "test_iris_with_rf.py", line 11, in <module>
df['species'] = pd.Factor(iris.target, iris.target_names)
AttributeError: 'module' object has no attribute 'Factor'
Code:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
print df
print iris.target_names
df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75
df['species'] = pd.Factor(iris.target, iris.target_names)
df.head()
Answer 0 (score: 46)
In newer versions of pandas, Factor has been renamed to Categorical. Change your line to:
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
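To see what the replacement line does without loading the iris data, here is a minimal sketch. The `codes` and `names` arrays are hypothetical stand-ins for `iris.target` and `iris.target_names`, not the real dataset; each integer code is treated as an index into the list of category names.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for iris.target (integer codes) and
# iris.target_names (one label per code); not the real dataset.
codes = np.array([0, 0, 1, 2, 1])
names = np.array(["setosa", "versicolor", "virginica"])

# Each code is used as an index into `names`.
species = pd.Categorical.from_codes(codes, categories=names)

df = pd.DataFrame({"species": species})
print(df["species"].tolist())
print(list(species.codes))
```

The column keeps both views of the data: the readable labels and the underlying integer codes, which are available through `species.codes`.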
Answer 1 (score: 8)
Categorical variables seem to be one of the more active areas of development in pandas, so I believe this changed again in pandas 0.15.0:
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
(I don't have enough reputation to add this as a comment on David Robinson's answer.)
Answer 2 (score: 0)
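def factor(series):
    # Input should be a pandas Series object.
    # Map each distinct value to an integer code, ordered by frequency
    # (the most common value gets code 0).
    dic = {}
    for i, val in enumerate(series.value_counts().index):
        dic[val] = i
    # Return one integer code per element of the series.
    return [dic[val] for val in series.values]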
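If only integer codes are needed, pandas' built-in `pd.factorize` may be a simpler route than a hand-written helper. This is a sketch on a made-up series, and it is not equivalent to the function above: `pd.factorize` numbers values in order of first appearance, whereas the helper numbers them by frequency.

```python
import pandas as pd

# Made-up example data, purely for illustration.
s = pd.Series(["b", "a", "b", "c"])

# codes: one integer per element; uniques: the distinct values,
# listed in order of first appearance.
codes, uniques = pd.factorize(s)

print(list(codes))
print(list(uniques))
```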