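I have 3 strings in an array. I also have a DataFrame with a column whose values are 0-2. I want to add a new column holding the result of evaluating list[x] for each value x.
Here is the code I have so far: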
import pandas as pd
import numpy as np
from sklearn import datasets

iris_raw = datasets.load_iris()
iris = pd.DataFrame(iris_raw.data, columns=iris_raw.feature_names)
iris = pd.concat([iris, pd.DataFrame(iris_raw.target)], axis=1)
vals = iris_raw.target_names

def eval_dummy(tgt_dum):
    default = np.nan
    # Valid indices run from 0 to len(vals) - 1, so the upper bound is strict
    return vals[tgt_dum] if 0 <= tgt_dum < len(vals) else default

vec_eval_dumm = np.vectorize(eval_dummy)
iris = pd.concat([iris, pd.DataFrame(vec_eval_dumm(np.array(iris.iloc[:, 4])))], axis=1)
# Rename the last column by reassigning the labels instead of mutating columns.values in place
iris.columns = list(iris.columns[:5]) + ['species']
print(iris.head())
This is far from elegant. Is there a better way?
Answer 0 (score: 1)
So you want to map the integers to the target names?
# NumPy rather than pandas concatenation might be a bit quicker
iris = np.concatenate((iris_raw.data, iris_raw.target[:, None]), axis=1)
iris = pd.DataFrame(iris, columns=iris_raw.feature_names + ['tgt_num'])
mapped = dict(zip([0, 1, 2], iris_raw.target_names))
iris.loc[:, 'species'] = iris.tgt_num.map(mapped)
print(iris.head())
#    sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
# 0                5.1               3.5                1.4               0.2
# 1                4.9               3.0                1.4               0.2
# 2                4.7               3.2                1.3               0.2
# 3                4.6               3.1                1.5               0.2
# 4                5.0               3.6                1.4               0.2
#
#    tgt_num species
# 0      0.0  setosa
# 1      0.0  setosa
# 2      0.0  setosa
# 3      0.0  setosa
# 4      0.0  setosa
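
If the codes are already integers, two other approaches may be worth a look: indexing the array of names directly with the codes, or building a categorical column with pd.Categorical.from_codes. The sketch below is only an illustration, not part of the answer above. To keep it self-contained, the arrays target_names and target are small made-up stand-ins for iris_raw.target_names and iris_raw.target, and the column name species_cat is hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for iris_raw.target_names and iris_raw.target
target_names = np.array(['setosa', 'versicolor', 'virginica'])
target = np.array([0, 0, 1, 2, 1])

df = pd.DataFrame({'tgt_num': target})
codes = df['tgt_num'].to_numpy()

# Option 1: NumPy integer-array indexing looks up every name in one step
df['species'] = target_names[codes]

# Option 2: a categorical column stores the integer codes plus the list of names
df['species_cat'] = pd.Categorical.from_codes(codes, categories=target_names)

print(df)
```

Both options assume every code is a valid index into the names. An out-of-range code raises an error here, whereas map with a dict yields NaN for unknown keys, so which one fits depends on how you want bad values handled.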