Question

I am currently trying to build a model to predict which [award] the people in my subset will receive.

I'm getting a KeyError for 'award', but I'm not sure why.

Here is my code (the error is raised on line 2):

subset = pd.get_dummies(subset)  # one-hot encoding
labels = np.array(subset['award'])  # labels = the values to predict
subset = subset.drop('award', axis=1)  # remove the labels from subset; axis=1 means columns
subset_list = list(subset.columns)  # save the column names for later use
subset = np.array(subset)  # convert to a NumPy array

The [award] column typically contains values such as Best Director, Best Actor, etc.

An example row from the subset:

          birthplace         DOB         race    award
Id        
670454353 Chisinau, Moldova  30/09/1895  White   Best Director

Columns before pd.get_dummies:

Index(['birthplace', 'date_of_birth', 'race_ethnicity', 'year_of_award',
   'award', 'ldob', 'year', 'award_age', 'country', 'bin'],
  dtype='object')

Columns after pd.get_dummies(subset):

Index(['year_of_award', 'ldob', 'year', 'award_age',
   'birthplace_Arlington, Va, US', 'birthplace_Astoria, Ny, US',
   'birthplace_Athens, Ga, US', 'birthplace_Athens, Greece',
   'birthplace_Atlanta, Ga, US', 'birthplace_Baldwin, Ny, US',
   ...
   'country_ Turkey', 'country_ US', 'country_ Ukraine', 'country_ Wales',
   'bin_0-25', 'bin_25-35', 'bin_35-45', 'bin_45-55', 'bin_55-65',
   'bin_65-75'],
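To see why 'award' disappears from the second listing, here is a minimal sketch on a small, made-up DataFrame (the values are hypothetical, not the asker's data). By default pd.get_dummies replaces each object/string column with one indicator column per distinct value, prefixed with the original column name, and drops the original column.

```python
import pandas as pd

# Hypothetical toy data loosely mirroring the columns in the question
df = pd.DataFrame({
    "race": ["White", "Black"],
    "award": ["Best Director", "Best Actor"],
    "award_age": [45, 38],
})

encoded = pd.get_dummies(df)

# The raw 'award' column is gone; prefixed indicator columns replace it
print(list(encoded.columns))
print("award" in encoded.columns)  # False
```

Numeric columns such as award_age pass through unchanged, which matches the year_of_award and award_age entries that survive in the listing above.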

Input:

 check_cols = [col for col in subset.columns if 'award' in col]

Output:

['year_of_award', 'award_age', 'award_Best Actor', 'award_Best Actress',
 'award_Best Director', 'award_Best Supporting Actor',
 'award_Best Supporting Actress']

If I try to reference any of the above instead of 'award', I get the same error.

Answer 1

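A KeyError means that the key award does not exist in subset. Right now there is no award column there: as your own column listing shows, pd.get_dummies(subset) on the first line replaces award with the one-hot columns award_Best Actor, award_Best Director, and so on. You'll want to check the structure of subset so that you access it correctly.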
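One common way around this, sketched below under the assumption that subset is a DataFrame that still contains the raw award column, is to pull the labels out before one-hot encoding so that get_dummies only encodes the features. The toy DataFrame and the names features and feature_list are hypothetical stand-ins, not the asker's code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's subset
subset = pd.DataFrame({
    "race": ["White", "Black", "White"],
    "award_age": [45, 38, 52],
    "award": ["Best Director", "Best Actor", "Best Actress"],
})

# Extract the labels first, while the 'award' column still exists
labels = np.array(subset["award"])

# One-hot encode only the feature columns
features = pd.get_dummies(subset.drop("award", axis=1))
feature_list = list(features.columns)  # keep the column names for later use
features = np.array(features)  # convert to a NumPy array

print(labels)
print(feature_list)
```

Reordering the steps this way keeps the award strings intact as labels and avoids the need to reassemble them from the award_... indicator columns.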

If you can share more of the code showing how subset is built, I may be able to help further.

Key not recognized when building model
