Question

I am trying to use keras to predict survival on the Titanic dataset, but I keep getting an error message and I cannot figure out why :(

> str(full_data)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1309 obs. of  13 variables:
 $ survived             : num  0 1 1 1 0 0 0 0 1 1 ...
 $ pclass               : Factor w/ 3 levels "1","2","3": 3 1 3 1 3 3 1 3 3 2 ...
 $ sex                  : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
 $ ticket               : Factor w/ 51 levels "A. 2. ","A./5. ",..: 6 23 45 21 21 21 21 21 21 21 ...
 $ cabin                : Factor w/ 9 levels "A","B","C","D",..: 8 3 8 3 8 8 5 8 8 8 ...
 $ embarked             : Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...
 $ fare_factor          : Factor w/ 5 levels "1","2","3","4",..: 1 2 1 4 1 1 3 3 3 3 ...
 $ age_interval         : Factor w/ 6 levels "1","2","3","4",..: 2 4 3 4 4 5 6 1 3 2 ...
 $ percent_survive_age  : Factor w/ 12 levels "0.0888888888888889",..: 2 11 10 11 3 5 1 7 10 8 ...
 $ title                : Factor w/ 6 levels "Lady","Miss",..: 3 4 2 4 3 3 3 5 4 4 ...
 $ percent_survive_title: Factor w/ 6 levels "0.156673114119923",..: 1 5 4 5 1 1 1 3 5 5 ...
 $ fam_size             : Factor w/ 9 levels "1","2","3","4",..: 2 2 1 2 1 1 1 5 3 2 ...
 $ fam_ID               : Factor w/ 176 levels "Abbott_3","Abelson_2",..: 19 40 109 61 109 109 109 130 90 123 ...
 - attr(*, "vars")= chr "fam_ID"
 - attr(*, "labels")='data.frame':  228 obs. of  1 variable:
  ..$ fam_ID: chr  "Abbott_3" "Abelson_2" "Ahlin_2" "Aks_2" ...
  ..- attr(*, "vars")= chr "fam_ID"
  ..- attr(*, "drop")= logi TRUE
 - attr(*, "indices")=List of 228
  ..$ : int  279 746 1283
  ..$ : int  308 874
  ..$ : int 40
  ..$ : int  855 1198
  ..$ : int  297 305 498 1197
  ..$ : int 192
  ..$ : int  13 68 119 541 542 610 813 850 1105
  ..$ : int 275
  ..$ : int  518 1081
  ..$ : int 571
  ..$ : int  49 353
  ..$ : int  25 182 233 261 1045 1065 1270
  ..$ : int  700 1093
  ..$ : int 206
  ..$ : int 85
  ..$ : int  448 469 644 858
  ..$ : int  362 702
  ..$ : int  118 299
  ..$ : int  543 546
  ..$ : int  183 618 1069 1217
  ..$ : int  248 871
  ..$ : int  291 484
  ..$ : int  140 852 971
  ..$ : int  188 593 657
  ..$ : int 356
  ..$ : int  0 477
  ..$ : int  670 684 1066 1247
  ..$ : int  728 1166
  ..$ : int  78 323 898
  ..$ : int  578 1257
  ..$ : int  679 1234
  ..$ : int  249 854
  ..$ : int  390 435 763 802
  ..$ : int  741 987
  ..$ : int  92 905
  ..$ : int  724 809
  ..$ : int  594 1010
  ..$ : int 166
  ..$ : int  580 1132
  ..$ : int  73 1006
  ..$ : int  1143 1163
  ..$ : int  426 1219
  ..$ : int  237 637 801
  ..$ : int  835 1070 1072
  ..$ : int 968
  ..$ : int  348 489 940
  ..$ : int  160 1016
  ..$ : int  540 745 1196
  ..$ : int  1 1125
  ..$ : int  423 616 1092
  ..$ : int 671
  ..$ : int 983
  ..$ : int  549 565 900 1078 1221
  ..$ : int  347 949
  ..$ : int  559 1151
  ..$ : int  93 788 923 1245
  ..$ : int  361 906
  ..$ : int  690 781
  ..$ : int  445 1184 1265
  ..$ : int  98 651
  ..$ : int  544 1130
  ..$ : int 1075
  ..$ : int  416 1085 1138
  ..$ : int  556 599
  ..$ : int  866 1111
  ..$ : int  981 1063
  ..$ : int 1041
  ..$ : int  352 532 1228
  ..$ : int 496
  ..$ : int  53 1168
  ..$ : int  86 147 436 736 1058
  ..$ : int  27 88 341 438 944 960
  ..$ : int  334 1295
  ..$ : int 660
  ..$ : int  587 1288
  ..$ : int 539
  ..$ : int  3 137
  ..$ : int  405 1292
  ..$ : int  1259 1293
  ..$ : int  861 1261
  ..$ : int  453 849
  ..$ : int  165 328 548
  ..$ : int  59 71 386 480 678 683 1030 1031
  ..$ : int  268 332
  ..$ : int  97 1241
  ..$ : int  104 392
  ..$ : int  451 490
  ..$ : int  142 403
  ..$ : int  247 755
  ..$ : int  704 1200
  ..$ : int 860
  ..$ : int  370 1255
  ..$ : int  52 645 720 848
  ..$ : int  62 230
  ..$ : int  314 440 535
  ..$ : int  820 1199
  ..$ : int  615 754 1244 1276
  ..$ : int  120 655 665
  ..$ : int 1129
  .. [list output truncated]
 - attr(*, "drop")= logi TRUE
 - attr(*, "group_sizes")= int  3 2 1 2 4 1 9 1 2 1 ...
 - attr(*, "biggest_group_size")= int 790
> full_data_dummy <- as.matrix(createDummyFeatures(full_data[1:891,]))
> str(full_data_dummy)
 num [1:891, 1:290] 0 1 1 1 0 0 0 0 1 1 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:891] "1" "2" "3" "4" ...
  ..$ : chr [1:290] "survived" "pclass.1" "pclass.2" "pclass.3" ...
> 
> ## Using keras for deep learning classification
> 
> library(keras)
> ## defining a network with 2 hidden layers of 16 units each, using a relu activation in order to initially overfit the model
> model <- keras_model_sequential() %>% 
+   layer_dense(units = 16, activation = "relu", input_shape = c(289)) %>% 
+   layer_dense(units = 16, activation = "relu") %>% 
+   layer_dense(units = 1, activation = "sigmoid")
> 
> ## splitting train data into train and validation, in order to monitor how much the model overfits the train data
> val_indices <- 1:300
> x_val <-  full_data_dummy[val_indices,-1]
> x_partial <- full_data_dummy[-val_indices,-1]
> y_val <- full_data_dummy[val_indices,1]
> y_partial <- full_data_dummy[-val_indices,1]
> 
> ## compiling model using binary crossentropy loss and the accuracy metric
> model %>% compile(
+   optimizer = "rmsprop", 
+   loss = "binary_crossentropy", 
+   metrics = c("accuracy"))
> 
> ## fitting model on training dataset
> history <- model %>% fit(
+   x_partial, y_partial, epochs = 20, batch_size = 50, validation_data = c(x_val,y_val))
Error in py_call_impl(callable, dots$args, dots$keywords) : 
  ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

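As the sample code shows, I converted all factor variables to dummy variables and then converted the dummy data frame to a matrix. After that I split the data into training and validation sets. I have included the structure of both the full data frame and the dummy data frame.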
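For readers who want to reproduce the preprocessing on a small scale, here is a minimal base-R sketch of the same idea: expand each factor into indicator columns, bind them into a numeric matrix, then separate the target column from the features. The `toy` data frame and its values are made up for illustration, and the manual expansion only approximates what `createDummyFeatures()` does in the question.

```r
# Hypothetical toy data frame; column names mirror the question, values are made up.
toy <- data.frame(
  survived = c(0, 1, 1, 0),
  pclass   = factor(c("3", "1", "3", "2")),
  sex      = factor(c("male", "female", "female", "male"))
)

# Build one 0/1 indicator column per factor level, named "<column>.<level>".
dummies <- do.call(cbind, lapply(names(toy)[-1], function(col) {
  m <- sapply(levels(toy[[col]]), function(lv) as.numeric(toy[[col]] == lv))
  colnames(m) <- paste(col, levels(toy[[col]]), sep = ".")
  m
}))
toy_matrix <- cbind(survived = toy$survived, dummies)

# Split the target (column 1) from the features, as in the question.
x <- toy_matrix[, -1]
y <- toy_matrix[, 1]
dim(x)  # 4 5
```

The number of feature columns (`ncol(x)`) is what `input_shape` in the first `layer_dense()` call has to match; in the question that is 290 - 1 = 289.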

Error in py_call_impl: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
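One likely cause, not confirmed anywhere in this thread, is the `validation_data = c(x_val, y_val)` argument. In R, `c()` on a matrix and a vector drops the matrix dimensions and concatenates everything into a single numeric vector, so the Python side would receive one array instead of an (inputs, targets) pair. The sketch below uses made-up stand-ins for `x_val` and `y_val` and only base R to show the difference between `c()` and `list()`.

```r
# Toy stand-ins for x_val (a matrix) and y_val (a vector); the values are made up.
x_val <- matrix(c(1, 0, 0, 1, 1, 0), nrow = 3, ncol = 2)
y_val <- c(0, 1, 1)

# c() drops the matrix dimensions and concatenates everything into one vector.
flattened <- c(x_val, y_val)
length(flattened)   # 9
is.list(flattened)  # FALSE

# list() keeps the inputs and the targets as two separate elements.
paired <- list(x_val, y_val)
length(paired)      # 2
dim(paired[[1]])    # 3 2
```

If that is indeed the problem, passing `validation_data = list(x_val, y_val)` to `fit()` would be the change to try.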

0 Answers: