Question

My features consist of two columns:

department (dtype = int32), but a customer typically has only about 200 departments (200 records)
service (dtype = int32), but each department has only about 20 services (20 * 200 = 4000 records)

Because department and service are mostly "sparse" columns, they are defined as:

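import tensorflow as tf

feature_dict = {}

# Hash the integer department ids into 200 buckets and expose them as a one-hot vector.
department_column = tf.feature_column.categorical_column_with_hash_bucket("department", 200, dtype=tf.int32)
feature_dict["department"] = tf.feature_column.indicator_column(department_column)

# Hash the integer service ids into 4000 buckets and expose them as a one-hot vector.
service_column = tf.feature_column.categorical_column_with_hash_bucket("service", 4000, dtype=tf.int32)
feature_dict["service"] = tf.feature_column.indicator_column(service_column)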

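However, defining these features as independent columns does not describe the relationship between them: a service is only meaningful within the scope of a particular department.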

To describe this relationship, crossed columns seem like a good idea.
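Conceptually, a crossed feature treats each (department, service) pair as a single new categorical value and hashes it into a fixed number of buckets, so the same service id under two different departments becomes two different features. The sketch below illustrates only that idea in plain Python. The helper `cross_bucket` and the use of MD5 are assumptions for illustration: TensorFlow's `crossed_column` uses its own fingerprint hash, so the actual bucket ids it produces will differ.

```python
import hashlib


def cross_bucket(department: int, service: int, hash_bucket_size: int) -> int:
    """Hypothetical helper: map a (department, service) pair to one bucket id.

    Illustrative only. TensorFlow uses a different hash internally, so real
    crossed_column bucket ids will not match the values computed here.
    """
    # Combine both values into one key, so the pair (not each value) is hashed.
    key = f"{department}_X_{service}".encode("utf-8")
    digest = hashlib.md5(key).digest()
    # Reduce the 64-bit prefix of the digest into [0, hash_bucket_size).
    return int.from_bytes(digest[:8], "little") % hash_bucket_size


# Service 5 in department 1 and service 5 in department 2 are hashed as
# different keys, so they usually land in different buckets (collisions
# are still possible because the bucket count is finite).
a = cross_bucket(department=1, service=5, hash_bucket_size=4000)
b = cross_bucket(department=2, service=5, hash_bucket_size=4000)
print(a, b)
```

Because only about 4000 (department, service) pairs actually occur, a bucket count in that range keeps the crossed vocabulary small, at the cost of some hash collisions.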

The solution has two parts:

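1. How do I correctly combine these two columns into a crossed_column?
2. To use a crossed_column in an estimator, I need to wrap it in an embedding_column or an indicator_column. Which of the two is more appropriate in this case?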
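A minimal sketch of both steps, assuming TensorFlow 2.x with the (deprecated) `tf.feature_column` API still available. The cross is built from the raw feature keys `"department"` and `"service"` rather than from the hashed columns, because `crossed_column` accepts string keys of string or integer features but does not support hashed categorical columns as inputs. The bucket count of 4000 and the embedding dimension of 8 are illustrative assumptions, not values from the question.

```python
import tensorflow as tf

# Step 1: cross the raw integer features by key. Each (department, service)
# pair is hashed into one of hash_bucket_size buckets (4000 is an assumption).
department_x_service = tf.feature_column.crossed_column(
    ["department", "service"], hash_bucket_size=4000)

# Step 2, option A: one-hot vector with one slot per hash bucket.
cross_indicator = tf.feature_column.indicator_column(department_x_service)

# Step 2, option B: a trainable dense vector of size `dimension` per bucket
# (dimension=8 is an assumption and would need tuning).
cross_embedding = tf.feature_column.embedding_column(
    department_x_service, dimension=8)

# Pick one wrapper depending on the model.
feature_columns = [cross_indicator]  # or [cross_embedding]
print(department_x_service.hash_bucket_size, cross_embedding.dimension)
```

As a rule of thumb (not a definitive answer), `indicator_column` fits linear models and small bucket counts, since it feeds a 4000-wide sparse-like vector directly; `embedding_column` is usually the better fit for a DNN, where a learned low-dimensional vector per bucket is cheaper than a wide one-hot input and can share information between similar crosses.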

Defining a crossed_column for related categorical features
