Can someone show me an example of tf.data.experimental.group_by_reducer? I find the documentation tricky and can't fully understand it.
How can I use it to compute an average?
Answer 0 (score: 3)
Suppose we are given a dataset of ['ids', 'features'], and we want to group the data by summing the 'features' values that correspond to the same 'ids'. We can use tf.contrib.data.group_by_reducer(key_func, reducer) to achieve this.
Original data:
ids | features
--------------
1 | 1
2 | 2.2
3 | 7
1 | 3.0
2 | 2
3 | 3
Desired data:
ids | features
--------------
1 | 4
2 | 4.2
3 | 10
TensorFlow code:
import tensorflow as tf
tf.enable_eager_execution()
ids = [1, 2, 3, 1, 2, 3]
features = [1, 2.2, 7, 3.0, 2, 3]
# Define reducer
# Reducer requires 3 functions - init_func, reduce_func, finalize_func.
# init_func - to define initial value
# reduce_func - operation to perform on values with same key
# finalize_func - value to return in the end.
def init_func(_):
    return 0.0
def reduce_func(state, value):
    return state + value['features']
def finalize_func(state):
    return state
reducer = tf.contrib.data.Reducer(init_func, reduce_func, finalize_func)
# Group by reducer
# Group the data by id
def key_f(row):
    return tf.to_int64(row['ids'])
t = tf.contrib.data.group_by_reducer(
    key_func=key_f,
    reducer=reducer)
ds = tf.data.Dataset.from_tensor_slices({'ids':ids, 'features' : features})
ds = ds.apply(t)
ds = ds.batch(6)
iterator = ds.make_one_shot_iterator()
data = iterator.get_next()
print(data)
Consider id == 1. We set the initial value to 0 using init_func. reduce_func will perform the operations 0 + 1 and 1 + 3.0, and finalize_func will return 4.0.
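The same reduction, traced in plain Python for the id == 1 group (purely illustrative; the helper names are mine, not part of the answer's code):
from functools import reduce
values_for_id_1 = [1.0, 3.0]  # the 'features' rows whose id is 1
total = reduce(lambda state, v: state + v, values_for_id_1, 0.0)  # (0.0 + 1.0) + 3.0
print(total)  # 4.0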
In the group_by_reducer function, key_func is a function that returns the key of a given row of data. The key must be an int64. In our case, we use 'ids' as the key.
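Since the question asks about computing an average, here is a minimal sketch of how the reducer above could be changed to average, rather than sum, the 'features' per id. It assumes the reducer state may be a tuple of tensors (sum, count); mean_reducer is an illustrative name:
def init_func(_):
    return (0.0, 0.0)  # state = (running sum, running count), assumed tuple state
def reduce_func(state, value):
    total, count = state
    return (total + value['features'], count + 1.0)
def finalize_func(state):
    total, count = state
    return total / count  # mean = sum / count
mean_reducer = tf.contrib.data.Reducer(init_func, reduce_func, finalize_func)
With the sample data above, this should yield 2.0 for id 1, 2.1 for id 2, and 5.0 for id 3.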
Answer 1 (score: 0)
I adjusted @Illuminati0x5B's code to work with TF2.0. Thanks @Illuminati0x5B, your sample code was really helpful.
TensorFlow code (adjusted):
import tensorflow as tf
ids = [1, 2, 3, 1, 2, 3]
features = [1, 2.2, 7, 3.0, 2, 3]
# Define reducer
# Reducer requires 3 functions - init_func, reduce_func, finalize_func.
# init_func - to define initial value
# reduce_func - operation to perform on values with same key
# finalize_func - value to return in the end.
def init_func(_):
    return 0.0
def reduce_func(state, value):
    return state + value['features']
def finalize_func(state):
    return state
reducer = tf.data.experimental.Reducer(init_func, reduce_func, finalize_func)
# Group by reducer
# Group the data by id
def key_f(row):
    return tf.dtypes.cast(row['ids'], tf.int64)
t = tf.data.experimental.group_by_reducer(
    key_func=key_f,
    reducer=reducer)
ds = tf.data.Dataset.from_tensor_slices({'ids':ids, 'features' : features})
ds = ds.apply(t)
ds = ds.batch(6)
iterator = tf.compat.v1.data.make_one_shot_iterator(ds)
data = iterator.get_next()
print(data)
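One usage note: in TF 2.x, eager execution is enabled by default and a tf.data.Dataset is directly iterable, so (assuming eager mode) the compat.v1 one-shot iterator can be replaced with a plain loop:
for batch in ds:
    print(batch)  # one batch holding the per-id reduced values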