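I've spent a whole day trying to understand the following code, and it still makes little sense to me.
Can someone shed some light on what it does?
Some context: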
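We are given rating records of the form `(user-id, item-id, rating)`. For each record (a positive record), we are trying to sample a few (four) negative items, i.e. items the user has not rated.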
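For comparison, the most direct way to get such negatives is rejection sampling: draw an item id uniformly and retry if the user has rated it. This is a swapped-in baseline for illustration, not what the TensorFlow code does; the dict-of-sets layout and the function name are hypothetical. It samples with replacement, so the same negative can appear more than once.

```python
import numpy as np


def sample_negatives_rejection(user_positives, user, num_items, k=4, rng=None):
    """Draw k item ids in [0, num_items) that `user` has not rated.

    Hypothetical helper: uniform draws, discarding any rated item.
    Assumes the user has at least one unrated item, otherwise it loops forever.
    """
    rng = np.random.default_rng() if rng is None else rng
    positives = user_positives[user]
    out = []
    while len(out) < k:
        item = int(rng.integers(num_items))
        if item not in positives:
            out.append(item)
    return out


# Toy data (made up): user id -> set of rated item ids.
user_positives = {0: {1, 3}, 1: {0, 2, 4}}
print(sample_negatives_rejection(user_positives, 1, num_items=6,
                                 rng=np.random.default_rng(0)))
```

The retry loop gets slow for users who have rated most of the catalogue, which is one reason to map a random rank directly to a negative item id instead.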
The code below is supposed to be picking the negative items, but wow, it's hard to follow :( and the comments don't help much.
The most confusing parts are `self._total_negatives` and `left_index = self.index_bounds[negative_users]`.
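`self._total_negatives` is the concatenation, over all users, of the cumulative sums returned by `_index_segment`. The standalone sketch below replays that arithmetic for one hypothetical user (catalogue of 8 items, rated items `{2, 5, 6}`, both made up) to show what each entry means, and how the shortcut formula in `lookup_negative_items` uses the last entry. The helper names are hypothetical.

```python
import numpy as np


def index_segment(sorted_items):
    """Same arithmetic as `_index_segment`, for one user's sorted positives.

    Entry k = number of item ids below sorted_items[k] that the user has NOT rated.
    """
    # Gap before the first positive is sorted_items[0] (ids 0..first-1);
    # each later gap is (current - previous - 1).
    negatives_since_last_positive = np.concatenate(
        [sorted_items[:1], sorted_items[1:] - sorted_items[:-1] - 1])
    return np.cumsum(negatives_since_last_positive)


def shortcut_item(sorted_items, tally, choice):
    """Shortcut branch of `lookup_negative_items`: only valid when
    choice >= tally[-1], i.e. the chosen negative lies above the largest positive."""
    return sorted_items[-1] + 1 + (choice - tally[-1])


num_items = 8                  # hypothetical catalogue size
items = np.array([2, 5, 6])    # hypothetical user's sorted positive items
tally = index_segment(items)
negatives = [j for j in range(num_items) if j not in set(items.tolist())]

print(tally)                           # [2 4 4]
print(negatives)                       # [0, 1, 3, 4, 7]
print(shortcut_item(items, tally, 4))  # 7
```

So `_total_negatives[k]` answers "how many negatives come before this positive item?". Negative rank 4 is not below any positive (the last tally is 4), so it is found by counting up from item 6, giving item 7.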
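```python
import numpy as np

from official.recommendation import constants as rconst
from official.recommendation import stat_utils


# BaseDataConstructor is defined earlier in the same file (data_pipeline.py).
class BisectionDataConstructor(BaseDataConstructor):
  """Use bisection to index within positive examples.

  This class tallies the number of negative items which appear before each
  positive item for a user. This means that in order to select the ith negative
  item for a user, it only needs to determine which two positive items bound
  it at which point the item id for the ith negative is a simple algebraic
  expression.
  """

  def _index_segment(self, user):
    # Slice of the flat arrays that belongs to `user`.
    lower, upper = self.index_bounds[user:user+2]
    items = self._sorted_train_pos_items[lower:upper]

    # Number of unrated item ids in each gap: before the first positive,
    # then between each pair of consecutive (sorted) positives.
    negatives_since_last_positive = np.concatenate(
        [items[0][np.newaxis], items[1:] - items[:-1] - 1])

    # Entry k: how many unrated ids lie below the user's k-th positive item.
    return np.cumsum(negatives_since_last_positive)

  def construct_lookup_variables(self):
    # Rows where the user id changes; together with 0 and the total length,
    # these are the start/end offsets of each user's block of positives.
    inner_bounds = np.argwhere(self._train_pos_users[1:] -
                               self._train_pos_users[:-1])[:, 0] + 1
    (upper_bound,) = self._train_pos_users.shape
    self.index_bounds = np.array([0] + inner_bounds.tolist() + [upper_bound])

    # Later logic will assume that the users are in sequential ascending order.
    assert np.array_equal(self._train_pos_users[self.index_bounds[:-1]],
                          np.arange(self._num_users))

    # Sort each user's positive items in place, within that user's block.
    self._sorted_train_pos_items = self._train_pos_items.copy()
    for i in range(self._num_users):
      lower, upper = self.index_bounds[i:i+2]
      self._sorted_train_pos_items[lower:upper].sort()

    # Flat array aligned with _sorted_train_pos_items: one tally per positive.
    self._total_negatives = np.concatenate([
        self._index_segment(i) for i in range(self._num_users)])

  def lookup_negative_items(self, negative_users, **kwargs):
    output = np.zeros(shape=negative_users.shape, dtype=rconst.ITEM_DTYPE) - 1

    # First and last row of each requested user's block.
    left_index = self.index_bounds[negative_users]
    right_index = self.index_bounds[negative_users + 1] - 1

    num_positives = right_index - left_index + 1
    num_negatives = self._num_items - num_positives
    # Random rank of a negative item for each request, in [0, num_negatives).
    neg_item_choice = stat_utils.very_slightly_biased_randint(num_negatives)

    # Shortcut: the chosen negative lies above the user's largest positive item.
    use_shortcut = neg_item_choice >= self._total_negatives[right_index]
    output[use_shortcut] = (
        self._sorted_train_pos_items[right_index] + 1 +
        (neg_item_choice - self._total_negatives[right_index])
    )[use_shortcut]

    if np.all(use_shortcut):
      # The bisection code is ill-posed when there are no elements.
      return output

    # ... (the bisection step for the remaining requests is not shown here;
    # see the linked source file)
```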
From https://github.com/tensorflow/models/blob/master/official/recommendation/data_pipeline.py
For example, when `train_pos_users = np.array([0,0, 1,1,1, 2,2,2, 3,3,3,3,3,3, 4,4])`, then `self.index_bounds = array([ 0, 2, 5, 8, 14, 16])`.
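The `index_bounds` computation from `construct_lookup_variables` can be replayed on that exact array in plain NumPy (the `negative_users` request below is made up). Each user `u` owns rows `index_bounds[u]` through `index_bounds[u + 1] - 1`, so `left_index = self.index_bounds[negative_users]` is simply fancy indexing that fetches the first row of every requested user, with repeats allowed.

```python
import numpy as np

train_pos_users = np.array([0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4])

# Rows where the user id changes = start of every user's block except the first.
inner_bounds = np.argwhere(train_pos_users[1:] - train_pos_users[:-1])[:, 0] + 1
index_bounds = np.array([0] + inner_bounds.tolist() + [train_pos_users.shape[0]])
print(index_bounds)  # [ 0  2  5  8 14 16]

# Hypothetical request: one entry per negative to draw (users may repeat).
negative_users = np.array([3, 0, 3, 4])
left_index = index_bounds[negative_users]            # first row of each user
right_index = index_bounds[negative_users + 1] - 1   # last row of each user
num_positives = right_index - left_index + 1
print(left_index, right_index, num_positives)
```

This only works because of the `assert` in the class: users must be numbered `0..num_users-1` in ascending order with at least one positive each, otherwise `index_bounds[u]` would not point at user `u`. `np.flatnonzero(np.diff(train_pos_users)) + 1` is an equivalent, arguably clearer, way to get `inner_bounds`.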
If you're familiar with this, or know of a description online of what it does, I'd really appreciate a pointer. I tried Googling `bisection negative sampling`, but found nothing.
So bisection means halving, which is similar to binary search.
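Within one user's rows, `_total_negatives` never decreases (assuming the user's positive items are distinct), so the row that bounds the i-th negative is the first row whose tally exceeds i, and the item id is `sorted_item[row] - (tally[row] - i)`. That search can be done by binary search. The sketch below is an independent reconstruction of the idea under those assumptions, not the repository's bisection code: each loop iteration performs one halving step for every request at once, each confined to its own user's rows. The function names, the toy items, and the use of `np.random.default_rng` in place of `stat_utils.very_slightly_biased_randint` are all hypothetical.

```python
import numpy as np


def build_tables(train_pos_users, train_pos_items, num_users):
    """Rebuild index_bounds, per-user sorted items and negative tallies."""
    inner = np.flatnonzero(np.diff(train_pos_users)) + 1
    index_bounds = np.concatenate([[0], inner, [len(train_pos_users)]])
    sorted_items = np.empty_like(train_pos_items)
    total_negatives = np.empty_like(train_pos_items)
    for u in range(num_users):
        lo, hi = index_bounds[u], index_bounds[u + 1]
        seg = np.sort(train_pos_items[lo:hi])
        sorted_items[lo:hi] = seg
        # Unrated ids before the first positive, then between consecutive positives.
        gaps = np.concatenate([seg[:1], seg[1:] - seg[:-1] - 1])
        total_negatives[lo:hi] = np.cumsum(gaps)
    return index_bounds, sorted_items, total_negatives


def lookup_negative_items(users, choices, index_bounds, sorted_items, total_negatives):
    """Map rank choices[j] to the id of that unrated item for users[j]."""
    left = index_bounds[users]
    right = index_bounds[users + 1] - 1
    out = np.full(users.shape, -1, dtype=sorted_items.dtype)

    # Shortcut: the rank lies above the user's largest positive item.
    shortcut = choices >= total_negatives[right]
    out[shortcut] = (sorted_items[right] + 1 + choices - total_negatives[right])[shortcut]

    # Bisection: smallest row k in [lo, hi] with total_negatives[k] > choice.
    # Invariant: total_negatives[hi] > c, and every row below lo has tally <= c.
    rest = ~shortcut
    lo, hi, c = left[rest], right[rest], choices[rest]
    while np.any(lo < hi):
        mid = (lo + hi) // 2
        go_left = total_negatives[mid] > c
        hi = np.where(go_left, mid, hi)
        lo = np.where(go_left, lo, mid + 1)
    # hi is the first positive above the chosen negative; count back from it.
    out[rest] = sorted_items[hi] - (total_negatives[hi] - c)
    return out


# Toy data (made up), using the user layout from the question; 10 items.
num_items, num_users = 10, 5
train_pos_users = np.array([0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4])
train_pos_items = np.array([7, 2, 0, 9, 4, 1, 3, 5, 8, 6, 0, 2, 4, 9, 5, 0])
tables = build_tables(train_pos_users, train_pos_items, num_users)

# Every rank for user 3 enumerates that user's unrated items in order.
print(lookup_negative_items(np.full(4, 3), np.arange(4), *tables))  # [1 3 5 7]

# Random requests: four negatives per user.
rng = np.random.default_rng(0)
users = np.repeat(np.arange(num_users), 4)
index_bounds = tables[0]
num_negatives = num_items - (index_bounds[users + 1] - index_bounds[users])
choices = rng.integers(0, num_negatives)
print(lookup_negative_items(users, choices, *tables))
```

Requests that take the shortcut never enter the loop; if all of them do, `rest` is empty and the loop is skipped. The loop runs about log2 of the largest user's positive count times, instead of scanning each user's positives.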
I think the code doesn't do what it's intended to do, and I've opened a GitHub issue.