Question

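I have a vector of positive numeric values. I need to normalize it so that the values sum to 1 (e.g. a probability). This is simple: just use x_i / sum(x) as the weights. But here is the caveat: I need no weight to be less than some minimum cutoff, and no weight to be greater than some maximum cutoff. This means two things. First, there are cases with no solution (e.g. with a maximum cutoff of 0.2, three objects cannot be assigned weights, since the weights could sum to at most 0.6). Second, the "relativity" of the weights is broken. That is, in standard normalization (where w_i is the weight given to x_i, for all i), w_i / w_j = x_i / x_j for all i, j. With cutoffs, this cannot always be achieved. More formally, I would like to find a function w = rwc(x, min, max), where x is a vector, which returns a vector of the same length with the following properties: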

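1) sum(w) = 1

2) min <= w_i <= max for all i

3) if x_i <= x_j then w_i <= w_j, for all i, j

4) if w_i and w_j are both different from the cutoffs (min and max), then they keep relativity: that is, if min < w_i < max and min < w_j < max, then w_i / w_j = x_i / x_j

If there is no solution, NULL should be returned.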
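To make the four properties concrete, they can be tested mechanically for any candidate vector. The following is a minimal R sketch, not something from the original post: the name `check_rwc` and the tolerance `tol` are assumptions introduced for illustration, and x is assumed to be strictly positive as stated above.

```r
# Hypothetical helper (not from the original post): reports which of
# properties 1-4 hold for a candidate weight vector w, up to tolerance tol.
check_rwc <- function(x, w, mn, mx, tol = 1e-9) {
  # Property 1: the weights sum to 1.
  p1 <- abs(sum(w) - 1) <= tol
  # Property 2: every weight lies within [mn, mx].
  p2 <- all(w >= mn - tol & w <= mx + tol)
  # Property 3: x_i <= x_j implies w_i <= w_j for every pair (i, j).
  x_le <- outer(x, x, "<=")
  w_le <- outer(w, w, function(a, b) a <= b + tol)
  p3 <- all(!x_le | w_le)
  # Property 4: weights strictly between the cutoffs share one ratio w_i / x_i,
  # which is equivalent to w_i / w_j = x_i / x_j for positive x.
  inner <- w > mn + tol & w < mx - tol
  r <- w[inner] / x[inner]
  p4 <- length(r) < 2 || (max(r) - min(r)) <= tol * max(r)
  c(sum = p1, bounds = p2, order = p3, ratio = p4)
}

# Plain normalization with non-binding cutoffs passes all four checks.
check_rwc(c(1, 2, 3, 4), c(0.1, 0.2, 0.3, 0.4), mn = 0, mx = 1)
```

The pairwise comparison uses `outer`, so the check costs O(n^2) memory; for long vectors, sorting by x and comparing neighbours would be a cheaper alternative.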

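So my questions are:

a) How would you suggest doing this (in R or any other language)?

b) Given x, can there be more than one solution (i.e. at least two different vectors, w and v, each satisfying the formal requirements above)?

This is not strictly an R problem, but I ran into it in a project I am doing in R, so I am posting it under R. Any suggestions for a better classification are welcome.

Update

Following the discussion below, and after more thought, it seems necessary to add a fifth requirement to the four above: 5) of all possible weight assignments that satisfy 1-4, w is the one with the fewest extreme weights (weights equal to min or max).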
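Requirement 5 can be illustrated with a small R sketch. The input x = c(1, 2, 1e6) with cutoffs 0.05 and 0.8 is an example chosen here, not taken from the original post, and the name `count_extremes` is hypothetical. Both candidate vectors below satisfy requirements 1-4, so this input also shows that 1-4 alone do not always determine a unique answer.

```r
# Hypothetical helper: number of weights sitting on a cutoff (min or max).
count_extremes <- function(w, mn, mx, tol = 1e-9) {
  sum(abs(w - mn) <= tol | abs(w - mx) <= tol)
}

mn <- 0.05
mx <- 0.8
# Two candidates for x = c(1, 2, 1e6); each satisfies requirements 1-4.
cands <- list(
  c(0.2 / 3, 0.4 / 3, 0.8),  # only the third weight is pinned at max
  c(0.05, 0.15, 0.8)         # first pinned at min, third pinned at max
)

n_ext <- sapply(cands, count_extremes, mn = mn, mx = mx)
# Requirement 5 selects the candidate with the fewest pinned weights.
best <- cands[[which.min(n_ext)]]
```

Here the first candidate has one extreme weight and the second has two, so requirement 5 selects the first.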

Here is my code, which (hopefully) does this:

#
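mwc = function(x, mxw = 1, mnw = 0) {
cake = 1                    # weight mass still to be distributed
agnts = seq_along(x)        # indices not yet pinned to a cutoff
result = numeric(length(x))
while (cake > 0 && length(agnts) > 0) {
    # split the remaining mass proportionally among the free indices
    tmp = cake * x[agnts] / sum(x[agnts])
    low = which(tmp < mnw)
    high = which(tmp > mxw)
    if (length(low) + length(high) == 0) {
        # tmp is already aligned with agnts, so it must not be re-indexed
        result[agnts] = tmp
        break
    }
    # pin the violators to their cutoff; empty index sets are no-ops
    result[agnts[low]] = mnw
    result[agnts[high]] = mxw
    cake = 1 - sum(result)
    agnts = agnts[-c(low, high)]
}
if (cake < 0) return(NULL)  # no solution
# compare with a tolerance: 1e-17 is below double precision near 1
if (abs(sum(result) - 1) > 1e-8) return(NULL)
return(result)
}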
# the end

Answer 1

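a)

I suggest a brute-force iterative algorithm.

1) Let x' = x
2) Compute sum(x')
3) Compute the cutoff limits min_x, max_x
4) Compute x' from x, adjusting the values that fall outside the range [min_x, max_x]
5) Repeat 2-4 until x' is stable
6) Compute w

In most cases the number of iterations should be small.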
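The steps above are terse, so here is one possible reading of them as an R sketch. It assumes that min_x and max_x are the cutoffs rescaled by the current total, i.e. min * sum(x') and max * sum(x'), and that "stable" means the total stops changing up to a relative tolerance. The name `rwc_iter` and the parameters `tol` and `max_iter` are hypothetical, not part of the answer.

```r
# A sketch of one interpretation of the iterative scheme (assumptions above).
rwc_iter <- function(x, mn = 0, mx = 1, tol = 1e-12, max_iter = 10000) {
  n <- length(x)
  # Necessary condition: n weights in [mn, mx] must be able to sum to 1.
  if (n * mn > 1 || n * mx < 1) return(NULL)
  s <- sum(x)                               # steps 1-2: x' = x, total of x'
  for (k in seq_len(max_iter)) {
    # steps 3-4: clamp the original x into [mn * s, mx * s]
    xp <- pmin(pmax(x, mn * s), mx * s)
    s_new <- sum(xp)
    # step 5: stop once the total (and hence x') no longer changes
    if (abs(s_new - s) <= tol * s_new) {
      return(xp / s_new)                    # step 6: normalize to get w
    }
    s <- s_new
  }
  NULL  # did not stabilise within max_iter iterations
}

rwc_iter(c(1, 2, 3, 4))                     # cutoffs not binding
rwc_iter(c(1, 1, 1e6), mn = 0.05, mx = 0.8) # large value pinned at max
```

At a fixed point every unclamped element keeps w_i = x_i / s, so ratios among unclamped weights are preserved, and clamped elements sit on a cutoff up to the tolerance. With strongly skewed inputs such as the second call, the total shrinks geometrically, so this reading can need a few hundred iterations rather than a handful.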

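b) If there is either a minimum or a maximum (but not both), the solution vector is unique.

If there are both a minimum and a maximum, I am not sure. It feels like it should be unique, but I cannot find a simple proof.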

Answer 2

Do you mean something like this? The example is in Haskell; "[]" is an empty list.

weights :: [Double] -> Double -> Double -> [Double]
weights my_vector min max = 
  let s = sum my_vector
      try = map (/s) my_vector
  in if all (>=min) try && all (<=max) try
        then try
        else []

Output:
*Main> weights [1,2,3,4] 0 2
[0.1,0.2,0.3,0.4]
*Main> weights [1,2,3,4] 1 2
[]

UPDATE
Here is a rough direction (Haskell again), based on this:

import Data.List
import Data.Ord

normalize :: Double -> Double -> Double -> Double -> Double
normalize targetMin targetMax rawMax val =
  let maxReduce = 1 - targetMax/rawMax
      factor = maxReduce * (abs val) / rawMax
  in max ((1 - factor) * val) targetMin

weights :: [Double] -> Double -> Double -> [Double]
weights myVector targetMin targetMax = 
  let try = map (/(sum myVector)) myVector
  in if all (>=targetMin) try && all (<=targetMax) try
        then try
        else weights normalized targetMin targetMax
    where normalized = 
            let targetMax' = (maximum myVector * targetMin / minimum myVector)
            in map (\x -> normalize targetMin targetMax' (maximum myVector) x) myVector

Output:
*Main> weights [4,4,4,1000] 0.1 0.7
[0.10782286784365082,0.10782286784365082,0.10782286784365082,0.6765313964690475]
*Main> weights [1,1,1000000] 0.05 0.8
[0.12043818322274577,0.12043818322274577,0.7591236335545084]

Answer 3

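This is my second answer, which I hope now also addresses requirement 4). It seems to me that if requirement 4) is to hold, then we must divide all elements that are not assigned a cutoff by:

    denominator = sum non_cutoff_elements / (1 - sum cutoff_elements)

where 'cutoff_elements' are expressed as their cutoff values. I hope this recursive code manages to exhaust the combinations of cutoff assignments. The code seems to solve both amit's and rici's examples from their comments. Haskell again: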

import Data.List
import Data.Ord

weights :: [Double] -> Double -> Double -> [[Double]]
weights myVector tMin tMax = 
  weights' 0
    where 
      weights' count
        | count == length myVector + 1 = []
        | otherwise =
            let new_list = take count myVector 
                           ++ replicate (length myVector - count) tMax
            in fromLeft new_list 0 ++ weights' (count+1)
                where 
                  fromLeft list' count' = 
                    let non_cutoffs = filter (\x -> x/=tMin && x/=tMax) list'
                        real_count = length list' - length non_cutoffs
                        cutoffs_l = filter (\x -> x==tMin) list'
                        cutoffs_r = filter (\x -> x==tMax) list'
                        denom = sum non_cutoffs / (1 - (sum $ cutoffs_l ++ cutoffs_r))
                        mapped = cutoffs_l ++ (map (/denom) non_cutoffs) ++ cutoffs_r
                        next_list = let left = tMin : cutoffs_l
                                        right = drop 1 cutoffs_r
                                    in left ++ non_cutoffs ++ right
                    in if count' == real_count
                          then []
                          else if sum cutoffs_r > 1 || sum cutoffs_l > 1 
                                  || sum cutoffs_r + sum cutoffs_l > 1
                                  then fromLeft next_list (count'+1)
                          else if sum mapped == 1 && all (>=tMin) mapped && all (<=tMax) mapped
                                  then mapped : fromLeft list' (count'+1)
                                  else fromLeft next_list (count'+1)

Output:
*Main> weights [4,4,4,1000] 0.1 0.7
[[0.1,0.1,0.1,0.7],[0.1,0.1,0.10000000000000009,0.7],[0.1,0.10000000000000003,0.10000000000000003,0.7],[0.10000000000000002,0.10000000000000002,0.10000000000000002,0.7]]
Rounded to 14 decimal places: [[0.1,0.1,0.1,0.7],[0.1,0.1,0.1,0.7],[0.1,0.1,0.1,0.7],[0.1,0.1,0.1,0.7]]
*Main> weights [1,1,1000000] 0.05 0.8
[[5.0e-2,0.1499999999999999,0.8],[9.999999999999998e-2,9.999999999999998e-2,0.8]]
Rounded to 14 decimal places: [[0.05,0.15,0.8],[0.1,0.1,0.8]]
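The denominator rule from this answer can be shown in isolation for a single, fixed cutoff assignment. The R sketch below is not a port of the Haskell search; the name `weights_given_cutoffs` and the convention of marking free elements with NA in `pin` are assumptions made for illustration.

```r
# Hypothetical helper: `pin` holds NA for free elements and, for pinned
# elements, the cutoff value (min or max) they are fixed at.
weights_given_cutoffs <- function(x, pin) {
  free <- is.na(pin)
  # denominator = sum of non-cutoff elements / (1 - sum of cutoff values)
  denom <- sum(x[free]) / (1 - sum(pin[!free]))
  w <- pin
  w[free] <- x[free] / denom   # free elements share the remaining mass
  w
}

# Pin only the largest element at max = 0.8: denom = 2 / 0.2 = 10.
weights_given_cutoffs(c(1, 1, 1e6), c(NA, NA, 0.8))
# Pin only the largest element at max = 0.7: denom = 12 / 0.3 = 40.
weights_given_cutoffs(c(4, 4, 4, 1000), c(NA, NA, NA, 0.7))
```

Up to floating-point rounding, these two calls reproduce the [0.1,0.1,0.8] and [0.1,0.1,0.1,0.7] entries in the output above. The helper does not verify that the resulting free weights stay inside [min, max]; a search over assignments still has to check that.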

Relative weights with cutoffs
