Question

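My Stata dataset contains observations on the components of products created by different players in a simulation. I want to keep only those products (per player) that are made up of distinct components, i.e. that share no component with another kept product. To do this, I am trying to develop an algorithm that sequentially eliminates products based on pairwise comparisons of values within groups. Here is the data: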

+---------------+----------+--------+-----------+----+----+----+----+---------------+
| simulation_id | prodsrt  | player | dupl_prod | n8 | n6 | n4 | n2 | product_value |
+---------------+----------+--------+-----------+----+----+----+----+---------------+
|             1 | 04091520 |      1 |         0 |  4 |  9 | 15 | 20 | 3,498         |
|             1 | 02081821 |      1 |         0 |  2 |  8 | 18 | 21 | 2,457         |
|             1 | 06101424 |      2 |         0 |  6 | 10 | 14 | 24 | 3,686         |
|             1 | 03071719 |      2 |         9 |  3 |  7 | 17 | 19 | 2,025         |
|             1 | 05111323 |      2 |         7 |  5 | 11 | 13 | 23 | 2,509         |
|             1 | 03121619 |      2 |         2 |  3 | 12 | 16 | 19 | 2,544         |
|             1 | 01111319 |      2 |         4 |  1 | 11 | 13 | 19 | 2,791         |
|             1 | 05071723 |      2 |         5 |  5 |  7 | 17 | 23 | 2,509         |
+---------------+----------+--------+-----------+----+----+----+----+---------------+

The desired end result for this case is:

+---------------+----------+--------+-----------+----+----+----+----+---------------+
| simulation_id | prodsrt  | player | dupl_prod | n8 | n6 | n4 | n2 | product_value |
+---------------+----------+--------+-----------+----+----+----+----+---------------+
|             1 | 04091520 |      1 |         0 |  4 |  9 | 15 | 20 | 3,498         |
|             1 | 02081821 |      1 |         0 |  2 |  8 | 18 | 21 | 2,457         |
|             1 | 06101424 |      2 |         0 |  6 | 10 | 14 | 24 | 3,686         |
|             1 | 01111319 |      2 |         4 |  1 | 11 | 13 | 19 | 2,791         |
|             1 | 05071723 |      2 |         5 |  5 |  7 | 17 | 23 | 2,509         |
+---------------+----------+--------+-----------+----+----+----+----+---------------+

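The idea is to:
1) Rank all "problematic" products (those with dupl_prod != 0) by value within each player.
2) Select the product with the highest product_value and the second-best product.
3) For this pair, check whether the two products share any components:
   a. If they share components, drop the product with the lower value.
   b. If they share no components, keep both products but exclude the lower-valued one from the "problematic" set.
4) Re-rank the remaining products and repeat the same procedure until only non-overlapping products remain.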
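The steps above can be sketched as a single greedy pass. This is a minimal sketch under stated assumptions, not the routine from the question: it compares each problematic product with every higher-valued product already kept (slightly stricter than comparing only the top pair), treats two products as overlapping when they share a value in the same component column (n8, n6, n4, n2), as the code in the question does, and breaks ties in product_value by prodsrt. The helper variables negval, pos, and kept are hypothetical names introduced for illustration, and product_value is entered without thousands separators.

```stata
clear
input simulation_id str8 prodsrt player dupl_prod n8 n6 n4 n2 product_value
1 "04091520" 1 0 4  9 15 20 3498
1 "02081821" 1 0 2  8 18 21 2457
1 "06101424" 2 0 6 10 14 24 3686
1 "03071719" 2 9 3  7 17 19 2025
1 "05111323" 2 7 5 11 13 23 2509
1 "03121619" 2 2 3 12 16 19 2544
1 "01111319" 2 4 1 11 13 19 2791
1 "05071723" 2 5 5  7 17 23 2509
end

* order products within simulation and player, highest value first
* (negval is a hypothetical helper; prodsrt breaks ties in value)
gen negval = -product_value
bysort simulation_id player (negval prodsrt): gen pos = _n
summarize pos, meanonly
local maxsize = r(max)

* kept = 1 while a product survives; only problematic products can be dropped
gen byte kept = 1

forvalues j = 2/`maxsize' {
    local jm1 = `j' - 1
    forvalues k = 1/`jm1' {
        * under by, explicit subscripts refer to positions within the group:
        * drop product j if it shares a component with a kept, higher-ranked product k
        quietly by simulation_id player: replace kept = 0 if _n == `j' ///
            & dupl_prod != 0 & kept[`k'] == 1 ///
            & (n8 == n8[`k'] | n6 == n6[`k'] | n4 == n4[`k'] | n2 == n2[`k'])
    }
}

drop if kept == 0
drop negval pos kept
list, sepby(player)
```

On the sample data this leaves the five products shown in the desired result. The double loop issues on the order of maxsize squared replace commands, so it is best suited to groups with a modest number of products.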

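Because the number of "problematic" products to scan varies across players and simulations, I need to run this routine as many times as the maximum number of pairwise comparisons required.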
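One way to bound the number of passes, sketched under the assumption that every pass of the pairwise routine settles exactly one problematic product per player (it is either dropped or has its flag cleared): a group with k problematic products then needs at most k - 1 passes, so the largest group determines the loop limit. The names nprob and maxpass are hypothetical, and only the three variables needed for the count are entered here.

```stata
clear
input simulation_id player dupl_prod
1 1 0
1 1 0
1 2 0
1 2 9
1 2 7
1 2 2
1 2 4
1 2 5
end

* number of problematic products per simulation and player
bysort simulation_id player: egen nprob = total(dupl_prod != 0)

* the largest group sets the upper bound on passes (never below zero)
summarize nprob, meanonly
local maxpass = max(r(max) - 1, 0)
display "passes needed at most: `maxpass'"

* the bound could then replace a hard-coded repetition count, e.g.
* forvalues x = 1/`maxpass' { ... }
```

If the routine settles more or fewer products per pass than assumed, the bound would need to be adjusted accordingly.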

The current version of the code is as follows:

 

forval x = 1/2 {

    * rank the "problematic" products (dupl_prod != 0) by value within simulation and player
    bysort simulation_id player (product_value): gen rank = sum(product_value != product_value[_n-1]) if dupl_prod != 0
    bysort simulation_id player (rank): egen maxrank = max(rank) if dupl_prod != 0
    * flag the best and second-best ranked products as the active pair
    gen active = cond(missing(rank), ., cond(rank == maxrank | rank == maxrank - 1, 1, .))
    gen dupl_cell = .

    * count repeated component values within the active pair, one component column at a time
    local i = 2
    while `i' < 15 { /* max number of digits in productid is equal to 14 */
        sort simulation_id player active n`i'
        quietly by simulation_id player active n`i': gen dupl_n`i' = cond(missing(n`i'), ., cond(_N == 1, 0, _n))
        egen temp = rowtotal(dupl_n*) if `i' == 14
        replace dupl_cell = temp
        drop temp
        local i = `i' + 2
    }

    * drop the lower-ranked product of an overlapping pair; otherwise clear its flag
    gen test = cond(missing(dupl_cell), ., cond(rank != maxrank, 1, 0)) if dupl_cell != 0
    drop if test == 1
    replace dupl_prod = 0 if dupl_prod != 0 & dupl_cell == 0 & rank != maxrank & active == 1

    drop test rank maxrank active dupl_cell
    drop dupl_n*
}

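It does not produce any errors, but it does not produce the desired result either, because it keeps only the initial best product. Moreover, even when the number of repetitions is set to 2, it still produces the same result, although reaching that result should take more than 2 iterations of the "forval" loop.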

Stata: sequential elimination of observations based on pairwise comparisons within groups

0 answers: