Using a vlookup-style lookup on another variable

Time: 2018-02-15 23:38:06

Tags: stata lookup

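1) A new variable should be created for each unique value listed in the variable sku, which contains repeated values.

2) Each newly created variable should be assigned the price of its own product (sku) at the store/week level, for every observation that is in the same subcategory (subc) as the variable itself. For example, in eta2,3 the observations in rows 3, 4 and 5 have the same value, because they all belong to the same subcategory as sku #3. [eta2,3 denotes sku 3, subc 2.]

3) x indicates that this is the original value of the product/subcategory currently being replicated.

4) If an observation does not belong to the same subcategory, it should show "0".

Orange is the given data. Green cells are the values from steps 1, 2 and 3. White cells are step 4.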

[Image: spreadsheet view of the desired result]

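I cannot offer an attempted solution of my own, because searching for a way to generate variables from existing observations did not turn up any results.

I also understand that it must be some combination of the forvalues, foreach and levelsof commands?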

clear
input units price   sku week    store   subc
3   4.3 1   1   1   1
2   3   2   1   1   1
1   2.5 3   1   1   2
4   12  5   1   1   2
5   12  6   1   1   3
35  4.3 1   1   2   1
23  3   2   1   2   1
12  2.5 3   1   2   2
35  12  5   1   2   2
35  12  6   1   2   3   
3   20  1   2   1   1
2   30  2   2   1   1
4   40  3   2   2   2
1   50  4   2   2   2
9   10  5   2   2   2
2   90  6   2   2   3
end

Update: Based on Nick Cox's feedback, this is the final code that gives the result I was looking for:

clear
input units price   sku week    store   subc
35  4.3 1   1   1   1
23  3   2   1   1   1
12  2.5 3   1   1   2
10  1   4   1   1   2
35  12  5   1   1   2
35  12  6   1   1   3
35  5.3 1   2   1   1
23  4   2   2   1   1
12  3.5 3   2   1   2
10  2   4   2   1   2
35  13  5   2   1   2
35  13  6   2   1   3
end

egen joint = group(subc sku), label 

bysort store week : gen freq = _N
su freq, meanonly 
local jmax = r(max) 
drop freq

tostring subc sku, replace
gen new = subc + "_"+sku 


su joint, meanonly 
forval j = 1/`r(max)' {
    local J = new[`j']
    gen eta`J' = .
}

sort  subc week store sku 
egen joint1 = group(subc week store), label 

gen long id = _n 
su joint1, meanonly  

quietly forval i = 1/`r(max)' { 
   su id if joint1 == `i', meanonly
   local jmin = r(min) 
   local jmax = r(max) 

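   forval j = `jmin'/`jmax' {
       local subc = subc[`j']
       local sku = sku[`j']
       * copy the price in observation j to every observation in its group
       replace eta`subc'_`sku' = price[`j'] in `jmin'/`jmax'
       * then set the product's own observation to 0
       replace eta`subc'_`sku' = 0 in `j'/`j'
   }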
}    

3 Answers:

Answer 0: (score: 1)

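I fear on your behalf that, in a dataset of any size, what you are asking for would mean a great many extra variables. I wonder whether you need all of them for whatever you want to do with them.

That aside, this seems to be what you want. Naturally, the column headers in your spreadsheet view are not legal variable names. Disclosure: although I was the original author of levelsof, I would not use it here.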

clear
input units price   sku week    store   subc
35  4.3 1   1   1   1
23  3   2   1   1   1
12  2.5 3   1   1   2
10  1   4   1   1   2
35  12  5   1   1   2
35  12  6   1   1   3
end

sort subc sku 
* subc identifiers guaranteed to be integers 1 up 
egen subc_id = group(subc), label 

* observation numbers in a variable  
gen long id = _n 

* how many subc? loop over the range 
su subc_id, meanonly 
forval i = 1/`r(max)' { 

   * which subc is this one? look it up using -summarize-
   * assuming that subc is numeric!    
   su subc if subc_id == `i', meanonly  
   local I = r(min) 

   * which observation numbers for this subc? 
   * given the prior sort, they are all contiguous 
   su id if subc_id == `i', meanonly 

   * for each observation in the subc, find out the sku and copy its price 
   * to all observations in that subc  
   forval j = `r(min)'/`r(max)' { 
       local J = sku[`j'] 
       gen eta_`I'_`J' = cond(subc_id == `i', price[`j'], 0) 
   }
}    

list subc eta*, sepby(subc)

     +------------------------------------------------------------------+
     | subc   eta_1_1   eta_1_2   eta_2_3   eta_2_4   eta_2_5   eta_3_6 |
     |------------------------------------------------------------------|
  1. |    1       4.3         3         0         0         0         0 |
  2. |    1       4.3         3         0         0         0         0 |
     |------------------------------------------------------------------|
  3. |    2         0         0       2.5         1        12         0 |
  4. |    2         0         0       2.5         1        12         0 |
  5. |    2         0         0       2.5         1        12         0 |
     |------------------------------------------------------------------|
  6. |    3         0         0         0         0         0        12 |
     +------------------------------------------------------------------+

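Notes:

N1. In your example, subc is numbered 1, 2, and so on. My extra variable subc_id ensures that this holds even if the identifiers in your real data are not so clean.

N2. The expression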

cond(subc_id == `i', price[`j'], 0)

could also be

(subc_id == `i') * price[`j'] 

N3. It seems possible that a different data structure would be more efficient.
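
Notes N1 and N2 can be checked on a toy dataset. The following is only a sketch: the identifiers 10, 25 and 40 and the variables a, b, a2 and b2 are made up for illustration and are not part of the question's data. It also shows one case where the two expressions in N2 differ: when the looked-up price is missing, the product is missing in every observation, while cond() still returns 0 outside the group.

```stata
clear
input price subc
4.3 10
3   10
2.5 25
12  40
end

* N1: group() maps arbitrary identifiers to the integers 1, 2, 3, ...
egen subc_id = group(subc)

* N2: both expressions give price[3] inside group 2 and 0 elsewhere
gen a = cond(subc_id == 2, price[3], 0)
gen b = (subc_id == 2) * price[3]
assert a == b

* caveat: with a missing price the product is missing everywhere,
* whereas cond() still returns 0 outside the group
replace price = . in 3
gen a2 = cond(subc_id == 2, price[3], 0)
gen b2 = (subc_id == 2) * price[3]

list, sepby(subc_id)
```

If prices can be missing in the real data, the cond() form is probably the safer of the two.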

EDIT: Here is code and results for another data structure.

clear
input units price   sku week    store   subc
35  4.3 1   1   1   1
23  3   2   1   1   1
12  2.5 3   1   1   2
10  1   4   1   1   2
35  12  5   1   1   2
35  12  6   1   1   3
end

sort subc sku 
egen subc_id = group(subc), label 

bysort subc : gen freq = _N
su freq, meanonly 
local jmax = r(max) 
drop freq

forval j = 1/`jmax' { 
    gen eta`j' = . 
    gen which`j' = . 
} 

gen long id = _n 
su subc_id, meanonly  

quietly forval i = 1/`r(max)' { 
   su id if subc_id == `i', meanonly
   local jmin = r(min) 
   local jmax = r(max) 

   local k = 1 
   forval j = `jmin'/`jmax' { 
       replace which`k' = sku[`j'] in `jmin'/`jmax' 
       replace eta`k' = price[`j'] in `jmin'/`jmax' 
       local ++k 
   }
}    

   list subc sku *1 *2 *3 , sepby(subc)

     +------------------------------------------------------------+
     | subc   sku   eta1   which1   eta2   which2   eta3   which3 |
     |------------------------------------------------------------|
  1. |    1     1    4.3        1      3        2      .        . |
  2. |    1     2    4.3        1      3        2      .        . |
     |------------------------------------------------------------|
  3. |    2     3    2.5        3      1        4     12        5 |
  4. |    2     4    2.5        3      1        4     12        5 |
  5. |    2     5    2.5        3      1        4     12        5 |
     |------------------------------------------------------------|
  6. |    3     6     12        6      .        .      .        . |
     +------------------------------------------------------------+
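
The same structure can probably be reached without the inner loop over observation numbers. This is an alternative sketch, not the code from the answer: it relies on the fact that under by: an explicit subscript such as price[2] refers to the second observation within the current group, and that a subscript beyond the group size returns missing.

```stata
clear
input units price   sku week    store   subc
35  4.3 1   1   1   1
23  3   2   1   1   1
12  2.5 3   1   1   2
10  1   4   1   1   2
35  12  5   1   1   2
35  12  6   1   1   3
end

* largest number of skus in any subc
bysort subc (sku) : gen freq = _N
su freq, meanonly
local jmax = r(max)
drop freq

* under by:, price[`j'] is the j-th observation within the current subc;
* subscripts beyond the group size yield missing
forval j = 1/`jmax' {
    by subc : gen eta`j' = price[`j']
    by subc : gen which`j' = sku[`j']
}

list subc sku *1 *2 *3 , sepby(subc)
```

The data must remain sorted by subc and sku between the bysort and the loop, which is the case here because gen and drop do not change the sort order.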

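Answer 1: (score: 1)

I am adding another answer that addresses combinations of subc and week. The previous discussion established that what you are trying to do would add an extra variable for every observation. That cannot be a good idea! At best, you would just have a lot of new variables, mostly zeros. At worst, you would run into Stata's limits.

Hence I will not support your efforts to go further down the same road, but will instead show how the second data structure discussed in my previous answer can be produced. Indeed, you have not indicated (a) why you want all these variables, which are just a redistribution of existing data; (b) what your strategy is for dealing with them; or (c) why rangestat (SSC) or some other program could not remove the need to create them in the first place.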

clear
input units price   sku week    store   subc
35  4.3 1   1   1   1
23  3   2   1   1   1
12  2.5 3   1   1   2
10  1   4   1   1   2
35  12  5   1   1   2
35  12  6   1   1   3
35  5.3 1   2   1   1
23  4   2   2   1   1
12  3.5 3   2   1   2
10  2   4   2   1   2
35  13  5   2   1   2
35  13  6   2   1   3
end

sort subc week sku 
egen joint = group(subc week), label 

bysort joint : gen freq = _N
su freq, meanonly 
local jmax = r(max) 
drop freq

forval j = 1/`jmax' { 
    gen eta`j' = . 
    gen which`j' = . 
} 

gen long id = _n 
su joint, meanonly  

quietly forval i = 1/`r(max)' { 
   su id if joint == `i', meanonly
   local jmin = r(min) 
   local jmax = r(max) 

   local k = 1 
   forval j = `jmin'/`jmax' { 
       replace which`k' = sku[`j'] in `jmin'/`jmax' 
       replace eta`k' = price[`j'] in `jmin'/`jmax' 
       local ++k 
   }
}    

list subc week sku *1 *2 *3 , sepby(subc week)

     +-------------------------------------------------------------------+
     | subc   week   sku   eta1   which1   eta2   which2   eta3   which3 |
     |-------------------------------------------------------------------|
  1. |    1      1     1    4.3        1      3        2      .        . |
  2. |    1      1     2    4.3        1      3        2      .        . |
     |-------------------------------------------------------------------|
  3. |    1      2     1    5.3        1      4        2      .        . |
  4. |    1      2     2    5.3        1      4        2      .        . |
     |-------------------------------------------------------------------|
  5. |    2      1     3    2.5        3      1        4     12        5 |
  6. |    2      1     4    2.5        3      1        4     12        5 |
  7. |    2      1     5    2.5        3      1        4     12        5 |
     |-------------------------------------------------------------------|
  8. |    2      2     3    3.5        3      2        4     13        5 |
  9. |    2      2     4    3.5        3      2        4     13        5 |
 10. |    2      2     5    3.5        3      2        4     13        5 |
     |-------------------------------------------------------------------|
 11. |    3      1     6     12        6      .        .      .        . |
     |-------------------------------------------------------------------|
 12. |    3      2     6     13        6      .        .      .        . |
     +-------------------------------------------------------------------+

Answer 2: (score: 0)

clear
input units price   sku week    store   subc
35  4.3 1   1   1   1
23  3   2   1   1   1
12  2.5 3   1   1   2
10  1   4   1   1   2
35  12  5   1   1   2
35  12  6   1   1   3
35  5.3 1   2   1   1
23  4   2   2   1   1
12  3.5 3   2   1   2
10  2   4   2   1   2
35  13  5   2   1   2
35  13  6   2   1   3
end

egen joint = group(subc sku), label 

bysort store week : gen freq = _N
su freq, meanonly 
local jmax = r(max) 
drop freq

tostring subc sku, replace
gen new = subc + "_"+sku 


su joint, meanonly 
forval j = 1/`r(max)' {
    local J = new[`j']
    gen eta`J' = .
}

sort  subc week store sku 
egen joint1 = group(subc week store), label 

gen long id = _n 
su joint1, meanonly  

quietly forval i = 1/`r(max)' { 
   su id if joint1 == `i', meanonly
   local jmin = r(min) 
   local jmax = r(max) 

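   forval j = `jmin'/`jmax' {
       local subc = subc[`j']
       local sku = sku[`j']
       * copy the price in observation j to every observation in its group
       replace eta`subc'_`sku' = price[`j'] in `jmin'/`jmax'
       * then set the product's own observation to 0
       replace eta`subc'_`sku' = 0 in `j'/`j'
   }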
}    

 list subc sku store week eta*, sepby(subc)


     +---------------------------------------------------------------------------------+
     | store   week   subc   sku   eta1_1   eta1_2   eta2_3   eta2_4   eta2_5   eta3_6 |
     |---------------------------------------------------------------------------------|
  1. |     1      1      1     2      4.3        0        .        .        .        . |
  2. |     1      1      1     1        0        3        .        .        .        . |
     |---------------------------------------------------------------------------------|
  3. |     1      1      2     4        .        .      2.5        0       12        . |
  4. |     1      1      2     3        .        .        0        1       12        . |
  5. |     1      1      2     5        .        .      2.5        1        0        . |
     |---------------------------------------------------------------------------------|
  6. |     1      1      3     6        .        .        .        .        .        0 |
     |---------------------------------------------------------------------------------|
  7. |     1      2      1     2      5.3        0        .        .        .        . |
  8. |     1      2      1     1        0        4        .        .        .        . |
     |---------------------------------------------------------------------------------|
  9. |     1      2      2     3        .        .        0        2       13        . |
 10. |     1      2      2     5        .        .      3.5        2        0        . |
 11. |     1      2      2     4        .        .      3.5        0       13        . |
     |---------------------------------------------------------------------------------|
 12. |     1      2      3     6        .        .        .        .        .        0 |
     +---------------------------------------------------------------------------------+