Create a new variable if the values of var1 exist in var2

Asked: 2018-10-10 03:22:50

Tags: stata

Suppose I have a variable list_a that contains all the sports that could possibly be played in the world:

football 
tennis 
hockey
cricket
croquet
racquetball
cricket
pingpong
squash
rugby
swimming
swimming
soccer 

Suppose I also have another variable, list_b, that contains only three sports:

cricket
hockey
swimming

I want to create a new variable Cont that equals 1 when the sport in list_a is found anywhere in list_b, and 0 when it is not.

The variable Cont would look like this:

0
0
1
1
0
0
1
0
0
0
1 
1
0

Would the following work?

gen Cont = 0
replace Cont = 1 if  (strmatch( list_a, ( list_b)))

Edit:

Suppose list_a also contains hoccckey (a typo), but I would still like it to be counted.

Is there a way to do this?

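2 Answers:

Answer 0 (score: 2)

The answer is no, because your approach compares the values of the two variables within each observation only. Instead, you need to compare the value in each row of list_a with all the values of the variable list_b.

Using your toy example: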

clear

input strL(list_a list_b)
football cricket
tennis hockey 
hockey swimming 
cricket 
croquet 
racquetball 
cricket
pingpong
squash
rugby
swimming 
swimming
soccer
end

The following illustrates the idea:

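* remember the number of observations
local obs = _N
* start with every row flagged as "not found"
generate Cont = 0

* compare row i of list_a with every row j of list_b
forvalues i = 1 / `obs' {
    forvalues j = 1 / `obs' {
        replace Cont = 1 if list_a[`i'] == list_b[`j'] in `i'
    }
}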

list
     +-------------------------------+
     |      list_a     list_b   Cont |
     |-------------------------------|
  1. |    football    cricket      0 |
  2. |      tennis     hockey      0 |
  3. |      hockey   swimming      1 |
  4. |     cricket                 1 |
  5. |     croquet                 0 |
     |-------------------------------|
  6. | racquetball                 0 |
  7. |     cricket                 1 |
  8. |    pingpong                 0 |
  9. |      squash                 0 |
 10. |       rugby                 0 |
     |-------------------------------|
 11. |    swimming                 1 |
 12. |    swimming                 1 |
 13. |      soccer                 0 |
     +-------------------------------+
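To see why the row-wise test from the question cannot work, it may help to run it on a cut-down copy of the toy data. This is only a sketch: the variable name rowwise is made up for illustration, and the data are entered as str12 rather than strL for brevity.

```stata
clear
input str12 list_a str12 list_b
"football" "cricket"
"tennis"   "hockey"
"hockey"   "swimming"
"cricket"  ""
"swimming" ""
end

* strmatch() is evaluated within each observation: list_a in row i is
* tested only against the pattern held in list_b in the same row i
generate byte rowwise = strmatch(list_a, list_b)

list
```

Here hockey sits in row 3 of list_a but in row 2 of list_b, so no row should be flagged, even though three of the five sports in list_a do occur somewhere in list_b.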

Edit:

If you additionally want to count certain misspellings, you can combine my solution with @NickCox's. In the loop above, use instead:

replace Cont = 1 if inlist(list_a, "hoccckey") | list_a[`i'] == list_b[`j'] in `i'

Answer 1 (score: 1)

There is a simple technique that works well for your toy example:

clear 
input strL list_a 
football 
tennis 
hockey
cricket
croquet
racquetball
cricket
pingpong
squash
rugby
swimming
swimming
soccer 
end 

gen wanted = inlist(list_a, "cricket", "hockey", "swimming") 

list, sepby(wanted)

     +----------------------+
     |      list_a   wanted |
     |----------------------|
  1. |    football        0 |
  2. |      tennis        0 |
     |----------------------|
  3. |      hockey        1 |
  4. |     cricket        1 |
     |----------------------|
  5. |     croquet        0 |
  6. | racquetball        0 |
     |----------------------|
  7. |     cricket        1 |
     |----------------------|
  8. |    pingpong        0 |
  9. |      squash        0 |
 10. |       rugby        0 |
     |----------------------|
 11. |    swimming        1 |
 12. |    swimming        1 |
     |----------------------|
 13. |      soccer        0 |
     +----------------------+

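If you have many more values, you could use levelsof (if they are held in a second variable) to loop over the distinct values sought, or put the candidates in a separate dataset and follow this FAQ.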
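The levelsof route might look like the sketch below. It assumes the sports being looked for are held in list_b in the same dataset, as in the first answer; the names sports and wanted2 are illustrative, and the data are a shortened copy of the toy example stored as str12.

```stata
clear
input str12 list_a str12 list_b
"football" "cricket"
"tennis"   "hockey"
"hockey"   "swimming"
"cricket"  ""
"croquet"  ""
"swimming" ""
end

* collect the distinct non-empty values of list_b in a local macro
levelsof list_b, local(sports)

generate byte wanted2 = 0

* one replace per distinct value, each checked against every row of list_a
foreach s of local sports {
    replace wanted2 = 1 if list_a == `"`s'"'
}

list
```

This makes one pass over the data per distinct value rather than one per pair of rows. Because the values are stored in a local macro, it is probably best suited to a modest number of distinct values.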

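All of these techniques depend on exact equality of strings, so watch out for differences between upper and lower case, leading and trailing spaces, and inconsistent spelling.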
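One way to guard against the case and spacing problems is to compare cleaned copies of the strings. A minimal sketch, assuming a recent Stata in which strtrim() and strlower() are available (older releases spell these trim() and lower()); the sample values and the names a_clean and wanted3 are invented for illustration. Misspellings such as hoccckey are not caught this way and would still need to be listed explicitly.

```stata
clear
input str12 list_a
"Hockey"
" cricket "
"SWIMMING"
"tennis"
end

* cleaned copy: surrounding blanks removed, everything in lower case
generate a_clean = strlower(strtrim(list_a))

* compare the cleaned copy, not the raw variable
generate byte wanted3 = inlist(a_clean, "cricket", "hockey", "swimming")

list
```

Keeping the cleaned values in a separate variable leaves the original list_a untouched, so the raw entries can still be inspected afterwards.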