Suppose I have a variable list_a that contains every sport that could possibly be played in the world:
football
tennis
hockey
cricket
croquet
racquetball
cricket
pingpong
squash
rugby
swimming
swimming
soccer
Suppose I also have another variable, list_b, which contains only three sports:
cricket
hockey
swimming
I want to create a new variable Cont that equals 1 when the sport in list_a is found in list_b, and 0 when the sport is not in list_b.
The variable Cont would look like this:
0
0
1
1
0
0
1
0
0
0
1
1
0
Would the following work?
gen Cont = 0
replace Cont = 1 if (strmatch( list_a, ( list_b)))
Edit:
Suppose list_a also contains hoccckey (a typo), but I would still like it to be counted as a match. Is there a way to do that?
Answer 0 (score: 2)
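The answer is no, because your approach compares the values of the two variables within each observation only. Instead, you need to compare the value of list_a in each row with all the values of the variable list_b.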
Using a toy example:
clear
input strL(list_a list_b)
football cricket
tennis hockey
hockey swimming
cricket
croquet
racquetball
cricket
pingpong
squash
rugby
swimming
swimming
soccer
end
The following illustrates the idea:
local obs = _N
generate Cont = 0
forvalues i = 1 / `obs' {
    forvalues j = 1 / `obs' {
        replace Cont = 1 if list_a[`i'] == list_b[`j'] in `i'
    }
}
list
+-------------------------------+
| list_a list_b Cont |
|-------------------------------|
1. | football cricket 0 |
2. | tennis hockey 0 |
3. | hockey swimming 1 |
4. | cricket 1 |
5. | croquet 0 |
|-------------------------------|
6. | racquetball 0 |
7. | cricket 1 |
8. | pingpong 0 |
9. | squash 0 |
10. | rugby 0 |
|-------------------------------|
11. | swimming 1 |
12. | swimming 1 |
13. | soccer 0 |
+-------------------------------+
Edit:
If you additionally want to count certain misspellings, you can combine my solution with the one by @NickCox. In the loop above, use this instead:
replace Cont = 1 if inlist(list_a, "hoccckey") | list_a[`i'] == list_b[`j'] in `i'
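If the misspellings follow a pattern, a regular expression may be easier to maintain than listing each variant in inlist(). The line below is only a sketch: the pattern is an assumption about which typos occur (hockey written with one or more c's), not a general fuzzy match. It can be run once, outside the loop.

```stata
* Sketch: flag "hockey" spelled with one or more c's, e.g. hoccckey.
* The pattern ^hoc+key$ is an illustrative assumption, not a fuzzy matcher.
replace Cont = 1 if regexm(list_a, "^hoc+key$")
```

Any other typo, such as a dropped or swapped letter, would still need its own pattern or an explicit entry in inlist().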
Answer 1 (score: 1)
There is a simple technique that works well for your toy example:
clear
input strL list_a
football
tennis
hockey
cricket
croquet
racquetball
cricket
pingpong
squash
rugby
swimming
swimming
soccer
end
gen wanted = inlist(list_a, "cricket", "hockey", "swimming")
list, sepby(wanted)
+----------------------+
| list_a wanted |
|----------------------|
1. | football 0 |
2. | tennis 0 |
|----------------------|
3. | hockey 1 |
4. | cricket 1 |
|----------------------|
5. | croquet 0 |
6. | racquetball 0 |
|----------------------|
7. | cricket 1 |
|----------------------|
8. | pingpong 0 |
9. | squash 0 |
10. | rugby 0 |
|----------------------|
11. | swimming 1 |
12. | swimming 1 |
|----------------------|
13. | soccer 0 |
+----------------------+
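If you have many more values, you could use levelsof (if they are in a second variable) to loop over the distinct values sought, or put the candidates in a separate dataset and proceed as described in this FAQ.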
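A minimal sketch of the levelsof route follows. It assumes list_a and list_b sit in the same dataset, as in the toy example of the other answer, and that both are ordinary str# variables. The names in_b and targets are hypothetical, chosen only for illustration.

```stata
* Sketch: assumes list_a and list_b are str# variables in one dataset.
* in_b and targets are hypothetical names.
generate byte in_b = 0

* Collect the distinct nonmissing values of list_b in a local macro
levelsof list_b, local(targets)

* Flag every observation whose list_a equals one of those values
foreach t of local targets {
    replace in_b = 1 if list_a == `"`t'"'
}
```

This issues one replace per distinct value of list_b, rather than one per pair of observations.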
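All of these techniques depend on exact equality of strings, so watch out for differences between upper and lower case, leading and trailing spaces, and inconsistent spelling.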
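One way to guard against case and spacing differences is to compare on cleaned copies of the strings. This is a sketch under stated assumptions: clean_a and wanted_clean are hypothetical names, and strlower() and strtrim() are the function names in recent Stata releases (older releases call them lower() and trim()).

```stata
* Sketch: clean_a and wanted_clean are hypothetical variable names.
* Lowercase and strip leading/trailing blanks before comparing.
generate clean_a = strlower(strtrim(list_a))

* Exact matching is then applied to the cleaned copy
generate byte wanted_clean = inlist(clean_a, "cricket", "hockey", "swimming")
```

Spelling errors are not repaired by this step and still need separate handling.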