I am working with an edge list in Stata, of the form:
var1 var2
a 1
a 2
a 3
b 1
b 2
1 a
2 b
I want to drop pairs that are not unique, such as 1 a and 2 b (for me these are the same as a 1 and b 2). How can I do this?
Answer:
. clear
. input str1 (var1 var2)
var1 var2
1. a 1
2. a 2
3. a 3
4. b 1
5. b 2
6. 1 a
7. 2 b
8. end
. gen first = cond(var1 <= var2, var1, var2)
. gen second = cond(var1 <= var2, var2, var1)
. list
     +------------------------------+
     | var1   var2   first   second |
     |------------------------------|
  1. |    a      1       1        a |
  2. |    a      2       2        a |
  3. |    a      3       3        a |
  4. |    b      1       1        b |
  5. |    b      2       2        b |
     |------------------------------|
  6. |    1      a       1        a |
  7. |    2      b       2        b |
     +------------------------------+
. duplicates list first second
Duplicates in terms of first second
  +--------------------------------+
  | group:   obs:   first   second |
  |--------------------------------|
  |      1      1       1        a |
  |      1      6       1        a |
  |      2      5       2        b |
  |      2      7       2        b |
  +--------------------------------+
. duplicates drop first second, force
Duplicates in terms of first second
(2 observations deleted)
. list
     +------------------------------+
     | var1   var2   first   second |
     |------------------------------|
  1. |    a      1       1        a |
  2. |    a      2       2        a |
  3. |    a      3       3        a |
  4. |    b      1       1        b |
  5. |    b      2       2        b |
     +------------------------------+
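The easy part of the answer is to use duplicates drop. The real question is how to get the data into a form where 1 a and a 1 are treated as duplicates. This is documented here. We can sort the values within each observation so that (in this case) both pairs become 1 a. String comparison puts digits before letters, so "1" <= "a" is true, and cond() therefore puts "1" in first and "a" in second whichever way round the pair was entered. The linked paper says more, but that is the main idea, and cond() does the work. Note that duplicates drop keeps the first occurrence of each group, and the force option is required because a variable list is specified.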
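Outside an interactive session, the same steps can be collected in a do-file. This is a minimal sketch assuming two str1 variables named var1 and var2, as in the question; the helper names first and second are only illustrative and are dropped again once they have done their job.

```stata
* Sketch: drop unordered duplicate pairs from an edge list.
* Assumes two string variables var1 and var2, as in the question;
* first and second are illustrative helper variables.
clear
input str1 (var1 var2)
a 1
a 2
a 3
b 1
b 2
1 a
2 b
end

* Canonical order within each observation: smaller string first
generate first  = cond(var1 <= var2, var1, var2)
generate second = cond(var1 <= var2, var2, var1)

* Keep only the first occurrence of each unordered pair
* (force is required because a varlist is given)
duplicates drop first second, force

* The helper variables are no longer needed
drop first second
list
```

If the node identifiers are numeric rather than string, min(var1, var2) and max(var1, var2) give the same canonical order as the two cond() calls.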