Removing duplicate (non-unique) pairs of values

Time: 2016-05-22 07:15:28

Tags: stata

I am working with an edge list in Stata, of the form:

var1 var2
 a    1
 a    2
 a    3
 b    1
 b    2
 1    a
 2    b

I want to remove non-unique pairs such as 1 a and 2 b (which, for my purposes, are the same as a 1 and b 2). How can I do this?

1 answer:

Answer 0 (score: 0)

. clear 

. input str1 (var1 var2) 

          var1       var2
  1.  a    1
  2.  a    2
  3.  a    3
  4.  b    1
  5.  b    2
  6.  1    a
  7.  2    b
  8. end 

. gen first = cond(var1 <= var2, var1, var2) 

. gen second = cond(var1 <= var2, var2, var1) 

. list 

     +------------------------------+
     | var1   var2   first   second |
     |------------------------------|
  1. |    a      1       1        a |
  2. |    a      2       2        a |
  3. |    a      3       3        a |
  4. |    b      1       1        b |
  5. |    b      2       2        b |
     |------------------------------|
  6. |    1      a       1        a |
  7. |    2      b       2        b |
     +------------------------------+

. duplicates list first second 

Duplicates in terms of first second

  +--------------------------------+
  | group:   obs:   first   second |
  |--------------------------------|
  |      1      1       1        a |
  |      1      6       1        a |
  |      2      5       2        b |
  |      2      7       2        b |
  +--------------------------------+

. duplicates drop first second, force  

Duplicates in terms of first second

(2 observations deleted)

. list 

     +------------------------------+
     | var1   var2   first   second |
     |------------------------------|
  1. |    a      1       1        a |
  2. |    a      2       2        a |
  3. |    a      3       3        a |
  4. |    b      1       1        b |
  5. |    b      2       2        b |
     +------------------------------+

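The easy part of the answer is to use duplicates drop. But how do you get the data into a form in which 1 a and a 1 are treated as duplicates? This is documented here. We can sort the values within each observation so that (in this case) both pairs are stored as 1 a. Once first always holds the smaller value and second the larger, the order in which a pair was originally entered no longer matters, and duplicates drop first second, force keeps one observation per pair (the force option is required whenever a variable list is given). The linked paper says more, but that is the main idea, and cond() helps.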
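As a rough sketch of the same idea in do-file form, the following variant orders each pair with cond() and then keeps one observation per unordered pair using bysort instead of duplicates drop. The variable obsno is a hypothetical helper, not part of the original answer; it is there only so that the copy kept from each pair is the earliest one and so that the original order can be restored afterwards.

```stata
clear
input str1 (var1 var2)
a 1
a 2
a 3
b 1
b 2
1 a
2 b
end

* Hypothetical helper: remember the original observation order
gen long obsno = _n

* Order the two values within each observation (string comparison)
gen first  = cond(var1 <= var2, var1, var2)
gen second = cond(var1 <= var2, var2, var1)

* Within each unordered pair, keep only the earliest observation
bysort first second (obsno): keep if _n == 1

* Restore the original order and remove the helper variables
sort obsno
drop first second obsno
list
```

With the example data this should leave the same five observations as in the log above. Sorting on obsno inside each group is a design choice: without it, which copy of a pair survives would depend on an arbitrary sort order within the group.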