I have a dataset from which I need to remove duplicate combinations.
Each combination is a pair of places, one place per column:
ID  Place1       Place2
1   Ann Arbor    Toledo
2   LA           San Francisco
3   Chicago      Peoria
4   Pittsburgh   Cleveland
5   Richmond     New Port
6   Ann Arbor    Cincinnati
7   LA           San Francisco
8   LA           San Jose
9   Springfield  Chicago
10  Richmond     New Port
11  Atlanta      Greenville
How can I get the output below?
ID  Place1       Place2
1   Ann Arbor    Toledo
2   LA           San Francisco
3   Chicago      Peoria
4   Pittsburgh   Cleveland
5   Richmond     New Port
6   Ann Arbor    Cincinnati
7   LA           San Jose
8   Springfield  Chicago
9   Atlanta      Greenville
Answer 0 (score: 1)
The following works for me:
clear
input ID str20 Place1 str20 Place2
1 "Ann Arbor" "Toledo"
2 "LA" "San Francisco"
3 "Chicago" "Peoria"
4 "Pittsburgh" "Cleveland"
5 "Richmond" "New Port"
6 "Ann Arbor" "Cincinnati"
7 "LA" "San Francisco"
8 "LA" "San Jose"
9 "Springfield" "Chicago"
10 "Richmond" "New Port"
11 "Atlanta" "Greenville"
end
duplicates drop Place1 Place2, force
list, separator(0)
     +----------------------------------+
     | ID        Place1          Place2 |
     |----------------------------------|
  1. |  1     Ann Arbor          Toledo |
  2. |  2            LA   San Francisco |
  3. |  3       Chicago          Peoria |
  4. |  4    Pittsburgh       Cleveland |
  5. |  5      Richmond        New Port |
  6. |  6     Ann Arbor      Cincinnati |
  7. |  8            LA        San Jose |
  8. |  9   Springfield         Chicago |
  9. | 11       Atlanta      Greenville |
     +----------------------------------+
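Note that the surviving rows keep their original ID values (8, 9 and 11), whereas the desired output in the question numbers the rows 1 to 9. If consecutive IDs are wanted, one possible follow-up step is sketched below. It assumes the original ID values do not need to be preserved:

```stata
* Assumption: the original IDs carry no meaning and can be overwritten
* _n is the current observation number, so this renumbers the rows 1, 2, 3, ...
replace ID = _n
list, separator(0)
```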
Type help duplicates at Stata's command prompt for details and the full syntax.
It is important to note that this approach will not work if your data contain the same pair in reversed order, for example:
LA San Francisco
San Francisco LA
See this article by @NickCox for how to handle that situation.
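As a rough illustration (not necessarily the method described in that article), one way to treat reversed pairs as duplicates is to put each pair into alphabetical order before dropping. This is a minimal sketch; first and second are hypothetical helper variable names, and the string comparison is case-sensitive:

```stata
* Hypothetical helper variables holding each pair in alphabetical order,
* so that "LA"/"San Francisco" and "San Francisco"/"LA" become identical
generate first  = cond(Place1 <= Place2, Place1, Place2)
generate second = cond(Place1 <= Place2, Place2, Place1)

* Drop rows whose unordered pair has already appeared
duplicates drop first second, force

* Remove the helpers; Place1 and Place2 keep their original order
drop first second
```

Because the drop is done on the helper variables, the first occurrence of each pair is kept with its places in the order originally entered.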