How can I drop observations, and the next date after them, based on dates in another table?

Time: 2018-04-09 18:25:50

Tags: loops for-loop foreach stata

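I have Table 1 and Table 2, both of which contain the variables name and date.

I want to drop the observations in Table 1 that have the same name and date in Table 2. In addition, for each name and date shared between Table 1 and Table 2, I also want to drop the next date in Table 1.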

Table 1:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 name long date
"A" 17659
"A" 17724
"A" 17900
"A" 17901
"A" 18086
"A" 18102
"A" 18239
"B" 17659
"B" 17662
"B" 17669
"B" 17676
"B" 17684
"B" 17701
"B" 18026
"C" 18177
"C" 18187
"C" 18195
"C" 18219
"C" 18235
"C" 18250
"C" 18391
"C" 18391
"C" 18392
end
format %d date

Table 2:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 name long date
"A" 17724
"A" 17900
"A" 18102
"B" 17659
"B" 17669
"B" 17701
"B" 18087
"C" 18187
"C" 18235
"C" 18250
end
format %d date

The expected result is as follows:

+------+-----------+
| name |   date    |
+------+-----------+
| A    | 7-May-08  |
| A    | 8-Jul-09  |
| B    | 1-Jun-08  |
| C    | 7-Oct-09  |
| C    | 18-Nov-09 |
| C    | 10-May-10 |
+------+-----------+

How can I do this?

2 answers:

Answer 0: (score: 0)

I don't think I understand this, because I can't reproduce your result. However, the technique here may help.

clear

input str4 name long date
"A" 17659
"A" 17724
"A" 17900
"A" 17901
"A" 18086
"A" 18102
"A" 18239
"B" 17659
"B" 17662
"B" 17669
"B" 17676
"B" 17684
"B" 17701
"B" 18026
"C" 18177
"C" 18187
"C" 18195
"C" 18219
"C" 18235
"C" 18250
"C" 18391
"C" 18391
"C" 18392
end

format %d date
gen table = 1 
save table1 , replace
clear

input str4 name long date
"A" 17724
"A" 17900
"A" 18102
"B" 17659
"B" 17669
"B" 17701
"B" 18087
"C" 18187
"C" 18235
"C" 18250
end

format %d date
gen table = 2 
append using table1 

bysort name date (table) : gen todrop = table == 1 & table[1] != table[_N] 
bysort table name date : replace todrop = 1 if todrop[_n-1] == 1 
by table name date : replace todrop = 1 if todrop[_n-1] == 1 & date == date[_n-1] 

drop if todrop 
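
For comparison, here is a minimal sketch of a merge-based variant. It is not the code from this answer, and it rests on two assumptions: Table 2 has no duplicate name and date pairs, and "the next date" means the next distinct date for the same name in Table 1. To keep it short it uses only the rows for name "C" from the question; the variables matched and aftermatch are illustrative names.

```stata
* Table 2: rows for name "C" from the question, saved to a temporary file
clear
input str4 name long date
"C" 18187
"C" 18235
"C" 18250
end
tempfile table2
save `table2'

* Table 1: rows for name "C" from the question (18391 occurs twice)
clear
input str4 name long date
"C" 18177
"C" 18187
"C" 18195
"C" 18219
"C" 18235
"C" 18250
"C" 18391
"C" 18391
"C" 18392
end
format %d date

* Flag Table 1 rows whose name and date also occur in Table 2
* (m:1 assumes name and date uniquely identify rows in Table 2)
merge m:1 name date using `table2', keep(master match)
generate byte matched = _merge == 3
drop _merge

* On the first row of each distinct date, record whether the previous
* date for the same name was matched; other rows are left missing
bysort name (date) : generate byte aftermatch = matched[_n-1] == 1 if date != date[_n-1]

* Copy that flag to duplicate rows of the same date
* (missing values sort last, so aftermatch[1] is the nonmissing value)
bysort name date (aftermatch) : replace aftermatch = aftermatch[1]

drop if matched | aftermatch
list name date, noobs
```

On these rows the sketch should leave 18177, 18219 and 18392 (07oct2009, 18nov2009 and 10may2010), which are the "C" rows of the expected result. Because the flag is copied across duplicate rows of the same date, both copies of 18391 are dropped.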

Answer 1: (score: 0)

As long as there are no duplicate entries, the code below gives you the desired output:

clear

input str4 name1 long date1
"A" 17659
"A" 17724
"A" 17900
"A" 17901
"A" 18086
"A" 18102
"A" 18239
"B" 17659
"B" 17662
"B" 17669
"B" 17676
"B" 17684
"B" 17701
"B" 18026
"C" 18177
"C" 18187
"C" 18195
"C" 18219
"C" 18235
"C" 18250
"C" 18391
"C" 18391
"C" 18392
end
input str4 name2 long date2
"A" 17724
"A" 17900
"A" 18102
"B" 17659
"B" 17669
"B" 17701
"B" 18087
"C" 18187
"C" 18235
"C" 18250
end
format %d date1
format %d date2

local obs = _N

generate todrop1 = 0

forvalues i = 1 / `obs' {
    forvalues j = 1 / `obs' {
        replace todrop1 = 1 in `i' if name1[`i'] == name2[`j'] & ///
                                      date1[`i'] == date2[`j']
    }
}

generate todrop2 = 0

forvalues i = 1 / `obs' {
    if todrop1[`i'] == 1 {
        replace todrop2 = 1 in `=`i'+1'
    }   
}

list name1 date1 if todrop1 == 0 & todrop2 == 0

In this particular case, C 09may2010 appears in the output because it occurs twice in name1:

     +-------------------+
     | name1       date1 |
     |-------------------|
  1. |     A   07may2008 |
  5. |     A   08jul2009 |
 12. |     B   01jun2008 |
 15. |     C   07oct2009 |
 18. |     C   18nov2009 |
     |-------------------|
 22. |     C   09may2010 |
 23. |     C   10may2010 |
     +-------------------+

Indeed, after removing the duplicate entry "C" 18391 from name1 and rerunning the code, we obtain:

    +-------------------+
    | name1       date1 |
    |-------------------|
 1. |     A   07may2008 |
 5. |     A   08jul2009 |
12. |     B   01jun2008 |
15. |     C   07oct2009 |
18. |     C   18nov2009 |
    |-------------------|
22. |     C   10may2010 |
    +-------------------+

If your data contain duplicate entries, you can remove them first with the duplicates command, assuming that is what you want in your use case.
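
As a minimal sketch of that step, assuming it is run on Table 1 alone before the tables are combined, and that keeping one copy of each name and date pair is acceptable (the three rows are taken from the question's data):

```stata
clear
input str4 name long date
"C" 18391
"C" 18391
"C" 18392
end
format %d date

* Report how many surplus copies exist for each name and date pair
duplicates report name date

* Keep one copy of each pair; force is required when a varlist is given
duplicates drop name date, force

list name date, noobs
```

This leaves one row per name and date pair, so the repeated "C" 18391 row is reduced to a single observation.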