Stata: merging datasets based on a range of dates

Date: 2016-03-30 20:35:49

Tags: merge stata

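I realize there is a Stata forum post with this exact title, but I did not find its syntax all that helpful, especially since my datasets are a bit different. I have two datasets. One records people's stays in facilities, including the facility name. It looks like this: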


+---+-------------+---------------+-----------------------+
|ID#|Entrance Date|  Exit Date    | Facility Name         |
|1  | 7/22/2009   | 2/24/2010     | Facility 1            |
|1  | 7/10/2010   | 11/21/2010    | Facility 2            |
|2  | 3/31/2010   | 9/23/2010     | Facility 1            |
|3  | 11/24/2010  | 7/5/2011      | Facility 3            |
|4  | 3/7/2007    | 4/19/2010     | Facility 2            |
+---+-------------+---------------+-----------------------+

The next dataset shows visit dates. It contains the ID and the visit date:

+---+-------------+
|ID#| Visit Date  |
| 1 | 08/21/2009  |
| 1 | 09/02/2009  |
| 1 | 09/23/2009  |
| 3 | 04/22/2011  |
| 3 | 05/05/2011  |
+---+-------------+

I want to merge these two files on ID#, where the Visit Date falls between the Entrance Date and the Exit Date, so that I can see (1) who had visitors and (2) which facilities they were in.

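3 Answers:

Answer 0 (score: 4)

There is a new user-written program on SSC called rangejoin that is tailor-made for this kind of problem. To install it, type in Stata's Command window: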

ssc install rangejoin

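rangejoin pairs each stay with the visits whose visit date falls between the date in and the date out (the bounds of the desired interval). All dates must be numeric, so in the example below I pre-converted all dates to Stata dates.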

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10 visit int nvisit
1 "08/21/2009" 18130
1 "09/02/2009" 18142
1 "09/23/2009" 18163
3 "04/22/2011" 18739
3 "05/05/2011" 18752
end
format %td nvisit
save "visits.dta", replace

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10(Entrance Exit Name) int(datein dateout)
1 "7/22/2009"  "2/24/2010"  "Facility 1" 18100 18317
1 "7/10/2010"  "11/21/2010" "Facility 2" 18453 18587
2 "3/31/2010"  "9/23/2010"  "Facility 1" 18352 18528
3 "11/24/2010" "7/5/2011"   "Facility 3" 18590 18813
4 "3/7/2007"   "4/19/2010"  "Facility 2" 17232 18371
end
format %td datein
format %td dateout

rangejoin nvisit datein dateout using "visits.dta", by(id)
bysort id datein: egen visit_count = total(!mi(nvisit))
list, sepby(id)

     +-------------------------------------------------------------------------------------------------------+
     | id     Entrance         Exit         Name      datein     dateout        visit      nvisit   visit_~t |
     |-------------------------------------------------------------------------------------------------------|
  1. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010   08/21/2009   21aug2009          3 |
  2. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010   09/02/2009   02sep2009          3 |
  3. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010   09/23/2009   23sep2009          3 |
  4. |  1    7/10/2010   11/21/2010   Facility 2   10jul2010   21nov2010                        .          0 |
     |-------------------------------------------------------------------------------------------------------|
  5. |  2    3/31/2010    9/23/2010   Facility 1   31mar2010   23sep2010                        .          0 |
     |-------------------------------------------------------------------------------------------------------|
  6. |  3   11/24/2010     7/5/2011   Facility 3   24nov2010   05jul2011   04/22/2011   22apr2011          2 |
  7. |  3   11/24/2010     7/5/2011   Facility 3   24nov2010   05jul2011   05/05/2011   05may2011          2 |
     |-------------------------------------------------------------------------------------------------------|
  8. |  4     3/7/2007    4/19/2010   Facility 2   07mar2007   19apr2010                        .          0 |
     +-------------------------------------------------------------------------------------------------------+

If needed, you can get back to the original observations with:

by id datein: keep if _n == 1
keep id Entrance Exit Name datein dateout visit_count
list
     +------------------------------------------------------------------------------+
     | id     Entrance         Exit         Name      datein     dateout   visit_~t |
     |------------------------------------------------------------------------------|
  1. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010          3 |
  2. |  1    7/10/2010   11/21/2010   Facility 2   10jul2010   21nov2010          0 |
  3. |  2    3/31/2010    9/23/2010   Facility 1   31mar2010   23sep2010          0 |
  4. |  3   11/24/2010     7/5/2011   Facility 3   24nov2010   05jul2011          2 |
  5. |  4     3/7/2007    4/19/2010   Facility 2   07mar2007   19apr2010          0 |
     +------------------------------------------------------------------------------+
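
The example above enters the numeric dates (nvisit, datein, dateout) by hand. If your data hold the dates only as strings, a conversion along the following lines may work. This is a minimal sketch that assumes month/day/year strings and uses Stata's daily() function, as the other answers here do; the three rows are copied from the visits data above.

```stata
* Sketch: convert month/day/year strings to numeric Stata daily dates
clear
input byte id str10 visit
1 "08/21/2009"
1 "09/02/2009"
3 "04/22/2011"
end

* daily() returns the number of days since 01jan1960
gen int nvisit = daily(visit, "MDY")
format %td nvisit

* Stop here if any string failed to convert
assert !missing(nvisit)
list
```

The resulting nvisit values should match the hand-entered ones above (18130 for 08/21/2009, for instance), so the variable can be passed to rangejoin directly.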

Answer 1 (score: 1)

No kind of merge seems helpful here, because you can only match on the identifier. I would use append:

clear 
input ID str10 (Entrance Exit) Name  
1 "7/22/2009"  "2/24/2010"  1  
1 "7/10/2010"  "11/21/2010" 2  
2 "3/31/2010"  "9/23/2010"  1  
3 "11/24/2010" "7/5/2011"   3  
4 "3/7/2007"   "4/19/2010"  2 
end 
gen DateEntrance = daily(Entrance, "MDY") 
gen DateExit = daily(Exit, "MDY") 
drop Entrance Exit 
sort ID, stable 
by ID : gen T = _n 
reshape long Date, i(ID T) j(Event) string 
drop T 
save Master, replace 
clear 
input ID str10 Visit 
1 "08/21/2009"  
1 "09/02/2009"  
1 "09/23/2009" 
3 "04/22/2011"  
3 "05/05/2011"  
end 
gen Date = daily(Visit, "MDY") 
drop Visit 
gen Event = "Visit" 
append using Master 
sort ID Date 
format Date %td 
list, sepby(ID)  

     +----------------------------------+
     | ID        Date      Event   Name |
     |----------------------------------|
  1. |  1   22jul2009   Entrance      1 |
  2. |  1   21aug2009      Visit      . |
  3. |  1   02sep2009      Visit      . |
  4. |  1   23sep2009      Visit      . |
  5. |  1   24feb2010       Exit      1 |
  6. |  1   10jul2010   Entrance      2 |
  7. |  1   21nov2010       Exit      2 |
     |----------------------------------|
  8. |  2   31mar2010   Entrance      1 |
  9. |  2   23sep2010       Exit      1 |
     |----------------------------------|
 10. |  3   24nov2010   Entrance      3 |
 11. |  3   22apr2011      Visit      . |
 12. |  3   05may2011      Visit      . |
 13. |  3   05jul2011       Exit      3 |
     |----------------------------------|
 14. |  4   07mar2007   Entrance      2 |
 15. |  4   19apr2010       Exit      2 |
     +----------------------------------+

Now see here for how to fill in the missings.
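
One way the fill-in step might look is sketched below. It is not the linked method, only an illustration under stated assumptions: the data are in the long layout built above (ID, Date, Event, Name), stays for one ID do not overlap, and no visit falls on the same date as an Entrance or Exit (ties would make the sort order ambiguous). The helper variable inside and the extra visit on 5/1/2010, which falls between two stays, are hypothetical additions for this example.

```stata
* Sketch: carry the facility code forward to visits that fall inside a stay
clear
input ID str10 DateStr str8 Event Name
1 "7/22/2009"  "Entrance" 1
1 "8/21/2009"  "Visit"    .
1 "9/2/2009"   "Visit"    .
1 "2/24/2010"  "Exit"     1
1 "5/1/2010"   "Visit"    .
1 "7/10/2010"  "Entrance" 2
1 "11/21/2010" "Exit"     2
end
gen Date = daily(DateStr, "MDY")
format Date %td
drop DateStr
sort ID Date

* Running count within ID: 1 from an Entrance up to (not including) its Exit, else 0
by ID: gen inside = sum((Event == "Entrance") - (Event == "Exit"))

* replace works through observations in order, so the code cascades across consecutive visits
by ID: replace Name = Name[_n-1] if Event == "Visit" & inside == 1
list, sepby(ID)
```

Visits that occur outside any stay (the 5/1/2010 row here) keep a missing Name, which makes them easy to spot or drop afterwards.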

Answer 2 (score: 1)

Another approach uses joinby:

/* Set up Visits Data */
clear 
input ID str10 Visit 
1 "08/21/2009"  
1 "09/02/2009"  
1 "09/23/2009" 
3 "04/22/2011"  
3 "05/05/2011"  
end 
gen DateVisit = daily(Visit, "MDY") 
drop Visit 
tempfile Visits
save `Visits'

/* Set up Facilities Data */
clear 
input ID str10 (Entrance Exit Name)  
1 "7/22/2009"  "2/24/2010"  "Facility 1"  
1 "7/10/2010"  "11/21/2010" "Facility 2"  
2 "3/31/2010"  "9/23/2010"  "Facility 1" 
3 "11/24/2010" "7/5/2011"   "Facility 3"
4 "3/7/2007"   "4/19/2010"  "Facility 2" 
end 
gen DateEntrance = daily(Entrance, "MDY") 
gen DateExit = daily(Exit, "MDY") 
drop Entrance Exit 

/* Create pairwise combinations within ID using -joinby- */
joinby ID using `Visits', unmatched(both)
drop _merge
format Date* %td

/* Whatever else you want now... */
gen Visitor = 0
replace Visitor = 1 if DateEntrance <= DateVisit & DateVisit <= DateExit

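* Option 1: count the visits during each stay
collapse (sum) countVisits = Visitor, by(ID Name DateEntrance DateExit)

* Option 2 (run instead of Option 1, not after it): keep one row per matched visit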
replace DateVisit = . if !Visitor
by ID Name (DateVisit), sort : gen VisitNumber = _n * Visitor
collapse (sum) Visitor, by(ID Name DateEntrance DateExit DateVisit VisitNumber)
drop VisitNumber
list, sepby(ID)

     +---------------------------------------------------------------+
     | ID         Name   DateEnt~e    DateExit   DateVisit   Visitor |
     |---------------------------------------------------------------|
  1. |  1   Facility 1   22jul2009   24feb2010   21aug2009         1 |
  2. |  1   Facility 1   22jul2009   24feb2010   02sep2009         1 |
  3. |  1   Facility 1   22jul2009   24feb2010   23sep2009         1 |
  4. |  1   Facility 2   10jul2010   21nov2010           .         0 |
     |---------------------------------------------------------------|
  5. |  2   Facility 1   31mar2010   23sep2010           .         0 |
     |---------------------------------------------------------------|
  6. |  3   Facility 3   24nov2010   05jul2011   22apr2011         1 |
  7. |  3   Facility 3   24nov2010   05jul2011   05may2011         1 |
     |---------------------------------------------------------------|
  8. |  4   Facility 2   07mar2007   19apr2010           .         0 |
     +---------------------------------------------------------------+