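I realize there is a Stata forum thread with this exact title, but I did not find its syntax very helpful, especially since my data set is a bit different. I have two data sets. One records people's stays in facilities, including the facility name. It looks like this: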
+---+-------------+------------+---------------+
|ID#|Entrance Date| Exit Date  | Facility Name |
| 1 | 7/22/2009   | 2/24/2010  | Facility 1    |
| 1 | 7/10/2010   | 11/21/2010 | Facility 2    |
| 2 | 3/31/2010   | 9/23/2010  | Facility 1    |
| 3 | 11/24/2010  | 7/5/2011   | Facility 3    |
| 4 | 3/7/2007    | 4/19/2010  | Facility 2    |
+---+-------------+------------+---------------+
The next data set shows visit dates. It contains the ID and the date of each visit:
+---+------------+
|ID#| Visit Date |
| 1 | 08/21/2009 |
| 1 | 09/02/2009 |
| 1 | 09/23/2009 |
| 3 | 04/22/2011 |
| 3 | 05/05/2011 |
+---+------------+
I would like to merge the two files on ID#, where the Visit Date falls between the Entrance Date and the Exit Date, so that I can see (1) who had visitors and (2) which facilities they were in.
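Answer 0 (score: 4)
There is a new user-written program on SSC called rangejoin that is tailor-made for this type of problem. To install it, type in Stata's Command window:
ssc install rangejoin
rangejoin pairs each stay with the visits for the same id whose date falls between the stay's date in and date out (the bounds of the desired interval). All dates must be numeric, so I pre-converted all dates to Stata dates in the example below.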
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10 visit int nvisit
1 "08/21/2009" 18130
1 "09/02/2009" 18142
1 "09/23/2009" 18163
3 "04/22/2011" 18739
3 "05/05/2011" 18752
end
format %td nvisit
save "visits.dta", replace
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10(Entrance Exit Name) int(datein dateout)
1 "7/22/2009" "2/24/2010" "Facility 1" 18100 18317
1 "7/10/2010" "11/21/2010" "Facility 2" 18453 18587
2 "3/31/2010" "9/23/2010" "Facility 1" 18352 18528
3 "11/24/2010" "7/5/2011" "Facility 3" 18590 18813
4 "3/7/2007" "4/19/2010" "Facility 2" 17232 18371
end
format %td datein
format %td dateout
rangejoin nvisit datein dateout using "visits.dta", by(id)
bysort id datein: egen visit_count = total(!mi(nvisit))
list, sepby(id)
     +-------------------------------------------------------------------------------------------------------+
     | id     Entrance         Exit         Name      datein     dateout        visit      nvisit   visit_~t |
     |-------------------------------------------------------------------------------------------------------|
  1. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010   08/21/2009   21aug2009          3 |
  2. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010   09/02/2009   02sep2009          3 |
  3. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010   09/23/2009   23sep2009          3 |
  4. |  1    7/10/2010   11/21/2010   Facility 2   10jul2010   21nov2010                        .          0 |
     |-------------------------------------------------------------------------------------------------------|
  5. |  2    3/31/2010    9/23/2010   Facility 1   31mar2010   23sep2010                        .          0 |
     |-------------------------------------------------------------------------------------------------------|
  6. |  3   11/24/2010     7/5/2011   Facility 3   24nov2010   05jul2011   04/22/2011   22apr2011          2 |
  7. |  3   11/24/2010     7/5/2011   Facility 3   24nov2010   05jul2011   05/05/2011   05may2011          2 |
     |-------------------------------------------------------------------------------------------------------|
  8. |  4     3/7/2007    4/19/2010   Facility 2   07mar2007   19apr2010                        .          0 |
     +-------------------------------------------------------------------------------------------------------+
If you want, you can get back to one observation per original stay with:
by id datein: keep if _n == 1
keep id Entrance Exit Name datein dateout visit_count
list
     +------------------------------------------------------------------------------+
     | id     Entrance         Exit         Name      datein     dateout   visit_~t |
     |------------------------------------------------------------------------------|
  1. |  1    7/22/2009    2/24/2010   Facility 1   22jul2009   24feb2010          3 |
  2. |  1    7/10/2010   11/21/2010   Facility 2   10jul2010   21nov2010          0 |
  3. |  2    3/31/2010    9/23/2010   Facility 1   31mar2010   23sep2010          0 |
  4. |  3   11/24/2010     7/5/2011   Facility 3   24nov2010   05jul2011          2 |
  5. |  4     3/7/2007    4/19/2010   Facility 2   07mar2007   19apr2010          0 |
     +------------------------------------------------------------------------------+
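The example above starts from dates that are already numeric. If your dates are still stored as month/day/year strings, as in the question, a minimal sketch of the conversion could look like the following. It uses the built-in daily() function, and the variable names simply follow the example above.

```stata
* Sketch: turn month/day/year strings into numeric Stata daily dates.
clear
input byte id str10(Entrance Exit Name)
1 "7/22/2009"  "2/24/2010"  "Facility 1"
1 "7/10/2010"  "11/21/2010" "Facility 2"
2 "3/31/2010"  "9/23/2010"  "Facility 1"
3 "11/24/2010" "7/5/2011"   "Facility 3"
4 "3/7/2007"   "4/19/2010"  "Facility 2"
end

* daily() parses each string using the stated order: month (M), day (D), year (Y)
gen int datein  = daily(Entrance, "MDY")
gen int dateout = daily(Exit, "MDY")
format %td datein dateout

* A string that could not be parsed would end up as a missing date
count if missing(datein, dateout)
list, sepby(id)
```

The same two gen lines, applied to the visit strings, would give the numeric visit date that rangejoin needs as its key variable.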
Answer 1 (score: 1)
No kind of merge seems helpful here, because you can only match on identifiers. I would use append.
clear
input ID str10 (Entrance Exit) Name
1 "7/22/2009" "2/24/2010" 1
1 "7/10/2010" "11/21/2010" 2
2 "3/31/2010" "9/23/2010" 1
3 "11/24/2010" "7/5/2011" 3
4 "3/7/2007" "4/19/2010" 2
end
gen DateEntrance = daily(Entrance, "MDY")
gen DateExit = daily(Exit, "MDY")
drop Entrance Exit
sort ID, stable
by ID : gen T = _n
reshape long Date, i(ID T) j(Event) string
drop T
save Master, replace
clear
input ID str10 Visit
1 "08/21/2009"
1 "09/02/2009"
1 "09/23/2009"
3 "04/22/2011"
3 "05/05/2011"
end
gen Date = daily(Visit, "MDY")
drop Visit
gen Event = "Visit"
append using Master
sort ID Date
format Date %td
list, sepby(ID)
     +----------------------------------+
     | ID        Date      Event   Name |
     |----------------------------------|
  1. |  1   22jul2009   Entrance      1 |
  2. |  1   21aug2009      Visit      . |
  3. |  1   02sep2009      Visit      . |
  4. |  1   23sep2009      Visit      . |
  5. |  1   24feb2010       Exit      1 |
  6. |  1   10jul2010   Entrance      2 |
  7. |  1   21nov2010       Exit      2 |
     |----------------------------------|
  8. |  2   31mar2010   Entrance      1 |
  9. |  2   23sep2010       Exit      1 |
     |----------------------------------|
 10. |  3   24nov2010   Entrance      3 |
 11. |  3   22apr2011      Visit      . |
 12. |  3   05may2011      Visit      . |
 13. |  3   05jul2011       Exit      3 |
     |----------------------------------|
 14. |  4   07mar2007   Entrance      2 |
 15. |  4   19apr2010       Exit      2 |
     +----------------------------------+
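The listing interleaves stays and visits but leaves Name missing on the visit rows. One way to fill it in, sketched here under the assumption that a person's stays never overlap, is to keep a running count of open stays and copy the facility down from the preceding row. The variables evorder and inside are names introduced for this illustration. The input lines only recreate part of the listing above so the block runs on its own, and the 5/1/2010 visit is a made-up row added to show a visit that falls outside any stay.

```stata
* Sketch: assign a facility to each visit in the appended (long) layout.
clear
input ID str10 D str8 Event Name
1 "7/22/2009"  "Entrance" 1
1 "8/21/2009"  "Visit"    .
1 "2/24/2010"  "Exit"     1
1 "5/1/2010"   "Visit"    .
1 "7/10/2010"  "Entrance" 2
1 "11/21/2010" "Exit"     2
3 "11/24/2010" "Entrance" 3
3 "4/22/2011"  "Visit"    .
3 "7/5/2011"   "Exit"     3
end
gen Date = daily(D, "MDY")
format Date %td
drop D

* Order same-day events as Entrance, Visit, Exit so boundary visits count as inside
gen byte evorder = cond(Event == "Entrance", 1, cond(Event == "Visit", 2, 3))
sort ID Date evorder

* Running count of open stays within ID: +1 at each Entrance, -1 at each Exit
by ID: gen inside = sum((Event == "Entrance") - (Event == "Exit"))

* Copy the facility from the previous row onto visits made during a stay;
* replace works down the data in order, so consecutive visits are all filled
by ID: replace Name = Name[_n-1] if Event == "Visit" & inside > 0

list, sepby(ID)
```

Visits made while no stay is open keep a missing Name, which makes them easy to spot or drop afterwards.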
Answer 2 (score: 1)
Another approach uses joinby:
/* Set up Visits Data */
clear
input ID str10 Visit
1 "08/21/2009"
1 "09/02/2009"
1 "09/23/2009"
3 "04/22/2011"
3 "05/05/2011"
end
gen DateVisit = daily(Visit, "MDY")
drop Visit
tempfile Visits
save `Visits'
/* Set up Facilities Data */
clear
input ID str10 (Entrance Exit Name)
1 "7/22/2009" "2/24/2010" "Facility 1"
1 "7/10/2010" "11/21/2010" "Facility 2"
2 "3/31/2010" "9/23/2010" "Facility 1"
3 "11/24/2010" "7/5/2011" "Facility 3"
4 "3/7/2007" "4/19/2010" "Facility 2"
end
gen DateEntrance = daily(Entrance, "MDY")
gen DateExit = daily(Exit, "MDY")
drop Entrance Exit
/* Create pairwise combinations within ID using -joinby- */
joinby ID using `Visits', unmatched(both)
drop _merge
format Date* %td
/* Whatever else you want now... */
gen Visitor = 0
replace Visitor = 1 if DateEntrance <= DateVisit & DateVisit <= DateExit
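* or, to count visits per stay (an alternative to the block that follows, not a step before it)...
collapse (sum) countVisits = Visitor, by(ID Name DateEntrance DateExit)
* or, to keep one row per visit within a stay (this is what produces the listing)...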
replace DateVisit = . if !Visitor
by ID Name (DateVisit), sort : gen VisitNumber = _n * Visitor
collapse (sum) Visitor, by(ID Name DateEntrance DateExit DateVisit VisitNumber)
drop VisitNumber
list, sepby(ID)
     +---------------------------------------------------------------+
     | ID         Name   DateEnt~e    DateExit   DateVisit   Visitor |
     |---------------------------------------------------------------|
  1. |  1   Facility 1   22jul2009   24feb2010   21aug2009         1 |
  2. |  1   Facility 1   22jul2009   24feb2010   02sep2009         1 |
  3. |  1   Facility 1   22jul2009   24feb2010   23sep2009         1 |
  4. |  1   Facility 2   10jul2010   21nov2010           .         0 |
     |---------------------------------------------------------------|
  5. |  2   Facility 1   31mar2010   23sep2010           .         0 |
     |---------------------------------------------------------------|
  6. |  3   Facility 3   24nov2010   05jul2011   22apr2011         1 |
  7. |  3   Facility 3   24nov2010   05jul2011   05may2011         1 |
     |---------------------------------------------------------------|
  8. |  4   Facility 2   07mar2007   19apr2010           .         0 |
     +---------------------------------------------------------------+