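I have two tables:
The first table contains the variables name, date, time and intraday price, i.e. the intraday price of each name at a given date and time. The second table has name, date and daily price, where the daily price is an aggregate of the intraday prices for each name and date. I am trying to write a program that does the following:
For each name and date that appear in both tables:
if the first and last intraday prices fall outside the range 0.962 to 1.0398 times the previous day's daily price, then drop all data for that name and date from table 1.
Stated as a condition:
If the first and the last intraday price (for a specific name and date) are not in [0.962 × (yesterday's daily price), 1.0398 × (yesterday's daily price)], then drop.
For example, consider the following two tables: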
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 name long date str8 time double intraday_price
"A" 17659 "11:32:41" 3
"A" 17659 "12:32:41" 2
"A" 17659 "13:32:41" 1
"A" 17660 "11:32:41" 3.95
"A" 17660 "12:32:41" 3
"A" 17660 "13:32:41" 6
"A" 17660 "14:32:41" 4.01
"B" 17659 "11:32:41" 3.1
"B" 17659 "12:32:41" 1
"B" 17659 "13:32:41" 4
"B" 17659 "14:32:41" 2.9
"B" 17660 "11:32:41" 6
"B" 17660 "12:32:41" 1
"B" 17661 "11:32:41" 5
"B" 17661 "12:32:41" 7
"C" 17659 "11:32:41" 3
"C" 17659 "12:32:41" 2
"C" 17660 "11:32:41" 6.1
"C" 17660 "12:32:41" 3
"C" 17660 "13:32:41" 2
"C" 17661 "11:32:41" 8
"C" 17661 "12:32:41" 2
"C" 17661 "13:32:41" 3
"C" 17661 "14:32:41" 2
end
format %d date
Table 2 is:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 name long date double daily_price
"A" 17657 3
"B" 17657 6
"C" 17657 5
"A" 17658 5
"A" 17659 4
"B" 17658 3
"B" 17659 4
"B" 17660 3
"C" 17658 7
"C" 17659 6
"C" 17660 5
end
format %d date
Note that the formula uses yesterday's daily price.
The result should be:
+------+----------+----------+----------------+
| name | date     | time     | intraday price |
+------+----------+----------+----------------+
| B    | 7-May-08 | 11:32:41 |            3.1 |
| B    | 7-May-08 | 12:32:41 |              1 |
| B    | 7-May-08 | 13:32:41 |              4 |
| B    | 7-May-08 | 14:32:41 |            2.9 |
| A    | 8-May-08 | 11:32:41 |           3.95 |
| A    | 8-May-08 | 12:32:41 |              3 |
| A    | 8-May-08 | 13:32:41 |              6 |
| A    | 8-May-08 | 14:32:41 |           4.01 |
| C    | 8-May-08 | 11:32:41 |            6.1 |
| C    | 8-May-08 | 12:32:41 |              3 |
| C    | 8-May-08 | 13:32:41 |              2 |
+------+----------+----------+----------------+
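Working through the bounds for one example shows how the condition has to be read to produce this result. For C on 8 May 2008 (date 17660), yesterday's daily price (C on 17659) is 6:

$$
[0.962 \times 6,\; 1.0398 \times 6] = [5.772,\; 6.2388], \qquad 5.772 \le 6.1 \le 6.2388, \qquad 2 < 5.772
$$

The first price (6.1) is inside the range and the last (2) is outside, yet C on 8 May is kept in the result above. So a name-date is dropped only when both the first and the last price are outside the range. By contrast, for A on 7 May 2008 yesterday's price is 5, giving [4.81, 5.199]. Both 3 and 1 are outside that range, so the group is dropped.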
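Can you tell me how to do this?
Answer (score: 2)
Your question is not very clear, so I'm not sure this is what you want. You also have a lot of missing data: the name-date pairs in table 2 don't match the name-date pairs in table 1. Let me know whether this does what you want.
Basically, we save both tables as temporary files. For table 2, we first create an observation for the day after the last day in the data, because we want a "previous day's price" variable. Then we create the "previous day's price" variable. Technically we could use time-series operators for this, but this is a bit simpler. Then we merge table 2 onto table 1. I drop any observations without an intraday price, since I assume they are irrelevant to you, and then use bysort to create an indicator for whether the observation should be dropped. I commented out the part that does the actual dropping, so you can look at the data first and make sure it does what you really want.
First, input your data: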
clear
tempfile table1 table2
// Input data
input str4 name long date str8 time double intraday_price
"A" 17659 "11:32:41" 3
"A" 17659 "12:32:41" 2
"A" 17659 "13:32:41" 1
"A" 17660 "11:32:41" 3.95
"A" 17660 "12:32:41" 3
"A" 17660 "13:32:41" 6
"A" 17660 "14:32:41" 4.01
"B" 17659 "11:32:41" 3.1
"B" 17659 "12:32:41" 1
"B" 17659 "13:32:41" 4
"B" 17659 "14:32:41" 2.9
"B" 17660 "11:32:41" 6
"B" 17660 "12:32:41" 1
"B" 17661 "11:32:41" 5
"B" 17661 "12:32:41" 7
"C" 17659 "11:32:41" 3
"C" 17659 "12:32:41" 2
"C" 17660 "11:32:41" 6.1
"C" 17660 "12:32:41" 3
"C" 17660 "13:32:41" 2
"C" 17661 "11:32:41" 8
"C" 17661 "12:32:41" 2
"C" 17661 "13:32:41" 3
"C" 17661 "14:32:41" 2
end
format %d date
save `table1'
clear
input str4 name long date double daily_price
"A" 17657 3
"B" 17657 6
"C" 17657 5
"A" 17658 5
"A" 17659 4
"B" 17658 3
"B" 17659 4
"B" 17660 3
"C" 17658 7
"C" 17659 6
"C" 17660 5
end
format %d date
Now, make the changes:
// Create a new observation to create a "lastday_price" for the day AFTER the last day in the data
levelsof name, local(names)
foreach name of local names {
set obs `=_N+1'
replace name = "`name'" if missing(name)
}
sort name date
// Generate lastday_price
bysort name (date): gen lastday_price = daily_price[_n-1]
bysort name (date): replace date = date[_n-1] + 1 if missing(date)
save `table2'
// Merge table 2 onto table 1 by name and date
use `table1', clear
merge m:1 name date using `table2'
drop if _merge == 2 // Only daily prices, no intraday price
// Generate indicator: 1 if BOTH the first and the last intraday price (ordered by time) lie outside [0.962, 1.0398] x lastday_price
bysort name date (time): gen drop = 1 if ///
!inrange(intraday_price[1],0.962*lastday_price,1.0398*lastday_price) & ///
!inrange(intraday_price[_N],0.962*lastday_price,1.0398*lastday_price) & ///
!missing(lastday_price)
*drop if drop == 1
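Before uncommenting the last line, it can help to see each name-date group's first and last intraday price next to the bounds. The following sketch is not part of the original answer. It assumes the merged data from the code above are in memory. The variables first_price, last_price, lo_bound and hi_bound are hypothetical names used only for this check.

```stata
* Check (sketch): one row per name-date group with the prices the rule compares.
* preserve/restore leaves the merged data unchanged afterwards.
preserve
bysort name date (time): gen first_price = intraday_price[1]
by name date: gen last_price = intraday_price[_N]
gen lo_bound = 0.962 * lastday_price
gen hi_bound = 1.0398 * lastday_price
by name date: keep if _n == 1
list name date first_price last_price lo_bound hi_bound drop, noobs sepby(name)
restore

* Number of intraday rows that -drop if drop == 1- would remove
count if drop == 1
```

With the example data, 13 of the 24 intraday rows are flagged. Dropping them leaves the 11 rows of the expected result. Note two things about this approach. First, lastday_price is the previous observation in table 2, that is, the previous day that has a daily price. Second, the row appended after the last date is exactly one calendar day later. If the next intraday date in table 1 is not that day (after a weekend, for example), it will not match and will never be flagged.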