I have received data in which a string variable looks like this:
var_name
25-DEC-99: A11, B14, C89; 28-FEB-94: A27, B94, C30
01-APR-11: A25, B82, C65
04-JUL-09: A21, B55, C26; 12-MAR-03: A11, B72, C68; 08-JUN-11: A62, B47, C82
12-JUN-00: A77, B19, C73; 03-JUL-12: A99, B04, C54
27-OCT-15: A22, B95, C08
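and so on. My goal is to split these strings into separate variables, named v1_date, v1_A, v1_B, v1_C, v2_date, v2_A, v2_B, v2_C, v3_date, v3_A, v3_B, and v3_C.
I can do this with split var_name, p(";"), renaming the results to v1, v2, and v3, and then applying split again. The problem is that I want v1, v2, and v3 to be in chronological order by date, and the data are not currently arranged that way. How can I make the date in v1 earlier than the date in v2, and the date in v2 earlier than the date in v3? For example, in the first observation I want v2 to be associated with 25-DEC-99: A11, B14, C89 and v1 to be associated with 28-FEB-94: A27, B94, C30.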
Answer 0 (score: 1)
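In general, please consider using dataex (SSC) to create simple data examples.
You don't show all the (not simple) code you used to split the variable. As it happens, I don't think your variable names are easy to work with, so I recreated the split in my own way. Once you reshape long the split data, sorting by date is easy, but I have stopped short of a reverse reshape wide because I suspect the long structure is easier to work with.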
clear
input str80 data
"25-DEC-99: A11, B14, C89; 28-FEB-94: A27, B94, C30"
"01-APR-11: A25, B82, C65"
"04-JUL-09: A21, B55, C26; 12-MAR-03: A11, B72, C68; 08-JUN-11: A62, B47, C82"
"12-JUN-00: A77, B19, C73; 03-JUL-12: A99, B04, C54"
"27-OCT-15: A22, B95, C08"
end
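* split each observation into one string per dated record: x1, x2, ...
split data, p(;) gen(x)
local j = 1
gen work = ""
foreach x of var x* {
    * the text before the colon is the date; 2050 is the top year for two-digit years
    replace work = substr(`x', 1, strpos(`x', ":") - 1)
    gen date`j' = daily(work, "DMY", 2050)
    * the text after the colon holds the comma-separated A, B, C values
    replace work = substr(`x', strpos(`x', ":") + 1, .)
    split work, p(,)
    rename (work1 work2 work3) (vA`j' vB`j' vC`j')
    local ++j
}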
drop work
drop x*
drop data
gen id = _n
edit
reshape long date vA vB vC, i(id) j(which)
drop if missing(date)
bysort id (date): replace which = _n
list, sepby(id)
+----------------------------------------+
| id which date vA vB vC |
|----------------------------------------|
1. | 1 1 12477 A27 B94 C30 |
2. | 1 2 14603 A11 B14 C89 |
|----------------------------------------|
3. | 2 1 18718 A25 B82 C65 |
|----------------------------------------|
4. | 3 1 15776 A11 B72 C68 |
5. | 3 2 18082 A21 B55 C26 |
6. | 3 3 18786 A62 B47 C82 |
|----------------------------------------|
7. | 4 1 14773 A77 B19 C73 |
8. | 4 2 19177 A99 B04 C54 |
|----------------------------------------|
9. | 5 1 20388 A22 B95 C08 |
+----------------------------------------+
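If the wide layout asked for in the question is still needed, a reverse reshape could look like the sketch below. It assumes it is run directly after the code above, so that which already numbers the records in date order within each id. The format line is an addition here, so that the dates display as dates rather than as day counts.

```stata
* Sketch only: assumes the long data from above (id which date vA vB vC),
* with which renumbered 1, 2, ... in date order within each id.
format date %td

* Go back to one row per id; the stubs become date1 vA1 vB1 vC1, date2 vA2 ...
reshape wide date vA vB vC, i(id) j(which)

list, noobs
```

With this layout, date1 is the earliest date in each observation, date2 the next, and so on; observations with fewer records have missing values in the higher-numbered variables.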
Answer 1 (score: 1)
I believe the following will get you started. It uses both split and reshape.
clear
set more off
input ///
str100 myvar
"25-DEC-99: A11, B14, C89; 28-FEB-94: A27, B94, C30"
"01-APR-11: A25, B82, C65"
"04-JUL-09: A21, B55, C26; 12-MAR-03: A11, B72, C68; 08-JUN-11: A62, B47, C82"
"12-JUN-00: A77, B19, C73; 03-JUL-12: A99, B04, C54"
"27-OCT-15: A22, B95, C08"
end
split myvar, p(;)
drop myvar
gen obs = _n
reshape long myvar, i(obs)
drop if missing(myvar)
split myvar, p(:)
drop myvar
gen myvar11 = date(myvar1, "DMY", 2020)
format %td myvar11
drop myvar1
rename (myvar11 myvar2) (mydate mycells)
order mydate, before(mycells)
bysort obs (mydate) : gen neworder = _n
drop _j
reshape wide mydate mycells, i(obs) j(neworder)
list
If you need to split further, you can loop over the mycells variables.
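One way such a loop might look is sketched below. It assumes the wide data produced by the code above, with variables mycells1, mycells2, and so on; the underscore suffix in the new names and the trimming step are choices made for this illustration, not part of the original answer.

```stata
* Sketch only: assumes the wide data from above (mycells1, mycells2, ...).
foreach v of varlist mycells* {
    * break "A11, B14, C89" into pieces named e.g. mycells1_1 mycells1_2 mycells1_3
    split `v', parse(,) generate(`v'_)

    * split leaves r(varlist) holding the new names; strip stray spaces from each
    foreach w of varlist `r(varlist)' {
        replace `w' = trim(`w')
    }

    * the combined string is no longer needed
    drop `v'
}
```

The new variables could then be renamed to whatever scheme is preferred, for example the v1_A style used in the question.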