How to reshape using two variables

Time: 2015-07-11 19:26:32

Tags: stata reshape

Suppose I have this data:

group    obs    data    data_A    data_B
1        1      7_a     7_a       
1        2      4_b               4_b  
1        3      1_a     1_a     
2        1      5_b               5_b
3        1                  
4        1      3_b               3_b
4        2      4_b               4_b
4        3      9_a     9_a     
4        4      8_b               8_b   

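data_A and data_B are constructed from data. If data ends in a, data_A takes the value of data; if data ends in b, data_B takes the value of data; if data is empty, both data_A and data_B stay blank.

I would like to rearrange the data as follows:

group    data_A1    data_A2    data_B1    data_B2    data_B3
1        7_a        1_a        4_b
2                              5_b
3
4        9_a                   3_b        4_b        8_b

where the number of columns is determined automatically by the number of values.

7_a and 9_a are under data_A1 because they are the first instances of an a value in their respective groups. 1_a is under data_A2 because it is the second instance of an a value in its group, and so on.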
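For reference, the rule described above for building data_A and data_B from data could be reproduced with something like the following sketch. The helper variable suffix is an illustrative name and not part of the original data, and the sketch assumes data is either empty or ends in a or b.

```stata
clear
input group obs str3 data
1 1 "7_a"
1 2 "4_b"
1 3 "1_a"
2 1 "5_b"
3 1 ""
4 1 "3_b"
4 2 "4_b"
4 3 "9_a"
4 4 "8_b"
end

* last character of data (empty when data is empty); suffix is a hypothetical helper
gen str1 suffix = substr(data, -1, 1)

* copy data into the column matching its suffix, leaving the other column blank
gen str3 data_A = cond(suffix == "a", data, "")
gen str3 data_B = cond(suffix == "b", data, "")

drop suffix
list, sepby(group)
```

This yields the same data_A and data_B columns as the example table, so it can serve as a starting point for trying out the answers.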

How can this be done?

(I am aware of reshape and can use it in similar situations.)

2 answers:

Answer 0: (score: 1)

One way is to use a loop. It is not very elegant, but it works.

clear
set more off

*----- example data -----

input ///
group    obs    str3(data    data_A    data_B)
1        1      7_a     7_a           ""
1        2      4_b       ""        4_b  
1        3      1_a     1_a          ""
2        1      5_b      ""         5_b
3        1       ""        ""       ""
4        1      3_b       ""        3_b
4        2      4_b       ""        4_b
4        3      9_a     9_a          ""
4        4      8_b       ""        8_b   
end

drop data
list, sepby(group)

*----- what you want -----

quietly foreach i in A B {

    bysort group (obs) : gen count_`i' = sum(!missing(data_`i'))
    summarize count_`i', meanonly

    forvalues j = 1/`r(max)' {
        gen data_`i'`j' = ""
        replace data_`i'`j' = data_`i' if count_`i' == `j'
    }

    drop count_`i'
}

drop data_?

collapse (firstnm) data_*, by(group)

list

Another way uses reshape and fillin:

clear
set more off

*----- example data -----

input ///
group    obs    str3(data    data_A    data_B)
1        1      7_a     7_a           ""
1        2      4_b       ""        4_b  
1        3      1_a     1_a          ""
2        1      5_b      ""         5_b
3        1       ""        ""       ""
4        1      3_b       ""        3_b
4        2      4_b       ""        4_b
4        3      9_a     9_a          ""
4        4      8_b       ""        8_b   
end

drop data

list, sepby(group)

*----- what you want -----

// first reshape
reshape long data_ , i(group obs) j(j) string

// counts per group j
bysort group j (obs) : gen count = sum(!missing(data_))

// concatenate and rectangularize
gen j2 = j + string(count)
fillin group j2

// drop some observations
bysort group j2 (data_) : drop if _n < _N | inlist(j2, "A0", "B0")

// keep necessary variables
keep group j2 data_

// second reshape
reshape wide data_, i(group) j(j2) string

list

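I find the loop solution more intuitive.

Your target data structure is rather odd. It is always a good idea to include some background and the end goal.

Answer 1: (score: 1)

I agree with Roberto that this is a bit odd. Here is another fun way: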

clear
input float(group obs) str3(data data_A data_B)
1 1 "7_a" "7_a" "" 
1 2 "4_b" "" "4_b" 
1 3 "1_a" "1_a" "" 
2 1 "5_b" "" "5_b" 
3 1 "" "" "" 
4 1 "3_b" "" "3_b" 
4 2 "4_b" "" "4_b" 
4 3 "9_a" "9_a" "" 
4 4 "8_b" "" "8_b" 
end

* verify assumptions about the data
isid group obs, sort

* concatenate values across obs
by group (obs): replace data_A = data_A[_n-1] + " " + data_A
by group (obs): replace data_B = data_B[_n-1] + " " + data_B

* the last obs of the group contains all values
by group: keep if _n == _N

* split each concatenated string
split data_A
split data_B

drop obs data data_A data_B
list