Suppose I have this data:
group  obs  data  data_A  data_B
1      1    7_a   7_a
1      2    4_b           4_b
1      3    1_a   1_a
2      1    5_b           5_b
3      1
4      1    3_b           3_b
4      2    4_b           4_b
4      3    9_a   9_a
4      4    8_b           8_b
data_A and data_B are constructed from data. If data ends in a, data_A takes the value of data; if data ends in b, data_B takes the value of data; if data is empty, both data_A and data_B stay blank.
I would like to reshape the data as follows:
group  data_A1  data_A2  data_B1  data_B2  data_B3
1      7_a      1_a      4_b
2                        5_b
3
4      9_a               3_b      4_b      8_b
where the number of columns is determined automatically by the number of values.
7_a and 9_a are in data_A1 because they are the first instance of the a variable in their respective groups. 1_a is in data_A2 because it is the second instance of the a variable in its group, and so on.
How can this be done?
(I know reshape and can use it in similar situations.)
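For anyone who wants to reproduce the starting point, here is a minimal sketch of the construction rule described in the question (data_A takes the value of data when it ends in a, data_B when it ends in b). It assumes the suffix is always the last character of data; that assumption is mine and is not stated in the question.

```stata
clear
input group obs str3 data
1 1 "7_a"
1 2 "4_b"
1 3 "1_a"
2 1 "5_b"
3 1 ""
4 1 "3_b"
4 2 "4_b"
4 3 "9_a"
4 4 "8_b"
end

* assumption: the a/b marker is the last character of data
* substr(data, -1, 1) returns "" when data is empty, so both stay blank
gen str3 data_A = cond(substr(data, -1, 1) == "a", data, "")
gen str3 data_B = cond(substr(data, -1, 1) == "b", data, "")

list, sepby(group)
```

If the marker could sit elsewhere in the string, the substr() test would need to change accordingly.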
Answer 0: (score: 1)
One way is to use a loop. It is not very elegant, but it works.
clear
set more off
*----- example data -----
input ///
group obs str3(data data_A data_B)
1 1 7_a 7_a ""
1 2 4_b "" 4_b
1 3 1_a 1_a ""
2 1 5_b "" 5_b
3 1 "" "" ""
4 1 3_b "" 3_b
4 2 4_b "" 4_b
4 3 9_a 9_a ""
4 4 8_b "" 8_b
end
drop data
list, sepby(group)
*----- what you want -----
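quietly foreach i in A B {
    * running count of nonmissing values within each group
    bysort group (obs) : gen count_`i' = sum(!missing(data_`i'))
    * the largest count gives the number of columns needed
    summarize count_`i', meanonly
    forvalues j = 1/`r(max)' {
        * column j holds the j-th nonmissing value of the group
        gen data_`i'`j' = ""
        replace data_`i'`j' = data_`i' if count_`i' == `j'
    }
    drop count_`i'
}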
drop data_?
collapse (firstnm) data_*, by(group)
list
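To see what the running counter in the loop does, it may help to look at it in isolation. The sketch below uses only group 1 of the example and only data_A; the variable name count_A mirrors what the loop creates for i = A.

```stata
clear
input group obs str3 data_A
1 1 "7_a"
1 2 ""
1 3 "1_a"
end

* running count of nonmissing data_A within group, in obs order
bysort group (obs) : gen count_A = sum(!missing(data_A))

list, sepby(group)
```

The counter is 1 for obs 1 and 2, and 2 for obs 3. Observation 2 shares the count of observation 1 but has an empty data_A, so the replace step copies an empty string there, and collapse (firstnm) later skips it when picking the first nonmissing value per group.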
Another way uses reshape and fillin:
clear
set more off
*----- example data -----
input ///
group obs str3(data data_A data_B)
1 1 7_a 7_a ""
1 2 4_b "" 4_b
1 3 1_a 1_a ""
2 1 5_b "" 5_b
3 1 "" "" ""
4 1 3_b "" 3_b
4 2 4_b "" 4_b
4 3 9_a 9_a ""
4 4 8_b "" 8_b
end
drop data
list, sepby(group)
*----- what you want -----
// first reshape
reshape long data_ , i(group obs) j(j) string
// counts per group j
bysort group j (obs) : gen count = sum(!missing(data_))
// concatenate and rectangularize
gen j2 = j + string(count)
fillin group j2
// drop some observations
bysort group j2 (data_) : drop if _n < _N | inlist(j2, "A0", "B0")
// keep necessary variables
keep group j2 data_
// second reshape
reshape wide data_, i(group) j(j2) string
list
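In case fillin is unfamiliar, here is a small sketch of what the "rectangularize" step does. The three-row dataset is made up for illustration and is not part of the example above; only the variable names group, j2 and data_ are borrowed from the code.

```stata
clear
input group str2 j2 str3 data_
1 "A1" "7_a"
1 "B1" "4_b"
2 "B1" "5_b"
end

* add one observation for every group/j2 combination that is absent
fillin group j2

list, sepby(group)
```

Group 2 has no A1 row to begin with, so fillin adds one with data_ empty and marks it with _fillin equal to 1. After this step every group has a row for every j2 value.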
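I find the loop solution more intuitive.
Your target data structure is rather odd. It is always a good idea to include some background and your final goal.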
Answer 1: (score: 1)
I agree with Roberto that this is a bit odd. Here is another fun way to do it:
clear
input float(group obs) str3(data data_A data_B)
1 1 "7_a" "7_a" ""
1 2 "4_b" "" "4_b"
1 3 "1_a" "1_a" ""
2 1 "5_b" "" "5_b"
3 1 "" "" ""
4 1 "3_b" "" "3_b"
4 2 "4_b" "" "4_b"
4 3 "9_a" "9_a" ""
4 4 "8_b" "" "8_b"
end
* verify assumptions about the data
isid group obs, sort
* concatenate values across obs
by group (obs): replace data_A = data_A[_n-1] + " " + data_A
by group (obs): replace data_B = data_B[_n-1] + " " + data_B
* the last obs of the group contains all values
by group: keep if _n == _N
* split each concatenated string
split data_A
split data_B
drop obs data data_A data_B
list