Importing JSON-type data from a text file

Date: 2018-09-29 12:46:24

Tags: stata

I have been given a text file that contains the following JSON-type data:

{"interviews":{
"4582058":{"date":24oct2015,"status":completed},
"2045873":{"date":12nov2015,"status":unclear},
"5969361":{"date":19dec2015,"status":pending},
"4969210":{"date":7jan2016,"status":completed}}}

I have also been given the following identifiers, one for each person whose data I want to import:

0234, 6232, 6953, 9586, 4198

How can I do this in Stata 14?

1 Answer:

Answer 0 (score: 2)

First, you need to enter the identifiers in Stata and save them to a dta file:

clear

input id
0234
6232
6953
9586
4198
end

list

     +------+
     |   id |
     |------|
  1. |  234 |
  2. | 6232 |
  3. | 6953 |
  4. | 9586 |
  5. | 4198 |
     +------+

save identifiers, replace
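
Note that id is entered as a numeric variable, so 0234 is stored and listed as 234. If the leading zero matters, one option is to give the variable a zero-padded display format and save the file again. This is a minimal sketch that assumes the numeric id variable from the input step above is still in memory; declaring the variable as a string with input str4 id would be another way to keep the zero.

```stata
* Assumes the numeric variable id entered above is in memory.
* %04.0f pads the displayed value with leading zeros to four digits;
* the stored number itself is unchanged.
format id %04.0f

list

* Save again so the display format is kept in identifiers.dta.
save identifiers, replace
```

With this format, later listings would show 0234 where the outputs in this answer show 234.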

Assuming the JSON-type data are saved in a file named interviews.txt, you can use the import delimited command to import them into a string variable:

import delimited interviews.txt, rowrange(2) delimiter("},", asstring) clear
drop v2

list

     +-------------------------------------------------+
     |                                              v1 |
     |-------------------------------------------------|
  1. |   4582058":{"date":24oct2015,"status":completed |
  2. |     2045873":{"date":12nov2015,"status":unclear |
  3. |     5969361":{"date":19dec2015,"status":pending |
  4. | 4969210":{"date":7jan2016,"status":completed}}} |
     +-------------------------------------------------+

You can then use a combination of string functions to extract the information into separate variables:

generate interview = substr(v1, 1, strpos(v1, ":") - 2)
generate date = substr(v1, ustrrpos(v1, ":", 25) + 1, strpos(v1, ",") - ustrrpos(v1, ":", 25) - 1)
generate status = subinstr(substr(v1, strrpos(v1, ":") + 1, .), "}", "", .)

list, abbreviate(10)

     +-------------------------------------------------------------------------------------+
     |                                              v1   interview        date      status |
     |-------------------------------------------------------------------------------------|
  1. |   4582058":{"date":24oct2015,"status":completed     4582058   24oct2015   completed |
  2. |     2045873":{"date":12nov2015,"status":unclear     2045873   12nov2015     unclear |
  3. |     5969361":{"date":19dec2015,"status":pending     5969361   19dec2015     pending |
  4. | 4969210":{"date":7jan2016,"status":completed}}}     4969210    7jan2016   completed |
     +-------------------------------------------------------------------------------------+
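
As a cross-check, the same three pieces can also be pulled out with Stata's Unicode regular expression functions, which avoids the hard-coded position 25. This is only a sketch: the patterns and the _rx variable names are my own assumptions about the layout shown above, and it assumes v1, interview, date and status are all still in memory.

```stata
* Assumes v1 and the three variables generated above are in memory.
* A dot stands in for the double quote so the patterns need no compound quotes.
generate interview_rx = ustrregexs(1) if ustrregexm(v1, "^([0-9]+)")
generate date_rx      = ustrregexs(1) if ustrregexm(v1, "date.:([0-9]+[a-z]+[0-9]+)")
generate status_rx    = ustrregexs(1) if ustrregexm(v1, "status.:([a-z]+)")

* Both approaches should agree on every row.
assert interview_rx == interview
assert date_rx      == date
assert status_rx    == status

* Remove the check variables so the listings that follow are unchanged.
drop interview_rx date_rx status_rx
```

To adopt the regular expressions instead of the substr() calls, the same expressions could be used in place of the three generate lines.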

drop v1

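This approach is straightforward and allows the contents of the id, date and status fields to vary in length. One caveat: the third argument of ustrrpos() limits the search to the first 25 characters, so the code assumes that the colon before the date falls within that range and the colon after "status" does not.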
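
The extracted date is still a string at this point. If you need to sort or compute on it, it can be converted to a Stata daily date. The sketch below assumes the string variable date created above is in memory; the name interview_date is my own choice.

```stata
* Assumes the string variable date (e.g. "24oct2015") is in memory.
* date() parses day-month-year strings into a numeric daily date.
generate interview_date = date(date, "DMY")

* Display the numeric value in the usual daily date format.
format interview_date %td

* Every row should have parsed successfully.
assert !missing(interview_date)
```

If you run this before the cross step, the later listings will contain interview_date as an extra column.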

Once these steps are done, you can use the cross command to obtain the desired output:

cross using identifiers

order id
sort id interview

list, abbreviate(10) sepby(id)

     +------------------------------------------+
     |   id   interview        date      status |
     |------------------------------------------|
  1. |  234     2045873   12nov2015     unclear |
  2. |  234     4582058   24oct2015   completed |
  3. |  234     4969210    7jan2016   completed |
  4. |  234     5969361   19dec2015     pending |
     |------------------------------------------|
  5. | 4198     2045873   12nov2015     unclear |
  6. | 4198     4582058   24oct2015   completed |
  7. | 4198     4969210    7jan2016   completed |
  8. | 4198     5969361   19dec2015     pending |
     |------------------------------------------|
  9. | 6232     2045873   12nov2015     unclear |
 10. | 6232     4582058   24oct2015   completed |
 11. | 6232     4969210    7jan2016   completed |
 12. | 6232     5969361   19dec2015     pending |
     |------------------------------------------|
 13. | 6953     2045873   12nov2015     unclear |
 14. | 6953     4582058   24oct2015   completed |
 15. | 6953     4969210    7jan2016   completed |
 16. | 6953     5969361   19dec2015     pending |
     |------------------------------------------|
 17. | 9586     2045873   12nov2015     unclear |
 18. | 9586     4582058   24oct2015   completed |
 19. | 9586     4969210    7jan2016   completed |
 20. | 9586     5969361   19dec2015     pending |
     +------------------------------------------+
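
Because cross forms every pairwise combination of the two datasets, the result should contain 5 identifiers times 4 interviews, that is 20 rows. A small sanity check, assuming the crossed data shown above are in memory:

```stata
* Assumes the dataset produced by cross using identifiers is in memory.
assert _N == 5 * 4      // 5 identifiers x 4 interviews

* Each (id, interview) pair should appear exactly once.
isid id interview
```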