I have been given a text file that contains the following JSON-like data:
{"interviews":{
"4582058":{"date":24oct2015,"status":completed},
"2045873":{"date":12nov2015,"status":unclear},
"5969361":{"date":19dec2015,"status":pending},
"4969210":{"date":7jan2016,"status":completed}}}
I have also been given the following identifiers, one for each person whose data I want to import:
0234, 6232, 6953, 9586, 4198
How can I do this in Stata 14?
Answer 0 (score: 2)
First, you need to enter the identifiers in Stata and save them to a dta file:
clear
input id
0234
6232
6953
9586
4198
end
list
     +------+
     |   id |
     |------|
  1. |  234 |
  2. | 6232 |
  3. | 6953 |
  4. | 9586 |
  5. | 4198 |
     +------+
save identifiers, replace
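Note that entering id as a number drops the leading zero of 0234, as the listing shows. If the leading zeros matter, one option is to store the identifiers as strings instead. A minimal sketch, assuming every identifier is at most four characters long:

```stata
* Sketch only: store the identifiers as 4-character strings
* so that "0234" keeps its leading zero
clear
input str4 id
"0234"
"6232"
"6953"
"9586"
"4198"
end
save identifiers, replace
```

With a string id, sort orders the identifiers alphabetically rather than numerically, which gives the same order here only if all identifiers have the same length.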
Assuming the JSON-like data are saved in a file named interviews.txt, you can use the import delimited command to import them into a string variable:
import delimited interviews.txt, rowrange(2) delimiter("},", asstring) clear
drop v2
list
     +-------------------------------------------------+
     |                                              v1 |
     |-------------------------------------------------|
  1. |   4582058":{"date":24oct2015,"status":completed |
  2. |     2045873":{"date":12nov2015,"status":unclear |
  3. |     5969361":{"date":19dec2015,"status":pending |
  4. | 4969210":{"date":7jan2016,"status":completed}}} |
     +-------------------------------------------------+
You can then use a combination of string functions to extract the information into separate variables:
generate interview = substr(v1, 1, strpos(v1, ":") - 2)
generate date = substr(v1, ustrrpos(v1, ":", 25) + 1, strpos(v1, ",") - ustrrpos(v1, ":", 25) - 1)
generate status = subinstr(substr(v1, strrpos(v1, ":") + 1, .), "}", "", .)
list, abbreviate(10)
     +-------------------------------------------------------------------------------------+
     |                                              v1   interview        date      status |
     |-------------------------------------------------------------------------------------|
  1. |   4582058":{"date":24oct2015,"status":completed     4582058   24oct2015   completed |
  2. |     2045873":{"date":12nov2015,"status":unclear     2045873   12nov2015     unclear |
  3. |     5969361":{"date":19dec2015,"status":pending     5969361   19dec2015     pending |
  4. | 4969210":{"date":7jan2016,"status":completed}}}     4969210    7jan2016   completed |
     +-------------------------------------------------------------------------------------+
drop v1
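This approach is simple and allows the lengths of the interview id, date, and status contents to vary. One caveat: the 25 passed to ustrrpos() limits the search to the first 25 characters of v1, so it assumes that the colon after "date" is the last colon within that range. That holds for the 7-digit interview numbers shown here.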
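One way to avoid the fixed search width in ustrrpos(v1, ":", 25) is to match on the key names with the Unicode regular expression functions available in Stata 14. The following is only a sketch: it would have to run before drop v1, the *_alt variable names are made up for illustration, and the status pattern assumes that status values consist of lowercase letters only.

```stata
* Sketch only: run this before -drop v1-; the *_alt names are hypothetical
generate interview_alt = ustrregexs(1) if ustrregexm(v1, "^([0-9]+)")
generate date_alt      = ustrregexs(1) if ustrregexm(v1, `""date":([^,]+)"')
* Assumes status values contain lowercase letters only
generate status_alt    = ustrregexs(1) if ustrregexm(v1, `""status":([a-z]+)"')

* Confirm that both approaches agree, then discard the helper variables
assert interview_alt == interview & date_alt == date & status_alt == status
drop interview_alt date_alt status_alt
```

The compound double quotes (`" and "') are needed because the patterns themselves contain double quotes.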
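After completing these steps, you can use the cross command, which forms every pairwise combination of the 5 identifiers and the 4 interviews (20 observations), to obtain the desired output: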
cross using identifiers
order id
sort id interview
list, abbreviate(10) sepby(id)
     +------------------------------------------+
     |   id   interview        date      status |
     |------------------------------------------|
  1. |  234     2045873   12nov2015     unclear |
  2. |  234     4582058   24oct2015   completed |
  3. |  234     4969210    7jan2016   completed |
  4. |  234     5969361   19dec2015     pending |
     |------------------------------------------|
  5. | 4198     2045873   12nov2015     unclear |
  6. | 4198     4582058   24oct2015   completed |
  7. | 4198     4969210    7jan2016   completed |
  8. | 4198     5969361   19dec2015     pending |
     |------------------------------------------|
  9. | 6232     2045873   12nov2015     unclear |
 10. | 6232     4582058   24oct2015   completed |
 11. | 6232     4969210    7jan2016   completed |
 12. | 6232     5969361   19dec2015     pending |
     |------------------------------------------|
 13. | 6953     2045873   12nov2015     unclear |
 14. | 6953     4582058   24oct2015   completed |
 15. | 6953     4969210    7jan2016   completed |
 16. | 6953     5969361   19dec2015     pending |
     |------------------------------------------|
 17. | 9586     2045873   12nov2015     unclear |
 18. | 9586     4582058   24oct2015   completed |
 19. | 9586     4969210    7jan2016   completed |
 20. | 9586     5969361   19dec2015     pending |
     +------------------------------------------+
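The date variable above is still a string, so it cannot be used directly for chronological sorting or date arithmetic. If that is needed, it can probably be converted with the date() function. A minimal sketch, where interview_date is a hypothetical variable name and the day-month-year order is assumed from the sample data:

```stata
* Sketch only: interview_date is a hypothetical variable name
* Convert strings such as "24oct2015" to a Stata daily date
generate interview_date = date(date, "DMY")
format interview_date %td

* Sort each person's interviews chronologically
sort id interview_date
```

Any value that date() cannot parse becomes missing, so it is worth checking with count if missing(interview_date) afterwards.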