I have a large dataset of 5,000 observations. A subset of my data looks like this:
AandB
1 222 454 213.51  59.15%
444 630 789.46  6.15%
2 374 798 807.69  32.00%
304 738 263.59  19.95%
177 641 617.86  18.07%
857 937 842.27  51.97%
973 127.33  0.03%
86 205 146.62  1.18%
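I need to extract two variables, A and B, from this one variable.
For example, 1 222 454 213.51 should appear in column A as 1222454213.51, and the corresponding observation in variable B should be 59.15%.
In the raw data, a double space separates the value I want in A from the value I want in B.
So I need: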
A B
1222454213.51 59.15%
444630789.46 6.15%
2374798807.69 32.00%
304738263.59 19.95%
177641617.86 18.07%
857937842.27 51.97%
973127.33 0.03%
86205146.62 1.18%
I was able to get variable A as follows:
generate A = reverse(substr(reverse(AandB),strpos(reverse(AandB), " "), . ))
replace A = subinstr(A, " ", "", .)
However, I am having trouble extracting the percentage figure.
Answer 0 (score: 1)
One way is:
split AandB, p("  ")
rename AandB1 A
rename AandB2 B
replace A = subinstr(A, " ", "", .)
list, separator(0)
     +---------------------------------------------------+
     |                    AandB               A        B |
     |---------------------------------------------------|
  1. | 1 222 454 213.51  59.15%   1222454213.51   59.15% |
  2. |    444 630 789.46  6.15%    444630789.46    6.15% |
  3. | 2 374 798 807.69  32.00%   2374798807.69   32.00% |
  4. |   304 738 263.59  19.95%    304738263.59   19.95% |
  5. |   177 641 617.86  18.07%    177641617.86   18.07% |
  6. |   857 937 842.27  51.97%    857937842.27   51.97% |
  7. |        973 127.33  0.03%       973127.33    0.03% |
  8. |     86 205 146.62  1.18%     86205146.62    1.18% |
     +---------------------------------------------------+
Answer 1 (score: 1)
Another, improved approach is to peel off the last "word" (in Stata's sense) first:
clear
input str42 AandB
"1 222 454 213.51  59.15%"
"444 630 789.46  6.15%"
"2 374 798 807.69  32.00%"
"304 738 263.59  19.95%"
"177 641 617.86  18.07%"
"857 937 842.27  51.97%"
"973 127.33  0.03%"
"86 205 146.62  1.18%"
end
* B is the last space-delimited word of AandB
generate B = word(AandB, -1)
* A is whatever remains once B is removed, with outer blanks trimmed
generate A = trim(subinstr(AandB, B, "", .))
list AandB A B, separator(0)
     +------------------------------------------------------+
     |                    AandB                  A        B |
     |------------------------------------------------------|
  1. | 1 222 454 213.51  59.15%   1 222 454 213.51   59.15% |
  2. |    444 630 789.46  6.15%     444 630 789.46    6.15% |
  3. | 2 374 798 807.69  32.00%   2 374 798 807.69   32.00% |
  4. |   304 738 263.59  19.95%     304 738 263.59   19.95% |
  5. |   177 641 617.86  18.07%     177 641 617.86   18.07% |
  6. |   857 937 842.27  51.97%     857 937 842.27   51.97% |
  7. |        973 127.33  0.03%         973 127.33    0.03% |
  8. |     86 205 146.62  1.18%      86 205 146.62    1.18% |
     +------------------------------------------------------+
If you really do want A to be treated as holding some very large numbers, then converting it to a numeric variable is one way forward. Measurement to 12 significant figures implies that you are in astronomy (where perhaps the first 6 digits are good) or economics (where perhaps only the first digit is reliable).
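As a rough sketch of such a conversion (an illustration, not part of the original answer): assuming A and B are the string variables created above, destring with its ignore() option is one way to get numeric versions. The names A_num and B_num are hypothetical and chosen only for this example.

```stata
* Sketch only: A and B are the string variables created above.
* A_num and B_num are hypothetical names introduced for illustration.
destring A, generate(A_num) ignore(" ")
destring B, generate(B_num) ignore("%")
* Show A_num with two decimals instead of the default general format.
format A_num %15.2f
list A A_num B B_num, separator(0)
```

Values such as 1222454213.51 carry 12 significant digits, which is more than a float can hold, so it is worth confirming with describe that A_num is stored as a double.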
Answer 2 (score: 1)
The following works for me:
clear
input str50 AandB
"1 222 454 213.51  59.15%"
"444 630 789.46  6.15%"
"2 374 798 807.69  32.00%"
"304 738 263.59  19.95%"
"177 641 617.86  18.07%"
"857 937 842.27  51.97%"
"973 127.33  0.03%"
"86 205 146.62  1.18%"
end
generate A = subinstr(substr(AandB, 1, strpos(AandB,"%")-6)," ", "", .)
generate B = subinstr(substr(AandB, strpos(AandB,"%")-6, .)," ", "", .)
list, separator(0)
     +---------------------------------------------------+
     |                    AandB               A        B |
     |---------------------------------------------------|
  1. | 1 222 454 213.51  59.15%   1222454213.51   59.15% |
  2. |    444 630 789.46  6.15%    444630789.46    6.15% |
  3. | 2 374 798 807.69  32.00%   2374798807.69   32.00% |
  4. |   304 738 263.59  19.95%    304738263.59   19.95% |
  5. |   177 641 617.86  18.07%    177641617.86   18.07% |
  6. |   857 937 842.27  51.97%    857937842.27   51.97% |
  7. |        973 127.33  0.03%       973127.33    0.03% |
  8. |     86 205 146.62  1.18%     86205146.62    1.18% |
     +---------------------------------------------------+
Edit:
On second thought, this can be simplified by splitting at the double space directly:
generate A = subinstr(substr(AandB, 1, strpos(AandB,"  "))," ", "", .)
generate B = subinstr(substr(AandB, strpos(AandB,"  "), .)," ", "", .)
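The simplified version relies on every observation containing the double-space separator. Where it is absent, strpos() returns 0 and A and B would not be split as intended. A minimal check, assuming the variable is named AandB as above, could be run before generating A and B:

```stata
* Sketch: report how many observations lack the double-space separator.
count if strpos(AandB, "  ") == 0
* Stop with an error if any observation lacks it.
assert strpos(AandB, "  ") > 0
```

If the assert fails, the offending rows can be inspected with list AandB if strpos(AandB, "  ") == 0 before deciding how to handle them.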