Extracting letters and numbers from a string

Time: 2019-05-12 10:45:35

Tags: stata

I have the following strings:

KZ1,345,769.1
PKS948,123.9   
XG829,823.5 
324JKL,282.7
456MJB87,006.01

How can I separate the letters from the numbers?

This is my expected result:

  KZ   1345769.1  
 PKS    948123.9  
  XG    829823.5  
 JKL    324282.7  
 MJB    45687006  

I tried using the split command for this, but without success.

2 Answers:

Answer 0: (score: 2)

What you want can be done with a simple regular expression:

clear

input str15 foo
"KZ1,345,769.1"
"PKS948,123.9"   
"XG829,823.5" 
"324JKL,282.7"
"456MJB87,006.01"
end

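* letters: remove digits and periods with the regex, then remove the commas
generate foo1 = subinstr(ustrregexra(foo, "[\d\.]", ""), ",", "", .)
* number: remove everything except digits and periods, then convert to numeric
generate double foo2 = real(ustrregexra(foo, "[^\d\.]", ""))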

list

     +------------------------------------+
     |             foo   foo1        foo2 |
     |------------------------------------|
  1. |   KZ1,345,769.1     KZ   1345769.1 |
  2. |    PKS948,123.9    PKS    948123.9 |
  3. |     XG829,823.5     XG    829823.5 |
  4. |    324JKL,282.7    JKL    324282.7 |
  5. | 456MJB87,006.01    MJB    45687006 |
     +------------------------------------+
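Row 5 of the listing shows foo2 as 45687006 although the input ends in .01. This is most likely a display effect rather than lost data: foo2 is stored as a double, and the default display format for a double (%10.0g) is too narrow to show all the digits. A minimal sketch, assuming foo and foo2 exist as created above, that widens the format before listing:

```stata
* assumes foo2 was generated as a double by the code above
* %12.2f is one possible choice: 12 characters wide, 2 decimal places
format foo2 %12.2f
list foo foo2
```

Changing the format only affects how the value is displayed; the stored value is untouched.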

Typing help subinstr(), help ustrregexra() and help real() at Stata's command prompt will give you more details on the usage and syntax of these functions.

Answer 1: (score: 1)

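@Pearly Spencer's answer is certainly preferable, but the following kind of naive loop should occur to any programmer. Look at each character in turn and decide whether it is a letter, a digit or decimal point, or (implicitly) something else, and build up the answers that way. Note that although we loop explicitly over the length of the string, the loop over observations is implicit.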

clear 
input str42 whatever 
"KZ1,345,769.1"
"PKS948,123.9"   
"XG829,823.5" 
"324JKL,282.7"
"456MJB87,006.01"
end 

compress 

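* after compress the storage type is strN (str15 here); use N as the loop bound
local length = substr("`: type whatever'", 4, .) 

gen letters = "" 
gen numbers = "" 

* visit one character position at a time, for all observations at once
quietly forval j = 1/`length' { 
    local arg substr(whatever,`j', 1) 
    * append the character if it is an uppercase letter
    replace letters = letters + `arg' if inrange(`arg', "A", "Z") 
    * append the character if it is a decimal point or a digit
    replace numbers = numbers + `arg' if `arg' == "." | inrange(`arg', "0", "9") 
}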

list 


     +-----------------------------------------+
     |        whatever   letters       numbers |
     |-----------------------------------------|
  1. |   KZ1,345,769.1        KZ     1345769.1 |
  2. |    PKS948,123.9       PKS      948123.9 |
  3. |     XG829,823.5        XG      829823.5 |
  4. |    324JKL,282.7       JKL      324282.7 |
  5. | 456MJB87,006.01       MJB   45687006.01 |
     +-----------------------------------------+
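Note that numbers is still a string variable at this point. If a numeric variable is wanted, as in the question's expected result, one way to get it is sketched below. The variable name value is hypothetical and not part of the original answer, and the display format is only one reasonable choice:

```stata
* value is a hypothetical name; real() converts a numeric-looking string to a number
generate double value = real(numbers)
* widen the display format so both decimal places of 45687006.01 are shown
format value %12.2f
list whatever letters value
```

real() returns missing for any string it cannot parse as a number, so missing values in value would point to unexpected characters left in numbers.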