How to get the substring that ends before a certain symbol

Time: 2014-07-30 06:11:42

Tags: substring stata

One of the variables in my file has the following format:

Bachelor of Commerce - AD - Accounting-Maj  
Bachelor of Commerce - Finance-Maj  
Bachelor of Commerce - Finance-Maj/Accounting-Min  
BSc with Specialization - Math & Finance-Maj  
BSc in Agric/Food Bus Mngmnt - Agric Business Management-Maj  
Bachelor of Commerce - Management Info Systems-Maj  

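What I want to do is take the first part of the string, before the - symbol.

For example, from the first three lines I need to obtain Bachelor of Commerce

I would appreciate it if someone could tell me the simplest way to do this.

5 answers:

Answer 0 (score: 3)

Try this, assuming your variable is named string_var: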

split string_var, parse(" -") limit(1) gen(substring_before_first_hyphen)
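A minimal sketch of how this command behaves on a few of the rows from the question. The name string_var is the assumption made in this answer, and str60 is just a width chosen for the illustration. Note that split appends a number to the stub given in gen(), so the new variable is named substring_before_first_hyphen1.

```stata
clear
input str60 string_var
"Bachelor of Commerce - AD - Accounting-Maj"
"Bachelor of Commerce - Finance-Maj"
"BSc with Specialization - Math & Finance-Maj"
end

* parse(" -") splits at a blank followed by a hyphen;
* limit(1) creates only the first piece as a new variable
split string_var, parse(" -") limit(1) gen(substring_before_first_hyphen)

list substring_before_first_hyphen1
```

Because the blank before the hyphen is part of the parse string, the first piece should come back without a trailing space.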

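Answer 1 (score: 2)

For future questions, please post the code you attempted and explain why it did not work for you. Some users consider questions that only ask for code to be off-topic.

Here is one way: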

clear all
set more off

*----- example data -----

set obs 2

gen degree = "Bachelor of Commerce - AD - Accounting-Maj"
replace degree = "Bachelor of Something" in 2

list

*----- what you want -----

gen degree2 = trim(substr(degree, 1, strpos(degree, "-") - 1))
replace degree2 = degree if missing(degree2)

list

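This takes the substring of the variable degree starting at position 1 and ending one character before the position of the first -. trim() removes any leading or trailing blanks. If the original variable contains no -, strpos() returns 0 and the result is an empty (missing) string, which is why the replace is there.

See help string functions for the range of functions available for manipulating strings.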
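The same logic can also be written as a single statement with cond(), which picks the trimmed prefix when a - is present and the original string otherwise. This is only a sketch of an alternative, not the method from the answer above; the names pos and degree3 are made up for illustration.

```stata
clear
input str60 degree
"Bachelor of Commerce - AD - Accounting-Maj"
"Bachelor of Something"
end

* position of the first hyphen, or 0 if there is none
gen pos = strpos(degree, "-")

* text before the first hyphen when one exists, else the whole string
gen degree3 = cond(pos > 0, trim(substr(degree, 1, pos - 1)), degree)

list degree degree3
```

Storing the position in its own variable avoids calling strpos() twice and makes the no-hyphen case explicit rather than relying on a follow-up replace.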

Answer 2 (score: 2)

The earlier answers using substr() and split are probably better in Stata. I am posting a regular-expression solution just for completeness:

clear
input strL degree
"Bachelor of Commerce - AD - Accounting-Maj"
"Bachelor of Commerce - Finance-Maj"
"Bachelor of Commerce - Finance-Maj/Accounting-Min"
"BSc with Specialization - Math & Finance-Maj"
"BSc in Agric/Food Bus Mngmnt - Agric Business Management-Maj"
"Bachelor of Commerce - Management Info Systems-Maj"
end

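* regexs(0) is the whole match: the leading run of non-hyphen characters
* trim() drops the blank left in front of the hyphen
gen str=trim(regexs(0)) if regexm(degree,"^[^\-]*")==1
list str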

Answer 3 (score: 1)

You can also use the egen command with its ends() function and the associated punct() option:

clear

input strL string
"Bachelor of Commerce - AD - Accounting-Maj"
"Bachelor of Commerce - Finance-Maj"
"Bachelor of Commerce - Finance-Maj/Accounting-Min"
"BSc with Specialization - Math & Finance-Maj"
"BSc in Agric/Food Bus Mngmnt - Agric Business Management-Maj"
"Bachelor of Commerce - Management Info Systems-Maj"
end

egen new_string = ends(string), punct(-)
list new_string

     +-------------------------------+
     |                    new_string |
     |-------------------------------|
  1. |         Bachelor of Commerce  |
  2. |         Bachelor of Commerce  |
  3. |         Bachelor of Commerce  |
  4. |      BSc with Specialization  |
  5. | BSc in Agric/Food Bus Mngmnt  |
     |-------------------------------|
  6. |         Bachelor of Commerce  |
     +-------------------------------+

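Answer 4 (score: 0)

String course = "Bachelor of Commerce - AD - Accounting-Maj";

If you want the string before the '-', use

String requiredSubString = course.split("-")[0];

In the code above, the split method returns an array of strings separated by '-'. You can then get the required substring by index. Here we take the string at index 0, i.e. Bachelor of Commerce (followed by a trailing space, which trim() would remove).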