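I have done some research online, and so far I have only found that substr can delete the first or last letters of an observation. I have not found a command that works like CTRL + F find-and-replace. The catch is that I do not necessarily know where the words I want to drop are located!
My dataset looks like this: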
Hosp_code Hosp Hosplat Hosplon
RRK RRK - UNIVERSITY HOSPITALS BIRMINGHAM NHS FOUNDATION TRUST 52.453271 -1.9362835
RLU RLU - BIRMINGHAM WOMEN'S NHS FOUNDATION TRUST 52.453184 -1.9422432
5MX 5MX - HEART OF BIRMINGHAM TEACHING PCT 52.471575 -1.9367724
NO0 NO0 - HEALTHHARMONIE LIMITED 52.470965 -1.9243192
NLU NLU - SK:N (LASERCARE CLINICS LTD) 52.470838 -1.9220819
NXX NXX - SCRIVENS LTD 52.47148 -1.91341
AGL AGL - ADDITIONAL COMMUNITY MEDICAL SERVICES LTD 52.477343 -1.917197
5M1 5M1 - SOUTH BIRMINGHAM PCT 52.445922 -1.8928915
NQR NQR - PRIMECARE PRIMARY CARE 52.484113 -1.9173169
RXT RXT - BIRMINGHAM AND SOLIHULL MENTAL HEALTH NHS FOUNDATION TRUST 52.484113 -1.9173169
RRJ RRJ - THE ROYAL ORTHOPAEDIC HOSPITAL NHS FOUNDATION TRUST 52.421133 -1.9608273
RXK RXK - SANDWELL AND WEST BIRMINGHAM HOSPITALS NHS TRUST 52.48982 -1.9294268
RQ3 RQ3 - BIRMINGHAM CHILDREN'S HOSPITAL NHS FOUNDATION TRUST 52.485173 -1.8944604
RYW RYW - BIRMINGHAM COMMUNITY HEALTHCARE NHS TRUST 52.487323 -1.8858108
5PG 5PG - BIRMINGHAM EAST AND NORTH PCT 52.491369 -1.886036
NIT NIT - SOUTH DOC SERVICES LIMITED HQ 52.401796 -1.9620201
RR1 RR1 - HEART OF ENGLAND NHS FOUNDATION TRUST 52.477876 -1.8275305
NIS NIS - COVENTRY AND WARWICKSHIRE DIAGNOSTIC SERVICES LIMITED 52.462504 -1.8159336
NDT NDT - WEST MIDLANDS DIAGNOSTIC SERVICES LTD 52.462504 -1.8159336
5PF 5PF - SANDWELL PCT 52.523328 -2.0026388
TAJ TAJ - BLACK COUNTRY PARTNERSHIP NHS FOUNDATION TRUST 52.519255 -2.0188053
NEP NEP - TICCS ULTRASOUND LIMITED 52.510017 -1.8113152
NL7 NL7 - ASSURA VERTIS URGENT CARE CENTRES (BIRMINGHAM) 52.542091 -1.8778985
NNT NNT - ASSURA KINGSTANDING 52.542091 -1.8778985
5QW 5QW - SOLIHULL PCT 52.391695 -1.8081752
NR9 NR9 - JOHN TAYLOR HOSPICE COMMUNITY INTEREST COMPANY 52.527341 -1.8234016
RYK RYK - DUDLEY AND WALSALL MENTAL HEALTH PARTNERSHIP NHS TRUST 52.508312 -2.0844533
I would like to remove:
- the first three letters (e.g. RRK -)
- any mention of "LTD", "LIMITED", "HQ", "LLP", "TRUST", "FOUNDATION TRUST"
Any suggestions?
Answer 0 (score: 5)
Here is a similar strategy using Stata. In future, please be considerate and use dataex to generate your data example.
* Example generated by -dataex-. To install: ssc install dataex
clear
input str68 Hosp
"RRK - UNIVERSITY HOSPITALS BIRMINGHAM NHS FOUNDATION TRUST"
"RLU - BIRMINGHAM WOMEN'S NHS FOUNDATION TRUST"
"5MX - HEART OF BIRMINGHAM TEACHING PCT"
"NO0 - HEALTHHARMONIE LIMITED"
"NLU - SK:N (LASERCARE CLINICS LTD)"
"NXX - SCRIVENS LTD"
"AGL - ADDITIONAL COMMUNITY MEDICAL SERVICES LTD"
"5M1 - SOUTH BIRMINGHAM PCT"
"NQR - PRIMECARE PRIMARY CARE"
"RXT - BIRMINGHAM AND SOLIHULL MENTAL HEALTH NHS FOUNDATION TRUST"
"RRJ - THE ROYAL ORTHOPAEDIC HOSPITAL NHS FOUNDATION TRUST"
"RXK - SANDWELL AND WEST BIRMINGHAM HOSPITALS NHS TRUST"
"RQ3 - BIRMINGHAM CHILDREN'S HOSPITAL NHS FOUNDATION TRUST"
"RYW - BIRMINGHAM COMMUNITY HEALTHCARE NHS TRUST"
"5PG - BIRMINGHAM EAST AND NORTH PCT"
"NIT - SOUTH DOC SERVICES LIMITED HQ"
"RR1 - HEART OF ENGLAND NHS FOUNDATION TRUST"
"NIS - COVENTRY AND WARWICKSHIRE DIAGNOSTIC SERVICES LIMITED"
"NDT - WEST MIDLANDS DIAGNOSTIC SERVICES LTD"
"5PF - SANDWELL PCT"
"TAJ - BLACK COUNTRY PARTNERSHIP NHS FOUNDATION TRUST"
"NEP - TICCS ULTRASOUND LIMITED"
"NL7 - ASSURA VERTIS URGENT CARE CENTRES (BIRMINGHAM)"
"NNT - ASSURA KINGSTANDING"
"5QW - SOLIHULL PCT"
"NR9 - JOHN TAYLOR HOSPICE COMMUNITY INTEREST COMPANY"
"RYK - DUDLEY AND WALSALL MENTAL HEALTH PARTNERSHIP NHS TRUST"
end
gen work = substr(Hosp, 7, .)
replace work = ustrregexra(work, " *(LTD|LIMITED|HQ|LLP|TRUST|FOUNDATION TRUST)", "")
leftalign // from SSC, to install, type: ssc install leftalign
list
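Answer 1 (score: 1)
Here is an answer using R. There are two strategies: the first uses base R functions, the second uses the stringr package. Both take three steps: i) remove the first 6 characters (the three-character code plus the " - " separator) by keeping everything from character 7 onward; ii) remove the patterns we do not want; iii) trim any remaining whitespace.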
# a subset of your data.frame making your question reproducible
df <- structure(list(Hosp = c("NXX - SCRIVENS LTD", "AGL - ADDITIONAL COMMUNITY MEDICAL SERVICES LTD",
"5M1 - SOUTH BIRMINGHAM PCT", "NQR - PRIMECARE PRIMARY CARE",
"RXT - BIRMINGHAM AND SOLIHULL MENTAL HEALTH NHS FOUNDATION TRUST",
"RRJ - THE ROYAL ORTHOPAEDIC HOSPITAL NHS FOUNDATION TRUST",
"RXK - SANDWELL AND WEST BIRMINGHAM HOSPITALS NHS TRUST")),
.Names = "Hosp", row.names = c(NA, -7L), class = "data.frame")
> df$Hosp
[1] "NXX - SCRIVENS LTD"
[2] "AGL - ADDITIONAL COMMUNITY MEDICAL SERVICES LTD"
[3] "5M1 - SOUTH BIRMINGHAM PCT"
[4] "NQR - PRIMECARE PRIMARY CARE"
[5] "RXT - BIRMINGHAM AND SOLIHULL MENTAL HEALTH NHS FOUNDATION TRUST"
[6] "RRJ - THE ROYAL ORTHOPAEDIC HOSPITAL NHS FOUNDATION TRUST"
[7] "RXK - SANDWELL AND WEST BIRMINGHAM HOSPITALS NHS TRUST"
#base R functions -----------
# " +$" strips every trailing space; " $" would leave one behind when two
# terms in a row (e.g. "LIMITED HQ") are removed
gsub(" +$", "", gsub("LTD|LIMITED|HQ|LLP|TRUST|FOUNDATION TRUST", "", substr(df$Hosp, 7, nchar(df$Hosp))))
# a function to do it
nice_hospname <- function(x){
  gsub(" +$", "", gsub("LTD|LIMITED|HQ|LLP|TRUST|FOUNDATION TRUST", "", substr(x, 7, nchar(x))))
}
# you can use it with:
nice_hospname(df$Hosp)
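To make the three steps easier to follow, here is a minimal sketch that unpacks the nested call on a single string from the question. The names x, step1, step2 and step3 are only illustrative and are not part of the answer's code.

```r
x <- "RXT - BIRMINGHAM AND SOLIHULL MENTAL HEALTH NHS FOUNDATION TRUST"

# i) characters 1-6 are the code plus " - ", so the name starts at position 7
step1 <- substr(x, 7, nchar(x))

# ii) remove the unwanted terms; the space that preceded the term is left behind
step2 <- gsub("LTD|LIMITED|HQ|LLP|TRUST|FOUNDATION TRUST", "", step1)

# iii) strip the leftover trailing whitespace
step3 <- sub(" +$", "", step2)

print(step1)
print(step2)
print(step3)
```

Printing the intermediate values shows that step ii) leaves a trailing space after "NHS", which is why the final trimming step is needed.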
# with stringr package --------
library(stringr)
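# str_replace_all() removes every match; str_replace() would only remove the first one per string
df$Hosp %>% str_sub(7) %>% str_replace_all("LTD|LIMITED|HQ|LLP|TRUST|FOUNDATION TRUST", "") %>% str_trim()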
[1] "SCRIVENS" "ADDITIONAL COMMUNITY MEDICAL SERVICES"
[3] "SOUTH BIRMINGHAM PCT" "PRIMECARE PRIMARY CARE"
[5] "BIRMINGHAM AND SOLIHULL MENTAL HEALTH NHS" "THE ROYAL ORTHOPAEDIC HOSPITAL NHS"
[7] "SANDWELL AND WEST BIRMINGHAM HOSPITALS NHS"
# a function to do it
nice_hospname2 <- function(x){
  # str_replace_all() so that names with several terms (e.g. "LIMITED HQ") are fully cleaned
  x %>% str_sub(7) %>% str_replace_all("LTD|LIMITED|HQ|LLP|TRUST|FOUNDATION TRUST", "") %>% str_trim()
}
# you can use it with:
nice_hospname2(df$Hosp)
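Some rows in the full dataset are harder than the subset above: "LIMITED HQ" has two terms in a row, and "(LASERCARE CLINICS LTD)" has a term followed by a bracket. A possible variant in base R, offered as a sketch and not as a definitive solution, matches whole words only and also removes the space in front of each term. The helper name clean_hospname is hypothetical, the prefix pattern assumes every code is three upper-case letters or digits followed by " - ", and the third example row is made up to show the word-boundary behaviour.

```r
# Hypothetical helper, not part of the answers above.
clean_hospname <- function(x) {
  # Terms to drop; the two-word term is listed first so it is tried before "TRUST"
  terms <- "FOUNDATION TRUST|LTD|LIMITED|HQ|LLP|TRUST"
  # " *" swallows the spaces before a term; \\b restricts matches to whole words
  pattern <- paste0(" *\\b(", terms, ")\\b")
  out <- sub("^[A-Z0-9]{3} - ", "", x)        # drop the code prefix by pattern, not by position
  out <- gsub(pattern, "", out, perl = TRUE)  # remove every occurrence of every term
  trimws(out)                                 # strip leading and trailing whitespace
}

examples <- c(
  "NIT - SOUTH DOC SERVICES LIMITED HQ",   # two terms in a row
  "NLU - SK:N (LASERCARE CLINICS LTD)",    # term followed by ")"
  "XXX - TRUSTEE SAVINGS LTD"              # made-up row: "TRUST" inside a longer word
)
result <- clean_hospname(examples)
print(result)
```

Because of the word boundaries, "TRUSTEE" in the made-up row is left intact while the trailing "LTD" is still removed. Whether that is the desired behaviour depends on the data, so the term list is worth checking against the real names.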
Hope this helps.