Conditional trailing whitespace removal in R

Date: 2016-12-23 20:31:49

Tags: r gsub

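I am trying to create a variable called "combo". I want the county name in all lowercase, keeping a single space between the words of a multi-word name, with no space between the county name and the state abbreviation.

So far, I have this: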

library(dplyr)  # provides mutate()

county <- c("Abbeville County", "Aleutians West Census Area",
            "Cerro Gordo County", "Lonoke County")
state <- c("West Virginia", "Wisconsin", "Wyoming", "Alabama")

trialdat <- data.frame(county, state)
trialdat$state <- sapply(trialdat$state, tolower)
# deal with trailing spaces
trim.trailing <- function (x) sub("\\s+$", "", x)
trialdat$state2 <- as.factor(trim.trailing(as.factor(trialdat$state)))
# stateFromLower() is defined further down
trialdat$StateAbbrev <- stateFromLower(trialdat$state2)
trialdat$county2 <- as.factor(trim.trailing(as.factor(trialdat$county)))
# make combo variable
trialdat = mutate(trialdat, combo=paste(tolower(gsub("County", "",county2)),
            StateAbbrev, sep=""))

The desired output is:

                       combo
1                  abbevilleWV
2 aleutians west census areaWI
3                cerro gordoWY
4                     lonokeAL

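Something strange is happening. For county names that contain spaces, I get what I want. But for the other counties, a space is left after the county name. I can't simply strip out all the spaces, because I need the ones between the words of the county name. Any ideas? Thanks!

Note: the stateFromLower function is shown below, slightly adapted from Chris' code. I am including it because the problem might stem from this part; I'm not sure.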

 stateFromLower <- function(x) {
  # build a local lookup table of 52 state codes [includes DC
  # (Washington, D.C.) and PR (Puerto Rico)]
  st.codes <- data.frame(state1 = as.factor(c("AK", "AL", "AR", 
    "AZ", "CA", "CO", "CT", "DC", "DE", "FL", "GA", "HI", 
    "IA", "ID", "IL", "IN", "KS", "KY", "LA", "MA", "MD", 
    "ME", "MI", "MN", "MO", "MS", "MT", "NC", "ND", "NE", 
    "NH", "NJ", "NM", "NV", "NY", "OH", "OK", "OR", "PA", 
    "PR", "RI", "SC", "SD", "TN", "TX", "UT", "VA", "VT", 
    "WA", "WI", "WV", "WY")), full = as.factor(c("alaska", 
    "alabama", "arkansas", "arizona", "california", "colorado", 
    "connecticut", "district of columbia", "delaware", "florida", 
    "georgia", "hawaii", "iowa", "idaho", "illinois", "indiana", 
    "kansas", "kentucky", "louisiana", "massachusetts", "maryland", 
    "maine", "michigan", "minnesota", "missouri", "mississippi", 
    "montana", "north carolina", "north dakota", "nebraska", 
    "new hampshire", "new jersey", "new mexico", "nevada", 
    "new york", "ohio", "oklahoma", "oregon", "pennsylvania", 
    "puerto rico", "rhode island", "south carolina", "south dakota", 
    "tennessee", "texas", "utah", "virginia", "vermont", 
    "washington", "wisconsin", "west virginia", "wyoming")))

  # create an n x 1 data.frame of lowercase state names from the source column
  st.x <- data.frame(full = x)
  # match the source names against the names in the local 'st.codes'
  # table and use the positions to look up the state codes
  refac.x <- st.codes$state1[match(st.x$full, st.codes$full)]
  # return the state codes in the same order in which the names
  # appeared in the original source
  return(refac.x)
}
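
As an aside, the same name-to-abbreviation lookup can be sketched more compactly with R's built-in `state.name` and `state.abb` vectors. These cover only the 50 states, so DC and PR are appended by hand here. The function name `abbrev_from_name` is hypothetical and not part of the original code. Unlike `stateFromLower`, this version returns a character vector rather than a factor.

```r
# Sketch only: abbrev_from_name is an illustrative name, not from the question.
abbrev_from_name <- function(x) {
  # state.name / state.abb ship with base R (datasets package), 50 states only
  full  <- c(tolower(state.name), "district of columbia", "puerto rico")
  codes <- c(state.abb, "DC", "PR")
  # trimws() guards against stray leading/trailing whitespace in the input
  codes[match(trimws(tolower(x)), full)]
}

abbrev_from_name(c("West Virginia", "Wisconsin ", "Wyoming", "Alabama"))
```

Names that are not found come back as `NA`, the same way `match()` behaves in the original function.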

Thanks for your patience with any formatting problems; this is my first question!

1 Answer:

Answer 0: (score: 1)

Fixed! In the mutate command, I had to add a space before County.

trialdat = mutate(trialdat, combo=paste(tolower(gsub(" County", "", county2)),
            StateAbbrev, sep=""))
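
Why this works: `gsub("County", "", ...)` removes only the word itself, so the space that preceded it is left behind. The earlier `trim.trailing` call ran before the substitution, when there was not yet any trailing space to trim. A minimal sketch of the effect, and of one possible variant that anchors the pattern to the end of the string, is below. The standalone vectors `county` and `abbrev` and the names `with_space` and `stem` are assumptions for illustration, not part of the original code.

```r
# Hypothetical standalone inputs mirroring the question's data
county <- c("Abbeville County", "Aleutians West Census Area",
            "Cerro Gordo County", "Lonoke County")
abbrev <- c("WV", "WI", "WY", "AL")

# Removing only "County" keeps the space that came before it
with_space <- gsub("County", "", county)

# Variant: drop any whitespace plus a final "County", anchored with $
# so that only a suffix is removed, then lowercase the result
stem <- tolower(sub("\\s*County$", "", county))

# paste0() is shorthand for paste(..., sep = "")
combo <- paste0(stem, abbrev)
print(combo)
```

Anchoring with `$` is a design choice rather than a requirement: the answer's `" County"` pattern gives the same result on this data, while the anchored form would leave the word alone if it appeared in the middle of a name.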