I have a column like this:
x <- c('WV West Virginia','FL Florida','CA California','SC South Carolina')
# [1] WV West Virginia FL Florida
# [3] CA California SC South Carolina
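How can I separate the abbreviation from the full state name? I want to give the two new columns two different headers. I figure I could only solve this by splitting off all the capital letters.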
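The "capital letters" idea can be written directly as a regular expression with two capture groups: one for the leading code and one for the rest of the string. A minimal base-R sketch, assuming every entry starts with exactly two capital letters followed by a single space (the column names Abb and State are only illustrative):

```r
x <- c('WV West Virginia', 'FL Florida', 'CA California', 'SC South Carolina')

# regexec() records the full match plus each capture group;
# regmatches() turns that into a list of character vectors.
m <- regmatches(x, regexec("^([A-Z]{2}) (.+)$", x))

# Each element is c(full_match, group1, group2); keep only the two groups.
parts <- do.call(rbind, lapply(m, `[`, 2:3))
df <- data.frame(Abb = parts[, 1], State = parts[, 2], stringsAsFactors = FALSE)
df
```

A string that does not match the pattern yields character(0) from regmatches(), so its row ends up as two NAs instead of a wrong split.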
Answer 0 (score: 4)
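Using tidyr, we can use separate() to split the column into two while specifying the new names. The argument extra = "merge" keeps the output to the given columns by merging any extra pieces into the last one. The separator defaults to non-alphanumeric characters: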
library(tidyr)
separate(df, x, c("Abb", "State"), extra="merge")
# Abb State
#1 WV West Virginia
#2 FL Florida
#3 CA California
#4 SC South Carolina
Data
df <- data.frame(x = c('WV West Virginia', 'FL Florida', 'CA California', 'SC South Carolina'))
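separate() with extra = "merge" splits only at the first occurrence of the separator and keeps everything after it in the last column. As a rough base-R illustration of that behaviour (not how tidyr implements it), one can locate the first run of non-alphanumeric characters, i.e. the default separator, and keep the text on either side of it:

```r
df <- data.frame(x = c('WV West Virginia', 'FL Florida', 'CA California', 'SC South Carolina'),
                 stringsAsFactors = FALSE)

# Find the first run of non-alphanumeric characters in each string.
first_sep <- regexpr("[^[:alnum:]]+", df$x)

# invert = TRUE returns the text before and after that single match,
# i.e. at most two pieces, which mirrors extra = "merge".
pieces <- regmatches(df$x, first_sep, invert = TRUE)

df$Abb   <- vapply(pieces, `[`, character(1), 1)
df$State <- vapply(pieces, `[`, character(1), 2)
df
```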
Answer 1 (score: 3)
Two approaches without external packages:
Approach 1: You can combine substr with nchar.
dat <- data.frame(raw = c("WV West Virginia", "FL Florida", "CA California", "SC South Carolina"),
                  stringsAsFactors = FALSE)
dat$code <- substr(dat$raw,1,2)
dat$state <- substr(dat$raw, 4, nchar(dat$raw))
> dat
raw code state
1 WV West Virginia WV West Virginia
2 FL Florida FL Florida
3 CA California CA California
4 SC South Carolina SC South Carolina
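Approach 1 hard-codes the positions, so it assumes every code is exactly two characters followed by one space. If that might not hold, a small variation (a sketch, not part of the original answer) computes the split point per row from the position of the first space:

```r
dat <- data.frame(raw = c("WV West Virginia", "FL Florida", "CA California", "SC South Carolina"),
                  stringsAsFactors = FALSE)

# Position of the first space in each string (-1 if there is none).
sp <- as.integer(regexpr(" ", dat$raw, fixed = TRUE))

# Everything before the first space is the code, everything after it the state.
dat$code  <- substr(dat$raw, 1, sp - 1)
dat$state <- substr(dat$raw, sp + 1, nchar(dat$raw))
dat
```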
Approach 2: You can use regular expressions to replace part of the string:
##approach two: regex
dat$code <- sub(" .+","",dat$raw)
dat$state <- sub("[A-Z]{2} ","",dat$raw)
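Because the two sub() calls use different patterns, it can be worth checking that they really split each string into complementary parts. One quick sanity check (an addition, recreating dat so the snippet runs on its own) is to paste the pieces back together and compare them with the raw strings:

```r
dat <- data.frame(raw = c("WV West Virginia", "FL Florida", "CA California", "SC South Carolina"),
                  stringsAsFactors = FALSE)

# Drop everything from the first space onward to get the code.
dat$code  <- sub(" .+$", "", dat$raw)
# Anchoring at ^ ensures only the leading code and its space are removed.
dat$state <- sub("^[A-Z]{2} ", "", dat$raw)

# Reassembling the pieces should reproduce the raw strings exactly.
stopifnot(identical(paste(dat$code, dat$state), dat$raw))
dat
```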
Answer 2 (score: 3)
Using the state.abb and state.name constants that ship with the base datasets package:
DF = data.frame(raw=c("WV West Virginia","FL Florida","CA California","SC South Carolina"))
DF$state.abbr <- substr(DF$raw, 1, 2)
DF$state.name <- state.name[ match(DF$state.abbr, state.abb) ]
# raw state.abbr state.name
# 1 WV West Virginia WV West Virginia
# 2 FL Florida FL Florida
# 3 CA California CA California
# 4 SC South Carolina SC South Carolina
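This way, you are protected against typos or other oddities in the state names, because the full name is looked up from the abbreviation instead of being taken from the raw string.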
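To see what the lookup buys you, here is a small illustration with made-up input: one row whose state name is misspelled and one whose code is not in state.abb (both rows are hypothetical). The name is rebuilt from the abbreviation, so the typo disappears, and the unknown code shows up as NA:

```r
DF <- data.frame(raw = c("CA Califronia", "XX Nowhere"), stringsAsFactors = FALSE)

DF$state.abbr <- substr(DF$raw, 1, 2)
# match() returns NA when the code is not one of the 50 entries in state.abb,
# and indexing state.name with NA gives NA.
DF$state.name <- state.name[match(DF$state.abbr, state.abb)]
DF
```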
Answer 3 (score: 2)
Using the reshape2 package:
library(reshape2)
x <- c('WV West Virginia','FL Florida','CA California','SC South Carolina')
colsplit(x," ",c("Code","State"))
Output:
Code State
1 WV West Virginia
2 FL Florida
3 CA California
4 SC South Carolina
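For a base-R alternative that also returns named columns, utils::strcapture() matches a regex with capture groups and converts the groups into a data frame shaped like its proto argument. A sketch, assuming the same two-letter-code format:

```r
x <- c('WV West Virginia', 'FL Florida', 'CA California', 'SC South Carolina')

# Each capture group becomes one column, named and typed after proto.
res <- utils::strcapture("^([A-Z]{2}) (.+)$", x,
                         proto = data.frame(Code = character(), State = character(),
                                            stringsAsFactors = FALSE))
res
```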
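Answer 4 (score: 2)
Based on @rawr's comment, we can split 'x' at the whitespace that follows the first two characters, which is matched with the regex lookbehind (?<=^.{2}). The output is a list, which we rbind into a matrix, convert to a data.frame, and then cbind with the original vector 'x':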
cbind(x, as.data.frame(do.call(rbind,strsplit(x, '(?<=^.{2})\\s+', perl=TRUE)),
stringsAsFactors=FALSE))
# x V1 V2
#1 WV West Virginia WV West Virginia
#2 FL Florida FL Florida
#3 CA California CA California
#4 SC South Carolina SC South Carolina
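do.call(rbind, ...) assumes every element of the strsplit() result has exactly two pieces; if some string splits into a different number of pieces, rbind() recycles the shorter ones instead of failing. A slightly more defensive version of the same steps (the stopifnot() check and the column names are additions, not part of the original answer):

```r
x <- c('WV West Virginia', 'FL Florida', 'CA California', 'SC South Carolina')

# Split at the whitespace that follows the first two characters.
parts <- strsplit(x, '(?<=^.{2})\\s+', perl = TRUE)

# Fail early if any string did not split into exactly two pieces.
stopifnot(all(lengths(parts) == 2))

res <- cbind(x, as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE))
names(res) <- c("raw", "code", "state")
res
```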
Alternatively, instead of the regex lookbehind, we can use stri_split with n = 2, which splits at the first run of whitespace only:
library(stringi)
cbind(x,as.data.frame(do.call(rbind,stri_split(x, regex='\\s+', n=2))))
Answer 5 (score: 0)
Here is a data.table / gsub approach:
x <- c('WV West Virginia','FL Florida','CA California','SC South Carolina')
data.table::data.table(x)[,
abb := gsub("(^[A-Z]{2})( .+)", "\\1", x)][,
state := gsub("(^[A-Z]{2})( .+)", "\\2", x)][]
## x abb state
## 1: WV West Virginia WV West Virginia
## 2: FL Florida FL Florida
## 3: CA California CA California
## 4: SC South Carolina SC South Carolina