My data is structured as follows:
df <- data.frame(Athlete = c('02 Paul Jones', '02 Paul Jones', '02 Paul Jones', '02 Paul Jones',
'02 Paul Jones', '02 Paul Jones', '02 Paul Jones', '02 Paul Jones',
'01 Joe Smith', '01 Joe Smith', '01 Joe Smith', '01 Joe Smith',
'01 Joe Smith', '01 Joe Smith', '01 Joe Smith', '01 Joe Smith'),
Period = c('P1', 'P1', 'P1', 'P1',
'P2', 'P2', 'P2', 'P2',
'P1', 'P1', 'P1', 'P1',
'P2', 'P2', 'P2', 'P2'))
# Make `Athlete` column a character
df$Athlete <- as.character(df$Athlete)
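How can I extract each athlete's first and last name while keeping the space between them? I don't want the leading space included. For example, "Paul Jones", not " Paul Jones".
Answer 0 (score: 2)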
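Use POSIX character classes in the regular expression pattern to remove everything except alphabetic characters ([:alpha:]) and whitespace characters ([:space:]). Removing the digits leaves a leading space behind, so a second pass strips it.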
df$Athlete <- as.character(df$Athlete) # convert factor to character
df$Athlete <- gsub("[^[:alpha:][:space:]]", '', df$Athlete)
df$Athlete <- gsub("^[[:space:]]+", '', df$Athlete) # remove leading spaces
head(df)
# Athlete Period
# 1 Paul Jones P1
# 2 Paul Jones P1
# 3 Paul Jones P1
# 4 Paul Jones P1
# 5 Paul Jones P2
# 6 Paul Jones P2
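One thing worth checking with this approach is what happens to names that contain punctuation. The following is a minimal sketch on a small character vector; the third value is a hypothetical name that does not appear in the question's data, added only to show that hyphens and apostrophes are removed along with the digits.

```r
# Sample values; "10 Mary-Ann O'Neil" is hypothetical, not from the question
x <- c("02 Paul Jones", "01 Joe Smith", "10 Mary-Ann O'Neil")

# Step 1: drop everything that is not a letter or whitespace
step1 <- gsub("[^[:alpha:][:space:]]", "", x)

# Step 2: strip the leading whitespace left behind by the removed digits
step2 <- gsub("^[[:space:]]+", "", step1)

print(step2)
# [1] "Paul Jones"    "Joe Smith"     "MaryAnn ONeil"
```

If the names in your data can contain hyphens or apostrophes, a pattern that targets only the numeric prefix is probably the safer choice.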
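Answer 1 (score: 1)
We can use sub to match one or more digits ([0-9]+) at the start of the string (^), followed by one or more whitespace characters (\\s+), and replace the match with "".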
df$Athlete <- sub("^[0-9]+\\s+", "", df$Athlete)
df
# Athlete Period
#1 Paul Jones P1
#2 Paul Jones P1
#3 Paul Jones P1
#4 Paul Jones P1
#5 Paul Jones P2
#6 Paul Jones P2
#7 Paul Jones P2
#8 Paul Jones P2
#9 Joe Smith P1
#10 Joe Smith P1
#11 Joe Smith P1
#12 Joe Smith P1
#13 Joe Smith P2
#14 Joe Smith P2
#15 Joe Smith P2
#16 Joe Smith P2
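As a quick sanity check, the same pattern can be tried on a standalone vector. This is only a sketch: the third value is a hypothetical name that is not in the question's data, included to show that punctuation inside a name is left untouched because only the numeric prefix and the whitespace after it are matched.

```r
# Sample values; "10 Mary-Ann O'Neil" is hypothetical, not from the question
x <- c("02 Paul Jones", "01 Joe Smith", "10 Mary-Ann O'Neil")

# Remove the leading digits and the whitespace that follows them
name <- sub("^[0-9]+\\s+", "", x)

# Confirm that no value starts with whitespace
has_leading_space <- grepl("^\\s", name)

print(name)
# [1] "Paul Jones"      "Joe Smith"       "Mary-Ann O'Neil"
print(any(has_leading_space))
# [1] FALSE
```

Since the pattern is anchored with ^, sub is enough here; gsub would give the same result because only one match per string is possible.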