Question

I would like to extract all the numeric parts of the columns below:

head(df$Session, 5)
[1] "Session_01122016" "Session_02122016" "Session_03122016" "Session_04122016" "Session_05122016"

head(df$Date, 5)
[1] "01/12/2016" "02/12/2016" "03/12/2016" "04/12/2016" "05/12/2016"

My expected output is:

head(df$SessionOutput, 5)
[1] "01122016" "02122016" "03122016" "04122016" "05122016"

head(df$DateOutput, 5)
[1] "01122016" "02122016" "03122016" "04122016" "05122016"

Is it possible to do this?

Thanks.

Answer 1

If the pattern in each column is consistent, you can simply use gsub() to remove the unwanted part:

df <- data.frame(
  Session = c("Session_01122016","Session_02122016","Session_03122016","Session_04122016","Session_05122016"),
  Date = c("01/12/2016","02/12/2016","03/12/2016","04/12/2016","05/12/2016"),
  stringsAsFactors = FALSE
)

df$SessionOutput <- gsub("Session_", "", df$Session, fixed = TRUE)
df$DateOutput <- gsub("/", "", df$Date, fixed = TRUE)

> head(df$SessionOutput)
[1] "01122016" "02122016" "03122016" "04122016" "05122016"
> head(df$DateOutput)
[1] "01122016" "02122016" "03122016" "04122016" "05122016"
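This approach relies on every value sharing the same prefix or separator. A minimal sketch, assuming Session values always look like "Session_" followed by eight digits and Date values look like dd/mm/yyyy, checks those assumed formats with grepl() before stripping, and uses sub() with a ^ anchor so that only a leading "Session_" is removed:

```r
df <- data.frame(
  Session = c("Session_01122016", "Session_02122016", "Session_03122016",
              "Session_04122016", "Session_05122016"),
  Date = c("01/12/2016", "02/12/2016", "03/12/2016", "04/12/2016", "05/12/2016"),
  stringsAsFactors = FALSE
)

# Assumed format: "Session_" followed by exactly 8 digits
stopifnot(all(grepl("^Session_[0-9]{8}$", df$Session)))
# Assumed format: dd/mm/yyyy
stopifnot(all(grepl("^[0-9]{2}/[0-9]{2}/[0-9]{4}$", df$Date)))

# sub() replaces only the first match; ^ ties it to the start of the string
df$SessionOutput <- sub("^Session_", "", df$Session)
df$DateOutput <- gsub("/", "", df$Date, fixed = TRUE)

print(df$SessionOutput)
print(df$DateOutput)
```

If a value ever deviates from the expected format, stopifnot() fails loudly instead of silently returning a partially cleaned string.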

Answer 2

You can use gsub:

x <- c("01/12/2016", "02/12/2016", "03/12/2016", "04/12/2016", "05/12/2016")
y <- c("Session_01122016", "Session_02122016", "Session_03122016", "Session_04122016", "Session_05122016")

# defines a pattern to be replaced with an empty string
# here, anything that is a punctuation sign or alphabetic character
remove_this <- "[[:punct:]]|[[:alpha:]]"

gsub(remove_this, "", x)
[1] "01122016" "02122016" "03122016" "04122016" "05122016"

gsub(remove_this, "", y)
[1] "01122016" "02122016" "03122016" "04122016" "05122016"

See ?gsub and ?regex for details.
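The pattern above removes punctuation and letters but would leave other characters, such as spaces, in place. A slightly more general sketch, assuming the goal is to keep only the ASCII digits 0-9, negates a character class so that everything else is removed. The helper name keep_digits is hypothetical:

```r
x <- c("01/12/2016", "02/12/2016", "03/12/2016", "04/12/2016", "05/12/2016")
y <- c("Session_01122016", "Session_02122016", "Session_03122016",
       "Session_04122016", "Session_05122016")

# Hypothetical helper: [^0-9] matches any character that is NOT a digit 0-9,
# so gsub() deletes everything except the digits
keep_digits <- function(v) gsub("[^0-9]", "", v)

x_out <- keep_digits(x)
y_out <- keep_digits(y)
print(x_out)
print(y_out)

# Unlike "[[:punct:]]|[[:alpha:]]", this also drops whitespace
spaced_out <- keep_digits("Session 01 12 2016")
print(spaced_out)
```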

Answer 3

You can use the stringi package:

  library(stringi)
  lapply(df, function(x) stri_c_list(stri_extract_all(x, regex = '[0-9]')))
 $Session
 [1] "01122016" "02122016" "03122016" "04122016" "05122016"

 $Date
 [1] "01122016" "02122016" "03122016" "04122016" "05122016"
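The same extract-then-join idea can be written in base R without the stringi dependency. A minimal sketch using regmatches() with gregexpr() to pull out every digit, then vapply() with paste(collapse = "") to glue the pieces of each string back together; extract_digits is a hypothetical helper name:

```r
df <- data.frame(
  Session = c("Session_01122016", "Session_02122016", "Session_03122016",
              "Session_04122016", "Session_05122016"),
  Date = c("01/12/2016", "02/12/2016", "03/12/2016", "04/12/2016", "05/12/2016"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: extract each digit, then collapse them per string
extract_digits <- function(x) {
  pieces <- regmatches(x, gregexpr("[0-9]", x))  # list of character vectors
  vapply(pieces, paste, character(1), collapse = "", USE.NAMES = FALSE)
}

out <- lapply(df, extract_digits)
print(out)
```

Like the stringi version, this returns a list with one character vector per column; a string with no digits becomes "".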
