Extracting different parts of a string from a data frame

Posted: 2018-02-12 16:19:38

Tags: r

I have a data frame in the following format; its elements are of type character.

df

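I want to split this data into two parts: the last number in the string, and all the text before the space that precedes that number. Also, when extracting the number, how can I convert it from character to a usable integer? I want to keep the extracted data in the data frame. When done, it should look like this: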

   Well       Depth
   Black Peak  1000
   Black Peak  1001
   Black Peak  1002
   Black Peak 10150
   Black Peak 10151

The two lists above are the two columns of the resulting data frame.

2 answers:

Answer 0 (score: 0)

Try str_split() from stringr (https://www.rdocumentation.org/packages/stringr/versions/1.1.0/topics/str_split), then convert the second column to a number, e.g. with as.numeric() (or as.integer() if you need an integer).
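A minimal sketch of this suggestion, assuming the data looks like the question's example. Splitting on every space would break "Black Peak" into two pieces, so the sketch reuses the "split at the last space" lookahead pattern from the other answer; `simplify = TRUE` makes str_split() return a character matrix instead of a list.

```r
library(stringr)

# example data in the shape shown in the question
df <- data.frame(v = c("Black Peak 1000", "Black Peak 1001", "Black Peak 10150"),
                 stringsAsFactors = FALSE)

# " (?=[^ ]+$)" matches only a space that is followed by non-space
# characters up to the end of the string, i.e. the last space.
parts <- str_split(df$v, " (?=[^ ]+$)", simplify = TRUE)

df$Well  <- parts[, 1]
# as.integer() yields an integer column; as.numeric() would yield doubles
df$Depth <- as.integer(parts[, 2])

df
```

Any row without a space would produce an empty second column, which as.integer() turns into NA.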

Answer 1 (score: 0)

Data

# example dataset
df = data.frame(v = c("Black Peak 1000", "Black Peak 1001", "Black Peak 1002", 
                      "Black Peak 10150", "Black Peak 10151"), stringsAsFactors = FALSE)

Using base R:

# split by last space, bind rows and save it as dataframe
df2 = data.frame(do.call(rbind, strsplit(df$v, ' (?=[^ ]+$)', perl=TRUE)), stringsAsFactors = FALSE)

# set names
names(df2) = c("Well", "Depth")

# convert Depth to numeric (use as.integer() instead for an integer column)
df2$Depth = as.numeric(df2$Depth)

df2

#         Well Depth
# 1 Black Peak  1000
# 2 Black Peak  1001
# 3 Black Peak  1002
# 4 Black Peak 10150
# 5 Black Peak 10151
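Since the question asks for an integer, here is an alternative base R sketch (a different technique from the answer above: regex capture groups with sub() rather than strsplit()). The pattern assumes every value ends in a space followed by digits.

```r
# example data (same as above)
df <- data.frame(v = c("Black Peak 1000", "Black Peak 1001", "Black Peak 1002",
                       "Black Peak 10150", "Black Peak 10151"),
                 stringsAsFactors = FALSE)

# Group 1: everything before the last space (the greedy .* backtracks
# only as far as the final space); group 2: the trailing digits.
pattern <- "^(.*) ([0-9]+)$"

df2 <- data.frame(
  Well  = sub(pattern, "\\1", df$v),
  Depth = as.integer(sub(pattern, "\\2", df$v)),  # integer, not double
  stringsAsFactors = FALSE
)

str(df2)
```

Rows that do not match the pattern are returned unchanged by sub(), so their Depth becomes NA (with a coercion warning), which makes malformed rows easy to spot.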

Or, using the tidyverse:

library(tidyverse)

df %>% 
  separate(v, sep = ' (?=[^ ]+$)', into = c("Well","Depth")) %>%
  mutate(Depth = as.numeric(Depth))

#         Well Depth
# 1 Black Peak  1000
# 2 Black Peak  1001
# 3 Black Peak  1002
# 4 Black Peak 10150
# 5 Black Peak 10151
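A variation on the tidyverse version, as a sketch: separate() has a `convert = TRUE` argument that runs type.convert() on the new columns, so Depth comes back as an integer without the extra mutate() step. Only tidyr is loaded here, since it re-exports `%>%`; note that newer tidyr releases mark separate() as superseded by the separate_wider_*() family, though it still works.

```r
library(tidyr)

df <- data.frame(v = c("Black Peak 1000", "Black Peak 10150"),
                 stringsAsFactors = FALSE)

# split at the last space; convert = TRUE turns "1000" into 1000L
df3 <- df %>%
  separate(v, into = c("Well", "Depth"), sep = " (?=[^ ]+$)", convert = TRUE)

df3
```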