I have a data frame, df, with 17 columns and 60 rows. The first two columns and first four rows look like this:
Well and Depth     Mean
Black Peak 1000    500
Black Peak 1001    600
Black Peak 1002    700
Black Peak 1003    800
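My first column, "Well and Depth", is currently a factor. I would like to insert two new columns between the "Well and Depth" and "Mean" columns. I want my code to automatically extract the text before the number (i.e. "1001", "1002", "1003", etc.) and, separately, the numeric value from the "Well and Depth" column, and insert them into the newly created columns, which I will call "Well Name" and "Depth". All of the text before the space that precedes the number should go into the "Well Name" column, and the number should go into the "Depth" column. The first four columns would end up looking like this: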
Well and Depth       Well Name       Depth    Mean
"Black Peak 1000"    "Black Peak"    1000     500
"Black Peak 1001"    "Black Peak"    1001     600
"Black Peak 1002"    "Black Peak"    1002     700
"Black Peak 1003"    "Black Peak"    1003     800
The real dataset is much larger than the 4 rows shown here, so ideally I would like to avoid hard-coding long text vectors in the script.
Answer 0: (score: 1)
Using the answer to the previous question, "Data Frame of Factors: Split column into two and extract number", as a starting point:
#the data
df<-read.table(header = TRUE, text="WellandDepth Mean
'Black Peak 1000' 500
'Black Peak 1001' 600
'Black Peak 1002' 700
'Black Peak 1003' 800")
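#split Well and Depth column (strcapture() requires R >= 3.4.0)
#HERE starts out as an empty prototype giving the names and types of the new columns
HERE=data.frame(WELL=character(),DEPTH=numeric())
HERE<-strcapture("(.*)\\s(\\d+)$",as.character(df[,1]),HERE)
#paste it all back together; check.names=FALSE keeps the spaces in 'Well and Depth'
answer<-data.frame('Well and Depth'=df[,1], HERE, Mean=df[,2], check.names=FALSE)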
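The question mentions 17 columns, so naming every remaining column when pasting things back together gets tedious. Here is a minimal sketch of the same strcapture() approach that splices the new columns in by position instead. It assumes the combined column is the first column; the data frame `wide` and its `Median` column are made up for illustration.

```r
# Hypothetical stand-in for the wider data frame (only 3 of the columns shown)
wide <- data.frame(
  "Well and Depth" = factor(c("Black Peak 1000", "Black Peak 1001")),
  Mean = c(500, 600),
  Median = c(480, 610),
  check.names = FALSE
)

# Empty prototype: one column per capture group, with the desired types
proto <- data.frame(WELL = character(), DEPTH = numeric())

# Capture everything before the last space, and the trailing digits
parts <- strcapture("^(.*)\\s(\\d+)$", as.character(wide[[1]]), proto)

# Rename to the column names asked for in the question
names(parts) <- c("Well Name", "Depth")

# Original first column, then the new columns, then all remaining columns
result <- cbind(wide[1], parts, wide[-1])
```

Because `wide[-1]` selects every column except the first, this should work unchanged however many columns follow "Well and Depth".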
Answer 1: (score: 1)
Here is a dplyr approach:
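library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe
# This generates your data frame; check.names = FALSE keeps the spaces in the column names
df <- data.frame("Well and Depth" = c("Black Peak 1000",
                                      "Black Peak 1001",
                                      "Black Peak 1002",
                                      "Black Peak 1003"),
                 "Mean" = c(500, 600, 700, 800),
                 check.names = FALSE)
# Convert to factor
df$`Well and Depth` <- as.factor(df$`Well and Depth`)
# Generate required features
df %<>%
  # keep the text before the last space (the space is outside the capture group)
  mutate(`Well Name` = gsub("(.*)\\s\\d+$", "\\1", `Well and Depth`)) %>%
  # keep the trailing digits and convert them to numeric
  mutate(Depth = as.numeric(gsub(".*\\s(\\d+)$", "\\1", `Well and Depth`))) %>%
  # move the new columns in between "Well and Depth" and the remaining columns
  select(`Well and Depth`, `Well Name`, Depth, everything())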
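One thing worth checking before running this on the full dataset is how the patterns behave when a well name itself contains digits. The sketch below uses base `sub()` only, with made-up labels (none of these names other than "Black Peak 1000" come from the question). Because `.*` is greedy and the pattern is anchored with `$`, the split should happen at the last space before the trailing number.

```r
# Hypothetical labels, including well names that contain numbers themselves
labels <- factor(c("Black Peak 1000", "Well 7 Alpha 1200", "Ridge-2 35"))

# sub() coerces the factor to character before matching
well_name <- sub("^(.*)\\s\\d+$", "\\1", labels)
depth <- as.numeric(sub("^.*\\s(\\d+)$", "\\1", labels))

print(well_name)  # "Black Peak" "Well 7 Alpha" "Ridge-2"
print(depth)      # 1000 1200 35
```

Labels that do not end in a space followed by digits are left unchanged by `sub()`, so `as.numeric()` would return NA (with a warning) for those rows, which makes them easy to spot.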