Convert a data frame column of factors into two usable columns: one of strings and one of numbers

Posted: 2018-02-08 20:33:41

Tags: r

I have a data frame, df, with 17 columns and 60 rows. Its first two columns and first four rows look like this:

  Well and Depth    Mean 
   Black Peak 1000    500
   Black Peak 1001    600
   Black Peak 1002    700
   Black Peak 1003    800

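My first column, "Well and Depth", is currently a factor. I want to insert two new columns between "Well and Depth" and "Mean". I would like my code to automatically extract, separately, the text before the number and the number itself (i.e. "1001", "1002", "1003", ...) from the "Well and Depth" column, and put them into the two new columns, which I'll call "Well Name" and "Depth". All text before the space that precedes the number goes into "Well Name", and the number goes into "Depth". The first four columns should end up looking like this: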

  Well and Depth        Well Name      Depth      Mean 
   "Black Peak 1000"     "Black Peak"     1000     500
   "Black Peak 1001"     "Black Peak"     1001     600
   "Black Peak 1002"     "Black Peak"     1002     700
   "Black Peak 1003"     "Black Peak"     1003     800

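The real data set is much larger than the four rows shown here, so ideally I'd like to avoid hard-coding long character vectors in the script.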
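Because base R's regular-expression functions are vectorized, two sub() calls handle every row at once, so no per-row vectors need to be typed out. A minimal base-R sketch, assuming every entry ends in a space followed by digits; the variable names wd, well_name and depth are illustrative only:

```r
# Toy version of the question's data (the real df has 17 columns and 60 rows)
df <- data.frame(`Well and Depth` = factor(c("Black Peak 1000", "Black Peak 1001",
                                             "Black Peak 1002", "Black Peak 1003")),
                 Mean = c(500, 600, 700, 800),
                 check.names = FALSE)

wd <- as.character(df[["Well and Depth"]])           # factor -> its text labels
well_name <- sub("\\s+\\d+$", "", wd)                # drop the final space and digits
depth <- as.numeric(sub(".*\\s(\\d+)$", "\\1", wd))  # keep only the trailing digits

# Splice the new columns in right after column 1; df[-1] keeps all remaining columns
df <- data.frame(df[1], `Well Name` = well_name, Depth = depth, df[-1],
                 check.names = FALSE, stringsAsFactors = FALSE)
```

Using df[1] and df[-1] avoids naming the other 16 columns explicitly.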

2 answers:

Answer 0 (score: 1)

Using the answer to a previous question, Data Frame of Factors: Split column into two and extract number, as a starting point:

#the data
df<-read.table(header = TRUE, text="WellandDepth    Mean 
   'Black Peak 1000'    500
   'Black Peak 1001'    600
   'Black Peak 1002'    700
   'Black Peak 1003'    800")

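#split Well and Depth column: text before the last space -> WELL, trailing digits -> DEPTH
HERE <- data.frame(WELL = character(), DEPTH = numeric(), stringsAsFactors = FALSE)
HERE <- strcapture("(.*)\\s(\\d+)$", as.character(df[,1]), HERE)

#paste it all back together; check.names = FALSE keeps the spaces in the column names
answer <- data.frame(`Well and Depth` = df[,1], `Well Name` = HERE$WELL,
                     Depth = HERE$DEPTH, Mean = df[,2],
                     check.names = FALSE, stringsAsFactors = FALSE)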
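On a 60-row data set it is worth knowing what strcapture() does with a label that lacks a trailing number: it returns NA in every captured column for that row instead of raising an error. A small check, using a hypothetical malformed entry "Black Peak" with no depth:

```r
# Hypothetical input: the third label has no trailing depth
x <- c("Black Peak 1000", "Black Peak 1001", "Black Peak")
proto <- data.frame(WELL = character(), DEPTH = numeric(), stringsAsFactors = FALSE)
parts <- strcapture("(.*)\\s(\\d+)$", x, proto)
print(parts)

# Rows that failed to match come back as NA, so they are easy to locate
bad_rows <- which(is.na(parts$DEPTH))
print(bad_rows)
```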

Answer 1 (score: 1)

Here is an alternative option using dplyr:
library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

# This generates your data frame, "check.names" allows spaces in names
df <- data.frame("Well and Depth"= c("Black Peak 1000",
                                    "Black Peak 1001",
                                    "Black Peak 1002",
                                    "Black Peak 1003"),
                 "Mean"= c(500, 600, 700, 800),
                 check.names = FALSE)
# Convert to factor
df$`Well and Depth` <- as.factor(df$`Well and Depth`)

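# Generate required features: text before the final space, and the trailing number
df %<>%
 mutate(`Well Name` = gsub("(.*)\\s\\d+$", "\\1", `Well and Depth`),
        Depth = as.numeric(gsub(".*\\s(\\d+)$", "\\1", `Well and Depth`))) %>%
 # Place the new columns directly after "Well and Depth", keeping all other columns
 select(`Well and Depth`, `Well Name`, Depth, everything())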