Create multiple columns *of different formats* from a single column and clean up the results

Time: 2016-05-04 15:30:11

Tags: r split dplyr

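As the title suggests, this question is a follow-up to my earlier, similarly titled question. There, I asked how to split a character column of a data frame into multiple numeric columns using the separator _, and how to clean up the results. In that case all the new columns were numeric and were created from consecutive elements of the split column, so the solution was easier. This time things are a bit different: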

foo <- data.frame(Point.Type = c("Zero Start","Zero Start", "Zero Start", "3000rpm_10%_13barG_Sdsdsa_1.0_F_Pww","3000rpm_10%_13barG_Sdsdsa_1.0_F_Pww","3000rpm_10%_13barG_Sdsdsa_1.0_R_Pww","Zero Stop","Zero Start"),
                  Point.Value = c(NA,NA,NA,rnorm(3),NA,NA))

From the Point.Type column I need to create four columns: rpm, GVF, p0 and Setup.

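  • rpm, GVF and p0 must be of type numeric or integer, while Setup must be of type character.
  • If Point.Type does not contain _ (rows 1, 2, 3, 7 and 8 in my example), all four new columns must be set to NA.
  • If Point.Type does contain _, then rpm, GVF and p0 must contain the first three elements of Point.Type, cleaned of all non-numeric characters. Setup must equal Full if the 6th element of Point.Type equals F; otherwise it must equal Reduced. In my example, this means Setup should be Full for rows 4 and 5, and Reduced for row 6.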

To get the three numeric columns, I use @Procrastinatus_Maximus's excellent solution, with a small change:

library(dplyr)
library(tidyr)  # separate() comes from tidyr, not dplyr
foo <- foo %>%
  separate(Point.Type, c("rpm", "GVF", "p0"),
           sep="_", remove = FALSE, extra="drop", fill="right") %>%
  mutate_each(funs(as.numeric(gsub("[^0-9]","",.))), rpm, GVF, p0)
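mutate_each() and funs() have since been deprecated in dplyr. A minimal sketch of the same numeric step written with across(), assuming dplyr >= 1.0 and tidyr are installed; the fixed Point.Value numbers are placeholders standing in for rnorm(3) so the result is reproducible:

```r
library(dplyr)
library(tidyr)

# Same example data as in the question; fixed numbers replace rnorm(3)
foo <- data.frame(
  Point.Type = c("Zero Start", "Zero Start", "Zero Start",
                 "3000rpm_10%_13barG_Sdsdsa_1.0_F_Pww",
                 "3000rpm_10%_13barG_Sdsdsa_1.0_F_Pww",
                 "3000rpm_10%_13barG_Sdsdsa_1.0_R_Pww",
                 "Zero Stop", "Zero Start"),
  Point.Value = c(NA, NA, NA, 0.1, 0.2, 0.3, NA, NA),
  stringsAsFactors = FALSE
)

res <- foo %>%
  # Keep only the first three "_"-separated pieces; rows without "_"
  # put the whole string in rpm and get NA in GVF and p0
  separate(Point.Type, c("rpm", "GVF", "p0"),
           sep = "_", remove = FALSE, extra = "drop", fill = "right") %>%
  # across() replaces mutate_each(funs(...)): strip non-digits, then
  # convert; an empty string such as "" becomes NA
  mutate(across(c(rpm, GVF, p0), ~ as.numeric(gsub("[^0-9]", "", .x))))

res
```

The "Zero Start"/"Zero Stop" rows end up with NA in all three numeric columns, because stripping every non-digit leaves an empty string.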

Now the problem is the character column, Setup. Naively writing

library(dplyr)
library(tidyr)
# Naive attempt: this does NOT produce the desired Setup column
foo <- foo %>%
  separate(Point.Type, c("rpm", "GVF", "p0","Setup"),
           sep="_", remove = FALSE, extra="drop", fill="right") %>%
  mutate_each(funs(as.numeric(gsub("[^0-9]","",.))), rpm, GVF, p0,Setup)

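doesn't work, because the value of Setup has nothing to do with the element of Point.Type that follows p0. Moreover, Setup depends on whether the 6th element of Point.Type is F or R, but these are character values, and they are simply wiped out by mutate_each(funs(as.numeric(gsub("[^0-9]","",.))),...). I came up with this code: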

library(dplyr)
library(tidyr)
foo <- foo %>%
  separate(Point.Type, c("rpm", "GVF", "p0"),
           sep="_", remove = FALSE, extra="drop", fill="right") %>%
  mutate_each(funs(as.numeric(gsub("[^0-9]","",.))), rpm, GVF, p0)
library(stringr)
# 6th "_"-separated element: "F" -> Full, anything else -> Reduced
foo$Setup <- ifelse(str_split_fixed(foo$Point.Type,"_",7)[,6]=="F",
                    "Full","Reduced")

which gives me

                           Point.Type  rpm GVF p0 Point.Value   Setup
1                          Zero Start   NA  NA NA          NA Reduced
2                          Zero Start   NA  NA NA          NA Reduced
3                          Zero Start   NA  NA NA          NA Reduced
4 3000rpm_10%_13barG_Sdsdsa_1.0_F_Pww 3000  10 13   1.9188554    Full
5 3000rpm_10%_13barG_Sdsdsa_1.0_F_Pww 3000  10 13  -0.5743683    Full
6 3000rpm_10%_13barG_Sdsdsa_1.0_R_Pww 3000  10 13  -0.7122796 Reduced
7                           Zero Stop   NA  NA NA          NA Reduced
8                          Zero Start   NA  NA NA          NA Reduced

But, as you can see, it still doesn't work: Setup equals Reduced even in the rows where it should be NA. Also, frankly, I don't like the idea of loading stringr just to create Setup. I'd prefer to do all the work in dplyr, ideally in a single line of code using pipes. If that makes the code unreadable, two consecutive calls are also fine.
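For comparison, the 6th-element rule can also be applied in base R without stringr. A minimal sketch: strsplit() on "_" and return NA whenever a value has fewer than six pieces, which covers the "Zero Start"/"Zero Stop" rows. The helper name setup_from_type is hypothetical, not from the question or any package:

```r
# Hypothetical helper: map the 6th "_"-separated piece of each value to
# "Full" (if it is "F") or "Reduced" (otherwise); NA if fewer than 6 pieces
setup_from_type <- function(x) {
  pieces <- strsplit(as.character(x), "_", fixed = TRUE)
  sixth <- vapply(pieces,
                  function(p) if (length(p) >= 6) p[6] else NA_character_,
                  character(1))
  ifelse(is.na(sixth), NA_character_,
         ifelse(sixth == "F", "Full", "Reduced"))
}

point_type <- c("Zero Start", "Zero Start", "Zero Start",
                "3000rpm_10%_13barG_Sdsdsa_1.0_F_Pww",
                "3000rpm_10%_13barG_Sdsdsa_1.0_F_Pww",
                "3000rpm_10%_13barG_Sdsdsa_1.0_R_Pww",
                "Zero Stop", "Zero Start")

# expected values: NA NA NA "Full" "Full" "Reduced" NA NA
setup_from_type(point_type)
```

Because the NA comes from the missing 6th piece rather than from a separate check, rows without "_" never fall through to "Reduced".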

1 Answer:

Answer 0 (score: 2)

Here's my attempt; I think this is what you're asking for. I took your last example and added a mutate() at the end of the chain.

library(dplyr)
library(tidyr)

foo <- data.frame(Point.Type = c("Zero Start","Zero Start", "Zero Start", "3000rpm_10%_13barG_Sdsdsa_1.0_F_Pww","3000rpm_10%_13barG_Sdsdsa_1.0_F_Pww","3000rpm_10%_13barG_Sdsdsa_1.0_R_Pww","Zero Stop","Zero Start"),
                  Point.Value = c(NA,NA,NA,rnorm(3),NA,NA))

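res <- foo %>%
  # Numeric columns: first three "_"-separated pieces, non-digits stripped
  separate(Point.Type, c("rpm", "GVF", "p0"),
           sep="_", remove = FALSE, extra="drop", fill="right") %>%
  mutate_each(funs(as.numeric(gsub("[^0-9]","",.))), rpm, GVF, p0) %>%
  # Setup: NA where rpm is NA (no "_" in Point.Type), otherwise Full/Reduced
  mutate(Setup = ifelse(!is.na(rpm), ifelse(grepl("_F_", Point.Type),"Full", "Reduced"),NA))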
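Note that grepl("_F_", Point.Type) looks for "_F_" anywhere in the string rather than specifically at the 6th position. A sketch that checks the 6th element directly in a single pipe, assuming dplyr >= 1.0 (for across() and case_when()) and tidyr's separate(), whose `into` argument accepts NA to drop a piece from the output; the fixed Point.Value numbers again replace rnorm(3):

```r
library(dplyr)
library(tidyr)

foo <- data.frame(
  Point.Type = c("Zero Start", "Zero Start", "Zero Start",
                 "3000rpm_10%_13barG_Sdsdsa_1.0_F_Pww",
                 "3000rpm_10%_13barG_Sdsdsa_1.0_F_Pww",
                 "3000rpm_10%_13barG_Sdsdsa_1.0_R_Pww",
                 "Zero Stop", "Zero Start"),
  Point.Value = c(NA, NA, NA, 0.1, 0.2, 0.3, NA, NA),
  stringsAsFactors = FALSE
)

res <- foo %>%
  # Keep pieces 1-3 and 6: the NA entries in `into` discard pieces 4 and 5,
  # extra = "drop" discards piece 7, fill = "right" pads rows without "_"
  separate(Point.Type, c("rpm", "GVF", "p0", NA, NA, "Setup"),
           sep = "_", remove = FALSE, extra = "drop", fill = "right") %>%
  mutate(across(c(rpm, GVF, p0), ~ as.numeric(gsub("[^0-9]", "", .x))),
         # Unmatched rows (Setup is NA, i.e. no "_") stay NA
         Setup = case_when(Setup == "F" ~ "Full",
                           !is.na(Setup) ~ "Reduced"))

res
```

case_when() has no fallback branch here on purpose, so rows whose Setup piece is missing stay NA instead of defaulting to "Reduced".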