Separating values from characters in a string in R

Time: 2018-03-15 16:27:41

Tags: r string data-manipulation tidyr

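I have the following problem. My dataset a contains a badly formatted column, Unit_Wrong, that mixes digits, letters, and punctuation. I want to split Unit_Wrong into two columns, num and text.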

Here is dataset a:

a <- data.frame(Measure = c(10000, 2000, 10000, 15000, 40000, 0), 
                Unit_Wrong = c("10L","25.5mL","30.5 mL","40OUNCES","3X", "NO_SIZE"), 
                stringsAsFactors = FALSE)

My expected result is b:

b <- data.frame(Measure = c(10000, 2000, 10000, 15000, 40000, 0), 
                Unit_Wrong = c("10L","25.5mL","30.5 mL","40OUNCES","3X", "NO_SIZE"), 
                text = c("L", "mL", "ml", "OUNCES", "X", "NO_SIZE"),
                num = c("10","25.5","30.5","40","3", ""),
                stringsAsFactors = FALSE) 

I tried this, but it does not work:

attempt <- a %>% 
  mutate(text = gsub("[[:digit:]]","", Unit_Wrong)) %>%
  mutate(num = str_replace_all(Unit_Wrong, text, ""))

Can you help?

2 answers:

Answer 0 (score: 3)

library(dplyr)  # provides %>% and mutate()

a %>%
  mutate(text = stringr::str_extract(Unit_Wrong, "[A-z]+$")) %>%
  mutate(num  = stringr::str_extract(Unit_Wrong, "(\\d\\.?)+") %>% as.numeric)

Output:

  Measure Unit_Wrong    text  num
1   10000        10L       L 10.0
2    2000     25.5mL      mL 25.5
3   10000    30.5 mL      mL 30.5
4   15000   40OUNCES  OUNCES 40.0
5   40000         3X       X  3.0
6       0    NO_SIZE NO_SIZE   NA

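Note:

If your units contain special characters such as "μ", you need to add them to the character class, for example by changing [A-z] to [A-zµ], and so on.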
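
As a minimal sketch of the note above, the snippet below extends the character class in base R rather than with the answer's stringr call. The helper name extract_unit is hypothetical, and the set of allowed unit characters (ASCII letters, the underscore, and the micro sign U+00B5) is an assumption. Spelling the class out as A-Za-z_ makes the underscore explicit: [A-z] also covers the punctuation that sits between Z and a in ASCII, which is why NO_SIZE matched in full above.

```r
# Hypothetical helper: pull the trailing unit out of strings such as "25.5mL".
# Assumption: units consist of ASCII letters, "_" and the micro sign (U+00B5).
extract_unit <- function(x) {
  pattern <- "[A-Za-z_\u00b5]+$"
  hits <- regexpr(pattern, x, perl = TRUE)   # -1 where nothing matches
  out <- rep(NA_character_, length(x))
  # regmatches() returns only the matched pieces, so fill just those slots
  out[as.integer(hits) > 0] <- regmatches(x, hits)
  out
}

units <- extract_unit(c("10L", "30.5 mL", "5\u00b5L", "NO_SIZE", "42"))
print(units)
```

Strings with no trailing unit, such as "42", come back as NA instead of an empty string, which matches how str_extract reports a missing match.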

Answer 1 (score: 1)

Here is a base R solution using gsub:

> text <- gsub("\\d*\\s*\\.*", "", a$Unit_Wrong)
> num <- as.numeric(gsub("\\s*[A-Za-z]*_*", "", a$Unit_Wrong))
> data.frame(a, text, num)
  Measure Unit_Wrong    text  num
1   10000        10L       L 10.0
2    2000     25.5mL      mL 25.5
3   10000    30.5 mL      mL 30.5
4   15000   40OUNCES  OUNCES 40.0
5   40000         3X       X  3.0
6       0    NO_SIZE NO_SIZE   NA
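
The two gsub calls above can also be folded into a single pass with capture groups. The following is only a sketch under stated assumptions, not part of the original answer: the helper name split_unit is hypothetical, and it assumes every value is an optional leading number, optional whitespace, and then the unit text.

```r
# Hypothetical helper: split "25.5mL"-style strings into a number and a unit.
# Assumption: an optional leading number, optional spaces, then the unit text.
split_unit <- function(x) {
  pattern <- "^([0-9]*\\.?[0-9]*)\\s*(.*)$"
  # regexec() keeps the capture groups; element 1 of each result is the full match
  parts <- regmatches(x, regexec(pattern, x))
  data.frame(
    num  = as.numeric(vapply(parts, `[`, character(1), 2)),  # "" becomes NA
    text = vapply(parts, `[`, character(1), 3),
    stringsAsFactors = FALSE
  )
}

res <- split_unit(c("10L", "25.5mL", "30.5 mL", "40OUNCES", "3X", "NO_SIZE"))
print(res)
```

With the question's data this could be attached as cbind(a, split_unit(a$Unit_Wrong)). Because both columns come from one pattern, the number and the unit cannot drift apart the way two independent substitutions can.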