Conditionally separate a variable with dplyr

Time: 2019-01-15 17:32:13

Tags: r dplyr tidyr

Consider this minimal working example of a very messy dataset I am working with:

library(dplyr)
library(tidyr)

x <- paste(sort(rep(LETTERS[1:4], 3)), paste0(rep("#", 3), rep(11:13, 3)))
y <- paste(sort(rep(LETTERS[1:4], 2)), paste0(rep(1:2, 2), rep("/0", 2)))
data <- data.frame(Item = c(x, y))

This gives:

    Item
1  A #11
2  A #12
3  A #13
4  B #11
5  B #12
6  B #13
7  C #11
8  C #12
9  C #13
10 D #11
11 D #12
12 D #13
13 A 1/0
14 A 2/0
15 B 1/0
16 B 2/0
17 C 1/0
18 C 2/0
19 D 1/0
20 D 2/0

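I want to split Item into an item and a size. There are two kinds of size. The first, 11:13, is identified by #. The second, 1/0:2/0, can be identified by /0 in this example. To split off the first size type, I use data %>% separate(Item, into = c("Item", "Size"), sep = "#"). However, this outputs NA in rows 13:20.

How can I separate the variable conditionally, so that item and size are also split for the second size type?

I tried the code below, without success.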

data %>% 
        separate(Item, into = c("Item", "Size"), sep = "#") %>% 
        mutate(ifelse(grepl("/0", Item) == TRUE, separate(Item, into = c("Item", "Size"), sep = " (?=[^ ]+$)", perl=TRUE), Size))

Edit

The desired output should look like this:

   Item Size
1     A   11
2     A   12
3     A   13
4     B   11
5     B   12
6     B   13
7     C   11
8     C   12
9     C   13
10    D   11
11    D   12
12    D   13
13    A  1/0
14    A  2/0
15    B  1/0
16    B  2/0
17    C  1/0
18    C  2/0
19    D  1/0
20    D  2/0

3 answers:

Answer 0 (score: 1)

To answer your question: the | operator lets you specify several alternative separators.

data %>% 
  separate(Item, into = c("Item", "Size"), sep = " #| ")
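The order of the alternatives matters here: the regex engine tries them left to right at each position, so " #" has to come before the bare space. A minimal sketch on a two-row data frame (the name demo is made up for illustration):

```r
library(tidyr)

# Two rows, one per size style (hypothetical mini data set)
demo <- data.frame(Item = c("A #11", "A 1/0"), stringsAsFactors = FALSE)

# " #" is tried first, so the "#" is consumed together with the space
good <- separate(demo, Item, into = c("Item", "Size"), sep = " #| ")

# With the bare space first, the "#" is left behind in Size
bad <- separate(demo, Item, into = c("Item", "Size"), sep = " | #")

good$Size  # "11"  "1/0"
bad$Size   # "#11" "1/0"
```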

Or you could split everything on the common " " character and then clean up the column afterwards:

data %>% 
      separate(Item, into = c("Item", "Size"), sep = " ")

See https://stringr.tidyverse.org/articles/regular-expressions.html for more regex info to help with your cleaning. If you work with untidy text, you're going to love (and need) stringr.
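For the split-on-space approach, the clean-up step might look like the sketch below. It assumes the only debris left in Size is a leading #, and it uses stringr::str_remove for the clean-up. The data frame is rebuilt (in a slightly shorter but equivalent way) so that the block runs on its own.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Rebuild the example data from the question
x <- paste(rep(LETTERS[1:4], each = 3), paste0("#", 11:13))
y <- paste(rep(LETTERS[1:4], each = 2), paste0(1:2, "/0"))
data <- data.frame(Item = c(x, y), stringsAsFactors = FALSE)

cleaned <- data %>%
  separate(Item, into = c("Item", "Size"), sep = " ") %>%
  mutate(Size = str_remove(Size, "^#"))  # strip only a leading "#"

cleaned
```

Anchoring the pattern with ^ keeps sizes such as 1/0 intact, which matches the desired output in the question.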

Answer 1 (score: 0)

I think this may be what you are looking for. Split on the space and then replace either # or /0 with blank, unless I misunderstood.

data %>%
  separate(Item, into = c("Item", "Size"), sep = " ") %>%
  mutate(Size = gsub("/0|#", "", Size))

Answer 2 (score: 0)

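Since Size follows the pattern "# followed by a number" or "a number after a space", that pattern goes into the sep argument.

  1. " #(?=[0-9])" finds patterns such as " #1"
  2. " [0-9]" finds patterns such as " 1" (when used as a separator, write it with a lookahead, " (?=[0-9])", so that the digit itself is not consumed)
  3. | means "or"

In summary, the two patterns are joined with | and passed to sep (assuming these patterns do not appear in the item names).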
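A sketch of how the patterns might be combined into one sep value. This is an assumption about the final call, not the answerer's verbatim code. Both alternatives use the lookahead (?=[0-9]) so that the digit stays in Size, and demo is a made-up two-row stand-in for the question's data.

```r
library(tidyr)

# One row per size style (hypothetical mini data set)
demo <- data.frame(Item = c("A #11", "D 2/0"), stringsAsFactors = FALSE)

# " #" or " " counts as a separator only when a digit follows;
# the lookahead (?=[0-9]) checks for the digit without consuming it
result <- separate(demo, Item, into = c("Item", "Size"),
                   sep = " #(?=[0-9])| (?=[0-9])")

result
```

Because a digit must follow the separator, an item name that itself contains a space followed by a letter would not be split, which is the reason for the assumption about item names.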