Consider this minimal working example of a very messy dataset I am working with:
library(dplyr)
library(tidyr)
x<- paste(sort(rep(LETTERS[1:4], 3)), paste0(rep("#", 3), rep(11:13, 3)))
y<- paste(sort(rep(LETTERS[1:4], 2)), paste0(rep(1:2, 2), rep("/0", 2)))
data<- data.frame(Item = c(x, y))
which gives:
Item
1 A #11
2 A #12
3 A #13
4 B #11
5 B #12
6 B #13
7 C #11
8 C #12
9 C #13
10 D #11
11 D #12
12 D #13
13 A 1/0
14 A 2/0
15 B 1/0
16 B 2/0
17 C 1/0
18 C 2/0
19 D 1/0
20 D 2/0
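I want to separate Item into an item and a size. There are two types of size. The first, 11:13, is identified by #. In this example, the second, 1/0:2/0, can be identified by /0. To separate out the first size type I can use:
data %>% separate(Item, into = c("Item", "Size"), sep = "#")
However, this yields NA in rows 13:20.
How can I separate the variable conditionally, so that the item and size of the second size type are split as well?
I tried the code below, without success.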
data %>%
separate(Item, into = c("Item", "Size"), sep = "#") %>%
mutate(ifelse(grepl("/0", Item) == TRUE, separate(Item, into = c("Item", "Size"), sep = " (?=[^ ]+$)", perl=TRUE), Size))
Edit
The desired output should look like this:
Item Size
1 A 11
2 A 12
3 A 13
4 B 11
5 B 12
6 B 13
7 C 11
8 C 12
9 C 13
10 D 11
11 D 12
12 D 13
13 A 1/0
14 A 2/0
15 B 1/0
16 B 2/0
17 C 1/0
18 C 2/0
19 D 1/0
20 D 2/0
Answer 0 (score: 1)
To answer your question: the | operator lets you specify multiple separators.
data %>%
separate(Item, into = c("Item", "Size"), sep = " #| ")
Or you could use the common " " character to split everything and then clean up the column after:
data %>%
separate(Item, into = c("Item", "Size"), sep = " ")
See https://stringr.tidyverse.org/articles/regular-expressions.html for more regex info to help with your cleaning. If it's untidy text, you're gonna love (and need) stringr.
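The clean-up step mentioned for the second approach is not shown, so here is a minimal sketch of what it might look like. It assumes the only thing to remove is a leading # in Size, and it uses base R's sub() for that; the name result is just for illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
x <- paste(sort(rep(LETTERS[1:4], 3)), paste0(rep("#", 3), rep(11:13, 3)))
y <- paste(sort(rep(LETTERS[1:4], 2)), paste0(rep(1:2, 2), rep("/0", 2)))
data <- data.frame(Item = c(x, y))

result <- data %>%
  # Split on the single space that every row contains
  separate(Item, into = c("Item", "Size"), sep = " ") %>%
  # Assumption: the only clean-up needed is dropping a leading "#"
  mutate(Size = sub("^#", "", Size))

print(result)
```

Rows of the second size type keep their /0 suffix, since only the leading # is stripped.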
Answer 1 (score: 0)
I think this may be what you are looking for. Split on the space and then replace either # or /0 with blank, unless I misunderstood.
data %>%
separate(Item, into = c("Item", "Size"), sep = " ") %>%
mutate(Size = gsub("/0|#", "", Size))
Answer 2 (score: 0)
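Since Size has the form #<number>, or a number following a space, that pattern goes into the sep argument.
" #(?=[0-9])" finds things such as " #1"
" [0-9]" finds things such as " 1"
| means "or"
Altogether, this assumes these patterns do not appear in the item names.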
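The code for this answer did not survive, so the following is only a sketch of how the pieces could be combined, not the answerer's original. It assumes a lookahead is used in both alternatives, so that the digit after the space is not consumed by the separator; the name result is just for illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
x <- paste(sort(rep(LETTERS[1:4], 3)), paste0(rep("#", 3), rep(11:13, 3)))
y <- paste(sort(rep(LETTERS[1:4], 2)), paste0(rep(1:2, 2), rep("/0", 2)))
data <- data.frame(Item = c(x, y))

result <- data %>%
  # Separator is either " #" followed by a digit, or " " followed by a digit.
  # The (?=[0-9]) lookahead checks for the digit without removing it.
  separate(Item, into = c("Item", "Size"), sep = " #(?=[0-9])| (?=[0-9])")

print(result)
```

Because the lookahead is zero-width, Size keeps its first digit, and both 11 and 1/0 come through intact.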