Split a product code variable into letters and digits

Posted: 2018-06-01 05:25:40

Tags: r, split

I have a product code variable like this:

Product Code
RMMI001,
RMMI001,
CMCM009,
ASCMOT064,
ASPMOA023,
CMCM009,
CMCM012,
CMCM001,
ASCMBW001,
RMMI001,
TMHO002,
TMSP001,
TMHO002,
TMDMST003

I need to split these codes and put the letter part in another column.

3 answers:

Answer 0 (score: 1)

You could use sub here to remove all trailing digits, which leaves you with the letter part:

df <- data.frame(product_code=c("RMMI001", "RMMI001", "CMCM009"))
df$code <- sub("\\d*$", "", df$product_code)
df

  product_code code
1      RMMI001 RMMI
2      RMMI001 RMMI
3      CMCM009 CMCM
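A minimal sketch extending the same sub() idea to the full list from the question, also pulling out the digit part. It assumes every code is a run of letters followed by a run of digits; the names codes and number are chosen here for illustration only.

```r
# Full set of sample product codes from the question
codes <- c("RMMI001", "RMMI001", "CMCM009", "ASCMOT064", "ASPMOA023",
           "CMCM009", "CMCM012", "CMCM001", "ASCMBW001", "RMMI001",
           "TMHO002", "TMSP001", "TMHO002", "TMDMST003")

df <- data.frame(product_code = codes, stringsAsFactors = FALSE)

# Letter part: drop the trailing run of digits
df$code <- sub("\\d+$", "", df$product_code)

# Digit part: drop the leading run of non-digits.
# The result stays character, so "001" keeps its leading zeros.
df$number <- sub("^\\D+", "", df$product_code)

head(df, 4)
```

Keeping the digit column as character is a deliberate choice here; wrap it in as.integer() if you want numbers instead.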


Answer 1 (score: 0)

How about something like this?

# Sample product codes
ss <- c("RMMI001", "RMMI001", "CMCM009", "ASCMOT064", "ASPMOA023", "CMCM009", "CMCM012", "CMCM001", "ASCMBW001", "RMMI001", "TMHO002", "TMSP001", "TMHO002", "TMDMST003")

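# Put a comma between the letters and the digits, then parse the result as CSV into a data.frame
# (read.csv() converts the digit column to integer, so leading zeros are dropped)
read.csv(text = gsub("^([a-zA-Z]+)(\\d+)$", "\\1,\\2", ss), header = FALSE)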
#       V1 V2
#1    RMMI  1
#2    RMMI  1
#3    CMCM  9
#4  ASCMOT 64
#5  ASPMOA 23
#6    CMCM  9
#7    CMCM 12
#8    CMCM  1
#9  ASCMBW  1
#10   RMMI  1
#11   TMHO  2
#12   TMSP  1
#13   TMHO  2
#14 TMDMST  3
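The read.csv() step turns "001" into 1. If the leading zeros matter, one base-R option (a sketch, not part of the original answer) is to let regexec() and regmatches() pull out the two capture groups directly, so both columns stay character. The column names prefix and digits are illustrative.

```r
ss <- c("RMMI001", "CMCM009", "ASCMOT064", "TMDMST003")

# regexec() records where the pattern and each capture group matched;
# regmatches() returns, per code, c(full_match, group1, group2)
parts <- regmatches(ss, regexec("^([a-zA-Z]+)(\\d+)$", ss))

# Keep only the two capture groups and stack them into a character matrix
m <- do.call(rbind, lapply(parts, `[`, 2:3))

res <- data.frame(prefix = m[, 1], digits = m[, 2], stringsAsFactors = FALSE)
res
```

If you would rather keep the original read.csv() approach, passing colClasses = "character" should also preserve the zeros.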

Answer 2 (score: 0)

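You could also use tidyr::extract, which works only on data frames:

tidyr::extract(data.frame(x = c("RMMI001", "CMCM009")), x, c("first", "second"), "([a-zA-Z]+)(\\d+)")

The pattern "([a-zA-Z]+)(\\d+)" extracts the letters and the digits into separate columns, as shown below. If you used "([a-zA-Z]+)\\d+" instead, only the letter part would be captured. The difference is the capture groups, marked by parentheses: each group captures one part of the match into its own column, here the letters and the digits.

Output: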

#  first second
#1  RMMI    001
#2  CMCM    009
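If you want the same capture-group behaviour without the tidyr dependency, base R's utils::strcapture() is a close analogue. This is a sketch, not the answerer's code: the proto data frame fixes one column name and type per capture group, and the number of groups in the pattern has to match the number of columns in proto.

```r
x <- c("RMMI001", "CMCM009")

# One prototype column per capture group; character keeps "001" as-is
proto <- data.frame(first = character(), second = character(),
                    stringsAsFactors = FALSE)

out <- utils::strcapture("([a-zA-Z]+)(\\d+)", x, proto)
out
```

Codes that do not match the pattern come back as NA rows rather than raising an error.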