I have a dataset in which one of the columns is "Name". It contains product names, including the product's quantity (size), as shown below.
Alkabeer Paratha Plain 400 GM
Almarai Fresh Laban Baladi 2 L
Americana Breaded Chicken Burger 1 KG
Dac Glass Cleaner 4 L
Duru Body Soap Fruity 125 GM - 4 Pcs
Lux Liquid Handwash Soft Touch 250 ML
Lux Liquid Handwash Magical Beauty 250 ML
Lusine Sliced Bread Multi Grain 600 GM
Orinex Containers Bowl 25 Oz - 4 Pcs
Betty Crocker Frosting Vanilla 400 GM
Freshly Microwave Popcorn 3.5 Oz
Gandour Potato Chips 145 Gm
Galaxy Chocolate Milk 40 GM
Nahool Jumbo Roll Strawberry 75 GM - 6 Pcs
Nestle Sweetened Condensed Milk 397 GM
Puck Cheese Triangle Value Pack 120 GM - 5 Pcs
Betty Crocker Super Moist Cake Mix Choco Fudge 500 GM
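Some products are packed in crates, for example "Duru Body Soap Fruity 125 GM - 4 Pcs".
I want to extract the quantity and the crate size (0 if the product is not a crate).
The quantity is identified by GM, KG, ML, L, or Oz, and the crate size is identified by Pcs.
Edit:
I want to add more examples, which complicate the approach suggested by Onyambu.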
Signal Complete8 Actions White Toothpaste 120Ml
Fresh Plums Red Per KG
Blemil Plus Baby Milk #2 800 GM
7Up Drink Can 330 ML
Lipton Chai Latte 3 In 1 Classic 25.7 Gm - 7 Pcs
Lusine 6 Burger Buns Plain 400 GM
Farleys Baby Food 3 Fruits 120 GM
Clorox Regular + 40% Extra 3.7 L
Clorox 5 In 1 Disinfectant Cleaner Orange 3 L
Almarai Cheese 6 Portions 108 GM - 2+1 Pcs
3 Cow Feta Cheese Low Salt 200 GM
S-26 Pro Gold Baby Milk #1 900 GM
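Answer 0 (score: 1)
library(tidyverse)
# dat: data frame with the product names in column V1
# Keep everything from the first digit onward, then split on " - "
dat %>%
  mutate(s = gsub(".*?(\\d+.*)", "\\1", V1)) %>%
  separate(s, c("quantity", "crate_size"), " - ", fill = "right") %>%
  # crate_size is a character column, so the replacement is the string "0"
  replace_na(list(crate_size = "0"))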
                                                      V1 quantity crate_size
1                          Alkabeer Paratha Plain 400 GM   400 GM          0
2                         Almarai Fresh Laban Baladi 2 L      2 L          0
3                  Americana Breaded Chicken Burger 1 KG     1 KG          0
4                                  Dac Glass Cleaner 4 L      4 L          0
5                   Duru Body Soap Fruity 125 GM - 4 Pcs   125 GM      4 Pcs
6                  Lux Liquid Handwash Soft Touch 250 ML   250 ML          0
7              Lux Liquid Handwash Magical Beauty 250 ML   250 ML          0
8                 Lusine Sliced Bread Multi Grain 600 GM   600 GM          0
9                   Orinex Containers Bowl 25 Oz - 4 Pcs    25 Oz      4 Pcs
10                 Betty Crocker Frosting Vanilla 400 GM   400 GM          0
11                      Freshly Microwave Popcorn 3.5 Oz   3.5 Oz          0
12                           Gandour Potato Chips 145 Gm   145 Gm          0
13                           Galaxy Chocolate Milk 40 GM    40 GM          0
14            Nahool Jumbo Roll Strawberry 75 GM - 6 Pcs    75 GM      6 Pcs
15                Nestle Sweetened Condensed Milk 397 GM   397 GM          0
16        Puck Cheese Triangle Value Pack 120 GM - 5 Pcs   120 GM      5 Pcs
17 Betty Crocker Super Moist Cake Mix Choco Fudge 500 GM   500 GM          0
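The quantity and crate_size columns above are still character strings such as "400 GM" and "4 Pcs". If numeric values are needed, they could be derived along the following lines. This is only a sketch in base R: the small input vectors are copied from the output above, and the names quantity_value, quantity_unit, and crate_count are made up for illustration.

```r
# Sample values copied from the output above (illustrative input only)
quantity   <- c("400 GM", "2 L", "3.5 Oz", "125 GM")
crate_size <- c("0", "0", "0", "4 Pcs")

# Drop the trailing unit to get the numeric part of the quantity
quantity_value <- as.numeric(sub("\\s*[A-Za-z]+$", "", quantity))

# Drop the leading number to get the unit, upper-cased so "Gm" and "GM" agree
quantity_unit <- toupper(sub("^[0-9.]+\\s*", "", quantity))

# Drop the "Pcs" suffix to get the number of pieces; "0" stays 0
crate_count <- as.integer(sub("\\s*Pcs$", "", crate_size))

data.frame(quantity_value, quantity_unit, crate_count)
```

Note that a crate size such as "2+1 Pcs" (from the question's edit) would not convert with as.integer and would need separate handling.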
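To do this in base R:
# Strip everything before the first digit, then read the rest as "-"-separated fields
read.table(text = gsub(".*?(\\d+.*)", "\\1", dat$V1), sep = "-", header = FALSE,
           fill = TRUE, col.names = c("Quantity", "Crate_Size"),
           na.strings = "", strip.white = TRUE)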
   Quantity Crate_Size
1    400 GM       <NA>
2       2 L       <NA>
3      1 KG       <NA>
4       4 L       <NA>
5    125 GM      4 Pcs
6    250 ML       <NA>
7    250 ML       <NA>
8    600 GM       <NA>
9     25 Oz      4 Pcs
10   400 GM       <NA>
11   3.5 Oz       <NA>
12   145 Gm       <NA>
13    40 GM       <NA>
14    75 GM      6 Pcs
15   397 GM       <NA>
16   120 GM      5 Pcs
17   500 GM       <NA>
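Both versions keep everything from the first digit onward, so for names from the question's edit such as "Lusine 6 Burger Buns Plain 400 GM" they would return "6 Burger Buns Plain 400 GM" as the quantity. One possible workaround is to anchor the match on the unit rather than on the first digit. The following is a minimal sketch in base R, not a tested solution for the full dataset: the helper first_match and the sample vector x are hypothetical names, and the unit list is only the one given in the question.

```r
# A few product names, including trickier ones from the question's edit
x <- c(
  "Duru Body Soap Fruity 125 GM - 4 Pcs",
  "Signal Complete8 Actions White Toothpaste 120Ml",
  "Fresh Plums Red Per KG",
  "Lusine 6 Burger Buns Plain 400 GM",
  "Clorox Regular + 40% Extra 3.7 L",
  "Almarai Cheese 6 Portions 108 GM - 2+1 Pcs"
)

# Hypothetical helper: first match of `pattern` in each string, NA if none
first_match <- function(pattern, x) {
  m <- regexpr(pattern, x, ignore.case = TRUE, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)
  out
}

# A number (optionally decimal) directly followed by one of the listed units
quantity <- first_match("\\d+(\\.\\d+)?\\s*(GM|KG|ML|L|OZ)\\b", x)

# A count such as "4" or "2+1" followed by "Pcs"; "0" when there is no crate
crate_size <- first_match("\\d+(\\+\\d+)?\\s*Pcs\\b", x)
crate_size[is.na(crate_size)] <- "0"

data.frame(name = x, quantity, crate_size)
```

Names that have no number in front of the unit, such as "Fresh Plums Red Per KG", come back with NA as the quantity and would need a rule of their own.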