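I'm working with a retail dataset that has a column of size descriptions. My task is to clean the column and separate the numeric size from the other characters in each string. Is there a way to do this with a regular expression? I need to save the numbers and any other text in the column into two separate columns.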
A look at the data:
Thanks!
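If the goal is only the two columns described above (the number, and whatever text remains), pulling out the first run of digits may be enough. Below is a minimal base R sketch, assuming the sizes sit in a character column. The data frame `df` and the columns `size_num` and `size_text` are hypothetical names, and the sample values are borrowed from the answer below because the question's own data is not shown.

```r
# Hypothetical sample data; the real size column is not shown in the question.
df <- data.frame(
  details = c("EU 36", "UK 8", "19 Wide", "10 Kids", "XS is Extra Small", "2XL", "32"),
  stringsAsFactors = FALSE
)

# Numeric part: the first run of digits, or NA when a string has none.
num_match <- regexpr("[0-9]+", df$details)
df$size_num <- NA_real_
df$size_num[num_match > 0] <- as.numeric(regmatches(df$details, num_match))

# Text part: the string with that first run of digits removed, whitespace trimmed.
df$size_text <- trimws(sub("[0-9]+", "", df$details))
df$size_text[df$size_text == ""] <- NA_character_

df
```

One caveat with this simple split: a letter size such as `2XL` becomes `2` and `XL`, which may not be what you want. The answer's pattern instead treats `2XL` as a single size.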
Answer 0 (score: 1)
Here is a regular expression that covers several cases. It works on the examples:
details <- c("EU 36", "UK 8", "19 Wide", "10 Kids", "19(-25F)", "XXS", "XS is Extra Small",
             "S", "M", "L", "XL", "XXL", "XXXL", "2XL", "32")
# perl = TRUE lets PCRE handle the non-capturing (?:...) groups
pattern <- "\\b(?:(?:(?:2?X*(?:S|L))|M|(?:EU|UK) [0-9]+)|(?:[0-9]{2}(?: (?:Kids|Wide))?))\\b"
matches <- regexpr(pattern, details, perl = TRUE)  # position of the first match in each string
regmatches(details, matches)                       # the matched size text
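`regmatches()` returns only the elements that matched. On real data that contains values the pattern does not cover (for example a hypothetical `"One Size"`), the result can be shorter than `details` and no longer line up with the rows. A sketch that keeps one entry per input string, with `NA` where nothing matched (`size` is an illustrative name):

```r
details <- c("EU 36", "UK 8", "19 Wide", "10 Kids", "19(-25F)", "XXS", "XS is Extra Small",
             "S", "M", "L", "XL", "XXL", "XXXL", "2XL", "32",
             "One Size")  # hypothetical value that the pattern does not match
pattern <- "\\b(?:(?:(?:2?X*(?:S|L))|M|(?:EU|UK) [0-9]+)|(?:[0-9]{2}(?: (?:Kids|Wide))?))\\b"

matches <- regexpr(pattern, details, perl = TRUE)     # -1 where there is no match
size <- rep(NA_character_, length(details))           # one slot per input string
size[matches != -1] <- regmatches(details, matches)   # fill only the rows that matched
size
```

Because `size` has the same length as `details`, it can be assigned directly as a new column of the data frame.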
Breakdown of the regex:
\b                      # Word boundary: a position between a word and a non-word character
                        # (this includes the start/end of the string).
(?:                     # a non-capturing group
  (?:                   # ditto
    (?:                 # ditto
      2?                # 0 or 1 "2" characters
      X*                # 0 or more "X" characters
      (?:S|L)           # an "S" or an "L" character
    )
    |                   # or
    M                   # the "M" character
    |                   # or
    (?:EU|UK) [0-9]+    # "EU" or "UK", followed by a space and 1 or more digits
  )
  |                     # or
  (?:[0-9]{2}(?: (?:Kids|Wide))?)   # 2 digits, optionally followed by " Kids" or " Wide"
)
\b                      # Word boundary
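The commented breakdown above is close to something PCRE can compile directly in free-spacing (extended) mode, which keeps the explanation next to the pattern. A sketch, assuming R's `perl = TRUE` engine; `pattern_x`, `m` and `mx` are illustrative names. Because `(?x)` ignores unescaped whitespace, the two literal spaces from the one-line pattern are written as `[ ]`.

```r
details <- c("EU 36", "UK 8", "19 Wide", "10 Kids", "19(-25F)", "XXS", "XS is Extra Small",
             "S", "M", "L", "XL", "XXL", "XXXL", "2XL", "32")

# One-line pattern from the answer.
pattern <- "\\b(?:(?:(?:2?X*(?:S|L))|M|(?:EU|UK) [0-9]+)|(?:[0-9]{2}(?: (?:Kids|Wide))?))\\b"

# Same pattern in PCRE free-spacing mode: (?x) ignores unescaped whitespace and
# treats "#" up to the end of the line as a comment, so literal spaces are
# written as [ ] (whitespace inside a character class is still matched).
pattern_x <- "(?x)
\\b
(?:
  (?:
    (?: 2? X* (?:S|L) )        # optional 2, any number of X, then S or L
    | M                        # the M character
    | (?:EU|UK) [ ] [0-9]+     # EU or UK, a space, then digits
  )
  | (?: [0-9]{2} (?: [ ] (?:Kids|Wide) )? )   # 2 digits, optionally Kids or Wide
)
\\b"

m  <- regexpr(pattern,   details, perl = TRUE)
mx <- regexpr(pattern_x, details, perl = TRUE)

# Compare what the two patterns extract from the example strings.
identical(regmatches(details, m), regmatches(details, mx))
```

The final comparison checks that both forms extract the same sizes from the example strings, so the commented version can replace the one-liner where readability matters more than brevity.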