Extract the word containing only alphanumerics from a given string

Time: 2011-03-19 17:08:32

Tags: c# .net regex

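Can someone tell me how to extract the "model name" from the following product names? For example, I need to extract "SGS45A08GB" from "Bosch SGS45A08GB Silver Dishwasher". It seems I have to create a Regex that identifies the word made up of alphanumeric characters within a given string. Could someone give me a C# example that does this?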

Some sample strings with model names:

        Bosch SGS45A08GB Silver Dishwasher
        Bosch Avantixx SGS45A02GB Dishwasher, White
        Bosch SMS53E12GB White Dishwasher
        Bosch SGS45A08GB Dishwashers
        BOSCH SGI45E15E Full-size Semi-Integrated Dishwasher
        Bosch SKS60E02GB Compact Dishwasher, White
        BOSCH SRV43M03GB Slimline Integrated Dishwasher
        BOSCH Classixx SGS45C12GB Full-size Dishwasher - White
        BOSCH SGS45A02GB Dishwashers
        Bosch 18V Cordless Drill Driver
        Bosch PSB 18V Li-Ion Hammer Drill
        Bosch SGS45A08GB Dishwasher
        Bosch SGS45A08 12Place Full Size Dishwasher in Silver

Edit: adding more product names

    Hitachi DH24DVC 4kg Cordless SDS Plus Hammer Drill 24V
    DeWalt DW965K 12V Angled Drill Driver
    Grove Modern Bathroom Suite with Acrylic Bath
    Bosch GBH24V 3.2kg SDS Plus Drill 24V
    Makita LS0714/1 190mm Sliding Compound Mitre Saw 110V
    Grove Modern Bathroom Suite with Steel Bath
    Swann All-in-One Monitoring & Recording Kit with LCD
    Makita BHR202RFE LXT 3.2kg SDS+ Rotary Hammer Drill 18V
    DeWalt DW625EK-GB 2000W Router 240V
    Trade Triple-Extension Ladder ELT340
    Makita 6391DWPE3 18V Drill Driver
    Erbauer ERF298MSW 165mm Sliding Compound Mitre Saw 24V

2 answers:

Answer 0: (score: 3)

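If you define "alphanumeric" as a string that contains both uppercase ASCII letters and digits, and assume a minimum length for model names (say, 8 characters), then you can match all the model names in your examples with: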

Regex regexObj = new Regex(
    @"\b             # word boundary
    (?=[A-Z]*[0-9])  # assert presence of at least one ASCII digit
    (?=[0-9]*[A-Z])  # assert presence of at least one ASCII letter
    [0-9A-Z]{8,}     # match at least 8 characters
    \b               # until a word boundary", 
    RegexOptions.IgnorePatternWhitespace);
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
    // matched text: matchResults.Value
    // match start: matchResults.Index
    // match length: matchResults.Length
    matchResults = matchResults.NextMatch();
} 

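I think uppercase ASCII letters and digits are a reasonable assumption for model names, but if that's not correct, you'll need to show us more examples.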
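To see how the 8-character pattern behaves on the strings from the question, here is a minimal, self-contained sketch. The class name `ModelNameDemo` and the helper `ExtractModels` are made up for illustration; only the regex itself comes from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ModelNameDemo
{
    // Same pattern as in the answer: uppercase ASCII letters and digits only,
    // at least one of each, and at least 8 characters between word boundaries.
    private static readonly Regex ModelRegex = new Regex(
        @"\b             # word boundary
        (?=[A-Z]*[0-9])  # at least one digit after any leading letters
        (?=[0-9]*[A-Z])  # at least one letter after any leading digits
        [0-9A-Z]{8,}     # 8 or more uppercase letters/digits
        \b               # word boundary",
        RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper (not part of the original answer):
    // returns every candidate model name found in one product title.
    public static List<string> ExtractModels(string productName)
    {
        var models = new List<string>();
        foreach (Match m in ModelRegex.Matches(productName))
        {
            models.Add(m.Value);
        }
        return models;
    }

    public static void Main()
    {
        string[] samples =
        {
            "Bosch SGS45A08GB Silver Dishwasher",
            "BOSCH Classixx SGS45C12GB Full-size Dishwasher - White",
            "Bosch 18V Cordless Drill Driver",      // no model name in this title
            "Bosch PSB 18V Li-Ion Hammer Drill",    // "PSB" has no digit, "18V" is too short
            "Bosch SGS45A08 12Place Full Size Dishwasher in Silver"
        };

        foreach (string sample in samples)
        {
            List<string> models = ExtractModels(sample);
            string shown = models.Count > 0 ? string.Join(", ", models) : "(no model found)";
            Console.WriteLine("{0} -> {1}", sample, shown);
        }
    }
}
```

Returning a list rather than a single string is a design choice of this sketch: a title with no model name gives an empty list, so the caller decides how to handle it.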

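Edit: With your new examples, the following regex works, but the constraints are getting looser and looser, and you will probably never find a regex that reliably matches every possible model name.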

Regex regexObj = new Regex(
    @"\b             # word boundary
    (?=\S*[0-9])   # assert presence of at least one ASCII digit
    (?=\S*[A-Z])   # assert presence of at least one ASCII letter
    [0-9A-Z/-]{6,} # match at least 6 characters
    \b             # until a word boundary", 
    RegexOptions.IgnorePatternWhitespace);
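As a rough check of the looser pattern, the sketch below runs it over some of the added product names. The class name `LooseModelNameDemo` and the helper `ExtractModels` are assumptions made for this example; the regex is the one from the edit above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LooseModelNameDemo
{
    // Looser pattern from the edit: '/' and '-' are allowed inside the name,
    // the minimum length drops to 6, and the digit/letter checks scan the
    // whole whitespace-delimited token.
    private static readonly Regex ModelRegex = new Regex(
        @"\b
        (?=\S*[0-9])    # the token contains at least one digit
        (?=\S*[A-Z])    # the token contains at least one uppercase letter
        [0-9A-Z/-]{6,}  # 6 or more uppercase letters, digits, '/' or '-'
        \b",
        RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper (not part of the original answer):
    // collects every candidate model name in one product title.
    public static List<string> ExtractModels(string productName)
    {
        var models = new List<string>();
        foreach (Match m in ModelRegex.Matches(productName))
        {
            models.Add(m.Value);
        }
        return models;
    }

    public static void Main()
    {
        string[] samples =
        {
            "Hitachi DH24DVC 4kg Cordless SDS Plus Hammer Drill 24V",
            "Makita LS0714/1 190mm Sliding Compound Mitre Saw 110V",
            "DeWalt DW625EK-GB 2000W Router 240V",
            "Trade Triple-Extension Ladder ELT340",
            "Grove Modern Bathroom Suite with Acrylic Bath"   // no model name
        };

        foreach (string sample in samples)
        {
            List<string> models = ExtractModels(sample);
            string shown = models.Count > 0 ? string.Join(", ", models) : "(no model found)";
            Console.WriteLine("{0} -> {1}", sample, shown);
        }
    }
}
```

The loosening cuts both ways. Any uppercase token of six or more characters that mixes letters and digits now counts as a model name, so a made-up spec such as "2000WATT" would match too.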

Answer 1: (score: 0)

Dude, this is the best I can do. Note that some of the items don't have a model number at all:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication3 {
    class Program {
        static void Main(string[] args) {
            string _data = @"Bosch SGS45A08GB Silver Dishwasher
            Bosch Avantixx SGS45A02GB Dishwasher, White
            Bosch SMS53E12GB White Dishwasher
            Bosch SGS45A08GB Dishwashers
            BOSCH SGI45E15E Full-size Semi-Integrated Dishwasher
            Bosch SKS60E02GB Compact Dishwasher, White
            BOSCH SRV43M03GB Slimline Integrated Dishwasher
            BOSCH Classixx SGS45C12GB Full-size Dishwasher - White
            BOSCH SGS45A02GB Dishwashers
            Bosch 18V Cordless Drill Driver
            Bosch PSB 18V Li-Ion Hammer Drill
            Bosch SGS45A08GB Dishwasher
            Bosch SGS45A08 12Place Full Size Dishwasher in Silver";

            Regex _expression = new Regex(@"\p{Lu}{3}\d+\w+\s+");
            foreach (Match _match in _expression.Matches(_data)) {
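                // The pattern ends in \s+, so the match includes trailing whitespace; trim it before printing.
                Console.WriteLine(_match.Value.Trim());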
            }
            Console.ReadKey();
        }
    }
}