Question

I'm looking for help extracting the model (using regex) from the text elements listed below.

2007 Honda CR-V LX CLEAN !!
2008 Honda Accord EX ROOF MAGS CLEAN 1 OWNER
2008 Honda Civic EX-L CUIR TOIT LEATHER
2009 Toyota Corolla S FULLY EQUIPPED

The one constant is that the model is always the third word.

Thanks in advance for your help.

Answer 1

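I would match against the regex \d{4} to find the leading 4-digit number (the year), then split the string on spaces (in whatever language you are using) and take the 2nd and 3rd words (the make and the model).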

You could even skip the regex, split on spaces, and use the result directly, for example in Ruby:

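array = my_name.split(" ")  # my_name holds one listing line
year  = array[0]            # first word: the year
make  = array[1]            # second word: the make
model = array[2]            # third word: the model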

Basically, I don't think a regex is the best solution here.
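
As a rough sketch of the split approach described above, here it is wrapped in a small method and run over three of the sample listings. The name extract_model and the year check are my own additions for illustration, not something the question requires.

```ruby
# Three of the sample listings from the question.
LISTINGS = [
  "2007 Honda CR-V LX CLEAN !!",
  "2008 Honda Accord EX ROOF MAGS CLEAN 1 OWNER",
  "2008 Honda Civic EX-L CUIR TOIT LEATHER"
].freeze

# extract_model is a hypothetical helper name.
# It returns the third word, or nil if the line does not start with a 4-digit year.
def extract_model(listing)
  words = listing.split(" ")
  # Only trust the "third word" rule when the first word looks like a year.
  return nil unless words[0].to_s.match?(/\A\d{4}\z/)
  words[2]
end

LISTINGS.each { |listing| puts extract_model(listing) }
```

The year check is optional; it just avoids returning a random third word for lines that are not listings at all.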

Answer 2

If you must use a regex, it would be

^(\d{4}) +([^ ]+) +([^ ]+) +(.*)$

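\1 is then the year, \2 the make, \3 the model, and \4 the rest. However, this will not work if any model has a two-word name (like Crown Victoria), unless the words are joined by something other than a space (e.g. Crown_Victoria).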
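
To make the group numbering and the two-word limitation concrete, here is a minimal Ruby sketch using the same pattern. The helper name parse_listing and the "2007 Ford Crown Victoria LX" line are made up for illustration.

```ruby
# The pattern from this answer: year, make, model, rest.
LISTING_RE = /^(\d{4}) +([^ ]+) +([^ ]+) +(.*)$/

# parse_listing is a hypothetical helper; it returns nil when the line does not match.
def parse_listing(line)
  m = LISTING_RE.match(line)
  return nil unless m
  { year: m[1], make: m[2], model: m[3], rest: m[4] }
end

p parse_listing("2008 Honda Civic EX-L CUIR TOIT LEATHER")

# A made-up listing with a two-word model: only "Crown" ends up in :model,
# and "Victoria" is pushed into :rest.
p parse_listing("2007 Ford Crown Victoria LX")
```

Note that the pattern requires at least one space after the model, so a line with only three words would not match at all.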

Answer 3

Try this simple one:

(\d+)\s*(\w+)\s*(.+)

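and read the capture groups. Note that group 3 captures everything after the make, so the model is the first space-separated word of that group.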

Explanation

\d+        digits (0-9) 
           (1 or more times, matching the most amount possible)

\s*        whitespace (\n, \r, \t, \f, and " ") 
           (0 or  more times, matching the most amount possible)

\w+        word characters (a-z, A-Z, 0-9, _) 
           (1 or more times, matching the most amount possible)

.+         any character except \n 
           (1 or more times, matching the most amount possible)
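
A small Ruby sketch of how the pieces explained above behave on one of the sample lines. Because the final .+ is greedy, the third group holds the model plus everything after it, so this sketch splits it once more. The helper name model_from_simple is hypothetical.

```ruby
# The pattern from this answer.
SIMPLE_RE = /(\d+)\s*(\w+)\s*(.+)/

# model_from_simple is a hypothetical helper name.
def model_from_simple(line)
  m = SIMPLE_RE.match(line)
  return nil unless m
  # m[3] is everything after the make, so keep only its first word.
  m[3].split(" ").first
end

m = SIMPLE_RE.match("2007 Honda CR-V LX CLEAN !!")
puts m[1]  # 2007
puts m[2]  # Honda
puts m[3]  # CR-V LX CLEAN !!
puts model_from_simple("2007 Honda CR-V LX CLEAN !!")  # CR-V
```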

Answer 4

Please check this link: Regex Implementation

([0-9]*).\b([a-zA-Z]*).\b([a-zA-Z-.]*).\b(.*)

The first 3 groups will be:

2007
Honda
CR-V

Edit

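If you are using C#, this is how to get the model:

// Requires: using System.Text.RegularExpressions; using System.Windows.Forms;
string page = "2007 Honda CR-V LX CLEAN !!";
Regex reg = new Regex(@"(?<year>[0-9]*).\b(?<make>[a-zA-Z]*).\b(?<model>[a-zA-Z-.]*).\b(?<rest>.*)");
MatchCollection mc = reg.Matches(page);
foreach (Match m in mc)
{
    // Groups["model"] is a Group object; .Value is the matched text.
    MessageBox.Show(m.Groups["model"].Value);
}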
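
For comparison, the same named-group idea can be sketched in Ruby. This is only an illustration: the method name named_parts is hypothetical, and I am assuming the letter ranges are meant to be a-z and A-Z, with the literal hyphen moved to the end of the character class so it cannot be read as a range.

```ruby
# Named groups: year, make, model, rest.
NAMED_RE = /(?<year>[0-9]*).\b(?<make>[a-zA-Z]*).\b(?<model>[a-zA-Z.-]*).\b(?<rest>.*)/

# named_parts is a hypothetical helper; it returns the MatchData or nil.
def named_parts(line)
  NAMED_RE.match(line)
end

m = named_parts("2007 Honda CR-V LX CLEAN !!")
puts m[:model] if m  # CR-V
```

Named groups make the calling code read better than numbered ones, since m[:model] stays correct even if another group is later added in front of it.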

How can I use regex to pull a word from text elements based on its position?
