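Regex help for vCard import

Date: 2014-03-19 09:29:14

Tags: regex vb.net vcard

Question

I am trying to build an import feature for a VB.NET desktop application (Visual Studio 2012) that parses a vCard and distributes all of its data across a class. The class has been created, and every element is parsed correctly with regular expressions except the name element. Below is the vCard text I am working with (exported from Microsoft Outlook).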

BEGIN:VCARD
VERSION:2.1
N;LANGUAGE=en-gb:Test;Johnny;Stewart;Mr.
FN:Mr. Johnny Stewart Test
ORG:Test Company
TITLE:Software Development
TEL;WORK;VOICE:01210000000
TEL;HOME;VOICE:01211111111
TEL;WORK;FAX:01212222222
ADR;WORK;PREF:;;10 Test St;Teston;Testville;T0 0TT;United Kingdom
LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE:10 Test St=0D=0A=
Teston=0D=0A=
Testville=0D=0A=
T0 0TT
X-MS-OL-DEFAULT-POSTAL-ADDRESS:2
URL;WORK:www.webpageaddress.co.uk
EMAIL;PREF;INTERNET:Johnny.Test@TestCo.co.uk
X-MS-IMADDRESS:example.IMAddress@webpageaddress.co.uk
X-MS-CARDPICTURE;TYPE=JPEG;ENCODING=BASE64:
 /9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAcFBQYFBAcGBQYIBwcIChELCgkJChUPEAwRGBUa
 GRgVGBcbHichGx0lHRcYIi4iJSgpKywrGiAvMy8qMicqKyr/2wBDAQcICAoJChQLCxQqHBgc
 KioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKir/wAAR
 CACUACcDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAA
 AgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkK
 FhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWG
 h4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl
 5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREA
 AgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYk
 NOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOE
 hYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk
 5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD2gOMYx+tOBHXFVg3Hf8qXf6gGkMsGTsDn
 /gVG8e4/4FVfzDjhBigyHHUj8aAJvNI7kj6UVX3jPUfnRQBEGJ5LH8Vpd59QaqCRc9c/WnBh
 6/rQInL56AUK2OuKi8z0Ofqc00sc/eFAE5f0/nRVcsfY/Q0UAQKx7Gl8zHcGoRJn1P0FBZv7
 2PqDQBKJAT0x9ad5hx1H51Bu4+9+tN389zQBPvJ9B+FFQE/5zRQBCXPf+dLv47VXDgH/ABpS
 /wBBQBNv98fjQJCe5P0NQh89efxoLL6/pQBLu9/zoqHPpiigCLcfb8qUNz2qDf7GnZHr+lAE
 pYetG7jjNRZI7fpTGc+lAE5bnkUVAHPfAooAj3j3o3jHWosg/wD6qNw+v4UAS7qXcfQ/hUBJ
 Pt+FAbtn9KAJt3PP60VDu9/1ooAjyexpdw71CT64pAwFAE2eKTdj/wDXUe8UoYHuaAJN5PUN
 RUJY0UAR0ob1qPPrn86Nw9aAJN2O9G4dzUe4duaTJoAl3j+8KKjz9aKAGfjSE+1R5HelDEdP
 5UAPD47Ub/eoyfUil3jFADi3qP1oqPdz1NFADec9TRRRTAWkyaKKAE3H1ooooA//2Q==

X-MS-OL-DESIGN;CHARSET=utf-8:<card xmlns="http://schemas.microsoft.com/office/outlook/12/electronicbusinesscards" ver="1.0" layout="left" bgcolor="ffffff"><img 

xmlns="" align="fit" area="16" use="cardpicture"/><fld xmlns="" prop="name" align="left" dir="ltr" style="b" color="000000" size="10"/><fld xmlns="" prop="org" align="left" 

dir="ltr" color="000000" size="8"/><fld xmlns="" prop="title" align="left" dir="ltr" color="000000" size="8"/><fld xmlns="" prop="blank" size="8"/><fld xmlns="" prop="email" 

align="left" dir="ltr" color="000000" size="8"/><fld xmlns="" prop="blank" size="8"/><fld xmlns="" prop="addrwork" align="left" dir="ltr" color="000000" size="8"/><fld xmlns="" 

prop="addrhome" align="left" dir="ltr" color="000000" size="8"/><fld xmlns="" prop="blank" size="8"/><fld xmlns="" prop="webhome" align="left" dir="ltr" color="000000" 

size="8"/><fld xmlns="" prop="webwork" align="left" dir="ltr" color="000000" size="8"/><fld xmlns="" prop="blank" size="8"/><fld xmlns="" prop="telwork" align="left" dir="ltr" 

color="000000" size="8"/><fld xmlns="" prop="telhome" align="left" dir="ltr" color="000000" size="8"/><fld xmlns="" prop="faxwork" align="left" dir="ltr" color="000000" 

size="8"/><fld xmlns="" prop="im" align="left" dir="ltr" color="000000" size="8"/></card>
REV:20140318T153016Z
END:VCARD

This is the line I want to match with the regex (line 3 of the vCard):

N;LANGUAGE=en-gb:Test;Johnny;Stewart;Mr.

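Attempt

I'm not great with regular expressions, but I do use online cheat sheets. I'm close, but I'm getting a bit frustrated because I feel I have tried everything. This is the regex I'm using: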

(\n(?<strElement>(N))) (;(?<strLang>(LANGUAGE)))* ([^:]*)*  (:(?<strSurname>([^;]*))) (;(?<strGivenName>([^;]*)))  ?(;(?<strMidName>([^\n|^;]*))) ?(;(?<strPrefix>([^\n]*))) ?(;(?<strSuffix>([^\n]*)))

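This is close, but it puts the prefix ("Mr." in this case) into the suffix group, which is obviously wrong.

Notes

  • From the research I have done on vCards, the language part of the name element appears to be optional (I think I have allowed for that in the regex above).
  • If data is missing, for example the suffix, no semicolon is exported to mark the empty field.

Summary

If anyone can advise me, I would greatly appreciate an explanation along with it, as I am trying to get used to regular expressions.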

1 answer:

Answer 0 (score: 1)

A pattern like this should give you an idea of how to match what you are looking for:

(?<1>\w+);(?<2>.*\w{2}-\w{2}):(?<3>\w+);(?<4>\w+);(?<5>\w+);(?<6>\w+.)

Example: http://regex101.com/r/jX6lA3

In VB.NET you might code it something like this:

Imports System
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(?<1>\w+);(?<2>.*\w{2}-\w{2}):(?<3>\w+);(?<4>\w+);(?<5>\w+);(?<6>\w+.)"
      ' Sample input: replace with the full vCard text read from your file.
      Dim input As String = "N;LANGUAGE=en-gb:Test;Johnny;Stewart;Mr."
      Dim matches As MatchCollection = Regex.Matches(input, pattern)

      For Each match As Match In matches
         ' VB.NET indexes collections with parentheses, and Console.WriteLine
         ' needs a {0} placeholder to print the extra argument.
         Console.WriteLine("1: {0}", match.Groups("1").Value)
         Console.WriteLine("2: {0}", match.Groups("2").Value)
         Console.WriteLine("3: {0}", match.Groups("3").Value)
         Console.WriteLine("4: {0}", match.Groups("4").Value)
         Console.WriteLine("5: {0}", match.Groups("5").Value)
         Console.WriteLine("6: {0}", match.Groups("6").Value)
         Console.WriteLine()
      Next
      Console.WriteLine()
   End Sub
End Module
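
Note that the pattern above expects a language parameter and exactly four name components made of word characters, so a card exported without a middle name or prefix would not match it. If that turns out to be a problem, splitting the line may be simpler than a regex. The following is only a minimal sketch: SplitNameLine is a hypothetical helper name, and it assumes the first colon on the line ends the parameters and that no component contains an escaped semicolon.

```vbnet
Imports System

Module NameSplitSketch
    ' Hypothetical helper (not part of any library): returns the five N
    ' components in vCard order (surname, given name, middle name, prefix,
    ' suffix). Trailing components missing from the line come back as "".
    Public Function SplitNameLine(line As String) As String()
        Dim parts(4) As String          ' upper bound 4 = five elements
        For i As Integer = 0 To 4
            parts(i) = ""
        Next

        ' Assumption: everything before the first ':' is the property name
        ' plus optional parameters such as LANGUAGE=en-gb.
        Dim colon As Integer = line.IndexOf(":"c)
        If colon < 0 Then Return parts

        Dim values As String() = line.Substring(colon + 1).TrimEnd().Split(";"c)
        For i As Integer = 0 To Math.Min(values.Length, parts.Length) - 1
            parts(i) = values(i)
        Next
        Return parts
    End Function

    Public Sub Main()
        Dim parts As String() = SplitNameLine("N;LANGUAGE=en-gb:Test;Johnny;Stewart;Mr.")
        Console.WriteLine("Surname: {0}", parts(0))
        Console.WriteLine("Given:   {0}", parts(1))
        Console.WriteLine("Middle:  {0}", parts(2))
        Console.WriteLine("Prefix:  {0}", parts(3))
        Console.WriteLine("Suffix:  '{0}'", parts(4))
    End Sub
End Module
```

Because the components are filled by position, a missing suffix simply leaves the last slot empty instead of shifting the prefix into it.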

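Of course, if you would rather use the regex pattern you already have, the code can be adjusted very easily to arrange the groups however you want. So, for example, the prefix ("strPrefix" in your pattern) can be retrieved and placed wherever you need it:

Console.WriteLine("Prefix: {0}", match.Groups("strPrefix").Value)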
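
As for why the prefix ends up in the suffix group: assuming the question's pattern is run with RegexOptions.IgnorePatternWhitespace, each " ?" attaches to the group before it, which makes the given name, middle name and prefix groups optional but leaves the suffix group mandatory. With only four components on the line, the engine has to skip the prefix group so that the last value can satisfy the suffix group. One possible fix, sketched below with the same group names, is to make every component after the surname optional and stop each one at the next semicolon. NamePattern, ParseNameLine and the strParams group are names introduced here for illustration only.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NameLineSketch
    ' Sketch of a pattern, not a full vCard grammar:
    '  - parameters after N (e.g. ;LANGUAGE=en-gb) are optional
    '  - every component after the surname is optional, filled left to right
    '  - each component stops at the next ';' or at the end of the line
    Private Const NamePattern As String =
        "^N(?<strParams>(?:;[^:;\r\n]*)*):" &
        "(?<strSurname>[^;\r\n]*)" &
        "(?:;(?<strGivenName>[^;\r\n]*))?" &
        "(?:;(?<strMidName>[^;\r\n]*))?" &
        "(?:;(?<strPrefix>[^;\r\n]*))?" &
        "(?:;(?<strSuffix>[^;\r\n]*))?\r?$"

    ' Hypothetical helper: returns the first N-line match in the given text.
    Public Function ParseNameLine(text As String) As Match
        ' Multiline lets ^ and $ match at each line of a whole vCard.
        Return Regex.Match(text, NamePattern, RegexOptions.Multiline)
    End Function

    Public Sub Main()
        Dim m As Match = ParseNameLine("N;LANGUAGE=en-gb:Test;Johnny;Stewart;Mr.")
        If m.Success Then
            Console.WriteLine("Surname: {0}", m.Groups("strSurname").Value)
            Console.WriteLine("Given:   {0}", m.Groups("strGivenName").Value)
            Console.WriteLine("Middle:  {0}", m.Groups("strMidName").Value)
            Console.WriteLine("Prefix:  {0}", m.Groups("strPrefix").Value)
            ' A group that did not take part in the match has an empty Value.
            Console.WriteLine("Suffix:  '{0}'", m.Groups("strSuffix").Value)
        End If
    End Sub
End Module
```

For the sample line this should put "Mr." in strPrefix and leave strSuffix empty. The sketch does not handle escaped semicolons or folded (continued) lines, so treat it as a starting point rather than a complete parser.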