How to grab data using a regular expression

Time: 2019-03-04 02:56:07

Tags: c# regex

I have the following pattern:

    private const string _usernamePattern = "Username: <strong>.*</strong>";

and this code:

    private string Grab(string text, string pattern)
    {
        Regex regex = new Regex(pattern);
        if (!regex.IsMatch(text))
            throw new Exception();
        else
            return regex.Match(text).Value;

    }

So it works fine for a string like this:

Username: <strong>MyUsername</strong>

But I need to grab only MyUsername, without the <strong> tags. How can I do that?

2 Answers:

Answer 0: (score: 2)

You really shouldn't use a regular expression for this; use a dedicated HTML parser instead.

See this question for the reasons why:

RegEx match open tags except XHTML self-contained tags

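However, if this is a very limited case rather than a block of HTML, and all you want is the text between two tags, you can use a pattern built from the following two constructs:

  • Zero-width positive lookbehind assertion
  • Zero-width positive lookahead assertion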
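
As a minimal sketch of how these two assertions could be combined, assuming the input always looks like the example string in the question: the lookbehind `(?<=...)` requires the prefix to precede the match and the lookahead `(?=...)` requires the closing tag to follow it, so neither ends up in `Match.Value`. The class name `LookaroundSketch`, the exact pattern, and the `ArgumentException` are illustrative choices, not something taken from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundSketch
{
    // Hypothetical pattern: the lookbehind and lookahead are zero-width,
    // so only the text between the tags is part of the match.
    private const string UsernamePattern = "(?<=Username: <strong>).*?(?=</strong>)";

    public static string Grab(string text, string pattern)
    {
        // Match once and reuse the result instead of calling IsMatch and Match.
        Match match = Regex.Match(text, pattern);
        if (!match.Success)
            throw new ArgumentException("Pattern did not match the input.", nameof(text));
        return match.Value;
    }

    public static void Main()
    {
        // Prints: MyUsername
        Console.WriteLine(Grab("Username: <strong>MyUsername</strong>", UsernamePattern));
    }
}
```

With this approach the calling code from the question can keep returning `Match.Value` unchanged; only the pattern differs.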


Answer 1: (score: 1)

Try:

    // The named group captures only the text between the <strong> tags.
    private const string _usernamePattern = "Username: <strong>(?<Email>.*)</strong>";
    ...
    private string Grab(string text, string pattern)
    {
        var match = Regex.Match(text, pattern);

        if (!match.Success)
            throw new Exception();

        // Return the captured group rather than the whole match.
        return match.Groups["Email"].Value;
    }
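
One caveat worth illustrating: `.*` is greedy, so if the input contains more than one `<strong>` element on the same line, the capture runs up to the last closing tag. The sketch below compares the greedy form with the lazy `.*?`. The input string with a second `Role:` field, the group name `Username`, and the class name `GreedyVsLazySketch` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazySketch
{
    // Returns the value of the "Username" group, or an empty string if there is no match.
    public static string Capture(string text, string pattern)
    {
        return Regex.Match(text, pattern).Groups["Username"].Value;
    }

    public static void Main()
    {
        // Hypothetical input with two <strong> elements on one line.
        const string text = "Username: <strong>MyUsername</strong> Role: <strong>Admin</strong>";

        // Greedy: captures up to the LAST </strong>.
        // Prints: MyUsername</strong> Role: <strong>Admin
        Console.WriteLine(Capture(text, "Username: <strong>(?<Username>.*)</strong>"));

        // Lazy: stops at the FIRST </strong>.
        // Prints: MyUsername
        Console.WriteLine(Capture(text, "Username: <strong>(?<Username>.*?)</strong>"));
    }
}
```

For the single-tag string in the question both forms return the same value, so the lazy quantifier only matters if the input can contain further tags.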