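Parsing somewhat complex web text into variables

Date: 2018-07-25 20:22:08

Tags: c# parsing web

I'm grabbing text from a website and parsing it into variables. However, the string I get back when I extract the text is a bit complicated. On the web page it looks like this...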

Invoice #: 1267
Date: 4/16/2018 10:44:00 AM
PO #:
Reference:
Countermen: A/A

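The problem I'm running into is that all of this comes back as a single string. The string also changes dynamically, because some orders have text entered in a field and others don't. For example, some orders have every field filled in, while others have almost none filled in.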

Invoice #:
1267

<br>

Date:
4/16/2018 10:44:00 AM

<br>

PO #:

<br>

Reference:

<br>

Countermen:
A/A

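This is what shows up when I inspect the web element.

I want to parse the information into separate strings and integers for testing, and because some of the strings are longer and some are shorter, I'm having a hard time handling the whole "dynamic" part of the string.

If it helps, here are some images of the actual website: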

Web html code

What the website displays

2 answers:

Answer 0 (score: 2)

Assumptions:

  1. Data key and value are separated by a :
  2. Each data point is separated by <br>

Given the sample data:

using System;
using System.Collections.Specialized;


public class Program
{
    public static void Main()
    {
        var str = @"Invoice #:
                    1267

                    <br>

                    Date:
                    4/16/2018 10:44:00 AM

                    <br>

                    PO #:

                    <br>

                    Reference:

                    <br>

                    Countermen:
                    A/A";

        //Array containing "raw string data"
        var raw = str.Split(new[]{"<br>"}, StringSplitOptions.RemoveEmptyEntries);

        //Just using a simple NVC, opt for something else based on your needs       
        var kvp = new NameValueCollection();

        //Go through the raw array we created earlier and
        // add the key/value pairs to our NameValueCollection, kvp
        foreach (var s in raw)
        {
            //Because of date/time, we'll restrict colon to first occurrence
            var data = s.Split(new [] {":"}, 2, StringSplitOptions.None);

            //Skip any segment without a colon instead of throwing on data[1]
            if (data.Length < 2) continue;

            kvp.Add(data[0].Trim(), data[1].Trim());
        }


        /*
         * At this point, we have our "parsed" data in
         * key/value pairs, kvp and can use it as needed
         *
         */

        // We can loop through the kvp and simply display
        foreach(string k in kvp.Keys){
            Console.WriteLine("{0} = {1}", k, kvp[k]);
        }


        // We can assign values to variables we create
        var invNum = kvp["Invoice #"];
    }
}

Output:

Invoice # = 1267
Date = 4/16/2018 10:44:00 AM
PO # = 
Reference = 
Countermen = A/A
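
Since the question asks for integers as well as strings, the string values can be converted afterwards. The following is only a minimal sketch: the helper names ToInt and ToDate are hypothetical, and it assumes dates always arrive in the month/day/year form shown in the sample. Blank or missing fields come back as null rather than throwing.

```csharp
using System;
using System.Collections.Specialized;
using System.Globalization;

public static class InvoiceFields
{
    // Hypothetical helper: null when the field is missing, blank, or not a whole number.
    public static int? ToInt(NameValueCollection kvp, string key)
    {
        int n;
        return int.TryParse(kvp[key], NumberStyles.Integer, CultureInfo.InvariantCulture, out n)
            ? n
            : (int?)null;
    }

    // Hypothetical helper: null when the field is missing, blank, or not a date.
    // Assumption: InvariantCulture reads month/day/year, which matches the sample data.
    public static DateTime? ToDate(NameValueCollection kvp, string key)
    {
        DateTime d;
        return DateTime.TryParse(kvp[key], CultureInfo.InvariantCulture, DateTimeStyles.None, out d)
            ? d
            : (DateTime?)null;
    }
}

public class TypedDemo
{
    public static void Main()
    {
        // Stand-in for the kvp collection built by the parsing code above
        var kvp = new NameValueCollection
        {
            { "Invoice #", "1267" },
            { "Date", "4/16/2018 10:44:00 AM" },
            { "PO #", "" }
        };

        int? invoice = InvoiceFields.ToInt(kvp, "Invoice #");
        int? po = InvoiceFields.ToInt(kvp, "PO #");          // blank field
        DateTime? date = InvoiceFields.ToDate(kvp, "Date");

        Console.WriteLine("Invoice parsed: {0}", invoice.HasValue);
        Console.WriteLine("PO parsed: {0}", po.HasValue);
        Console.WriteLine("Date parsed: {0}", date.HasValue);
    }
}
```

Using nullable results keeps "field was empty" distinct from a real value such as 0, which seems useful when some orders leave fields blank.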

Documentation for the NameValueCollection Class

Hth ...

Answer 1 (score: 0)

You can use a simple regular expression. \s* matches any whitespace, and (.*?) matches whatever is found between the whitespace. The $ at the end forces it to match all of the text after Countermen, which is important.
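
A regex-based approach along these lines might look like the sketch below. The pattern and the helper name Field are illustrative assumptions built from the same pieces (\s*, (.*?) and $); they are not necessarily the expression this answer used. It assumes each value ends at the next <br> or at the end of the input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InvoiceRegex
{
    // Hypothetical helper: returns the text after "label:" up to the next <br>
    // or the end of the input, or null when the label is not present.
    public static string Field(string text, string label)
    {
        // \s* skips surrounding whitespace, (.*?) lazily captures the value,
        // and (?:<br>|$) stops at the next separator or at the end of the input.
        var pattern = Regex.Escape(label) + @":\s*(.*?)\s*(?:<br>|$)";

        // Singleline lets . cross line breaks, since label and value sit on separate lines.
        var m = Regex.Match(text, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }
}

public class RegexDemo
{
    public static void Main()
    {
        var text = "Invoice #:\n1267\n\n<br>\n\nDate:\n4/16/2018 10:44:00 AM\n\n<br>\n\n"
                 + "PO #:\n\n<br>\n\nReference:\n\n<br>\n\nCountermen:\nA/A";

        foreach (var label in new[] { "Invoice #", "Date", "PO #", "Reference", "Countermen" })
        {
            Console.WriteLine("{0} = {1}", label, InvoiceRegex.Field(text, label));
        }
    }
}
```

An empty field yields an empty string and an absent label yields null, so the caller can tell the two cases apart.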

Dotnetfiddle here: https://dotnetfiddle.net/VHF4uW