Regex in C#, HTML parsing

Time: 2018-10-21 13:42:34

Tags: c# .net regex

Please help.

I have an HTML text that I need to parse. The text:

  

converter.rates = {"3":{"USD":{"buy":27.950001,"sell":28.190001},"EUR":{"buy":32.049999,"sell":32.689999}},"8":{"RUB":{"buy":0.27,"sell":0.43},"USD":{"buy":27.799999,"sell":28.200001},"EUR":{"buy":31.700001,"sell":32.549999}},"41":{"USD":{"buy":28.0,"sell":28.200001},"EUR":{"buy":31.950001,"sell":32.650002}},"46":{"RUB":{"buy":0.413,"sell":0.443},"USD":{"buy":28.0,"sell":28.25},"EUR":{"buy":31.73,"sell":32.73}},"47":{"RUB":{"buy":0.41,"sell":0.448},"USD":{"buy":27.98,"sell":28.15},"EUR":{"buy":31.889999,"sell":32.540001}},"48":{"RUB":{"buy":0.4,"sell":0.43},"USD":{"buy":28.0,"sell":28.200001},"EUR":{"buy":32.099998,"sell":32.490002}},"52":{"RUB":{"buy":0.41,"sell":0.43},"USD":{"buy":27.950001,"sell":28.25},"EUR":{"buy":32.0,"sell":32.5}},"77":{"RUB":{"buy":0.38,"sell":0.43},"USD":{"buy":28.049999,"sell":28.200001},"EUR":{"buy":32.049999,"sell":32.5}},"79":{"RUB":{"buy":0.412,"sell":0.444},"USD":{"buy":27.950001,"sell":28.799999},"EUR":{"buy":31.959999,"sell":33.099998}},"80":{"RUB":{"buy":0.38,"sell":0.43},"USD":{"buy":28.030001,"sell":28.190001},"EUR":{"buy":32.0,"sell":32.450001}},"70":{"RUB":{"buy":0.39,"sell":0.42},"USD":{"buy":28.0,"sell":28.25},"EUR":{"buy":32.0,"sell":32.200001}},"1":{"RUB":{"buy":0.42658,"sell":0.42658},"USD":{"buy":28.036648,"sell":28.036648},"EUR":{"buy":32.256161,"sell":32.256161}},"4":{"RUB":{"buy":0.42,"sell":0.43},"USD":{"buy":27.950001,"sell":28.25},"EUR":{"buy":32.150002,"sell":32.599998}},"10":{"RUB":{"buy":0.414,"sell":0.435},"USD":{"buy":28.0,"sell":28.200001},"EUR":{"buy":32.0,"sell":32.599998}},"13":{"RUB":{"buy":0.275,"sell":0.46},"USD":{"buy":27.9,"sell":28.200001},"EUR":{"buy":31.67,"sell":32.599998}},"15":{"RUB":{"buy":0.3749,"sell":0.4395},"USD":{"buy":27.985001,"sell":28.2075},"EUR":{"buy":32.036366,"sell":32.529091}},"31":{"RUB":{"buy":0.275,"sell":0.42},"USD":{"buy":27.9,"sell":28.139999},"EUR":{"buy":31.799999,"sell":32.400002}},"32":{"RUB":{"buy":0.42,"sell":0.5},"USD":{"buy":28.07,"sell":28.299999},"EUR":{"buy":32.150002,"sell":32.599998}},"39":{"USD":{"buy":28.07,"sell":28.25},"EUR":{"buy":32.150002,"sell":32.549999}},"40":{"RUB":{"buy":0.41,"sell":0.43},"USD":{"buy":27.950001,"sell":28.139999},"EUR":{"buy":32.049999,"sell":32.400002}},"64":{"RUB":{"buy":0.4,"sell":0.425},"USD":{"buy":27.9,"sell":28.200001},"EUR":{"buy":32.099998,"sell":32.599998}},"73":{"RUB":{"buy":0.4,"sell":0.43},"USD":{"buy":28.0,"sell":28.299999},"EUR":{"buy":32.0,"sell":32.549999}},"74":{"RUB":{"buy":0.41,"sell":0.435},"USD":{"buy":28.049999,"sell":28.25},"EUR":{"buy":31.799999,"sell":32.5}},"85":{"RUB":{"buy":0.3,"sell":0.43},"USD":{"buy":28.0,"sell":28.200001},"EUR":{"buy":32.099998,"sell":32.52}},"86":{"RUB":{"buy":0.37,"sell":0.42},"USD":{"buy":28.0,"sell":28.200001},"EUR":{"buy":32.0,"sell":32.799999}},"88":{"RUB":{"buy":0.35,"sell":0.5},"USD":{"buy":28.0,"sell":28.15},"EUR":{"buy":32.099998,"sell":32.450001}},"90":{"RUB":{"buy":4.0,"sell":4.4},"USD":{"buy":28.0,"sell":28.15},"EUR":{"buy":31.950001,"sell":32.450001}}}

I need the following information from it:

  

Bank code - "3" and USD rate - 27.950001, 28.190001

My expression:

  

@"(\d+)":..USD....\w+..(\d+.\d+)........(\d+.\d+)"

But it doesn't work, because USD is not always the first entry after the bank code.

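2 Answers:

Answer 0: (score: 2)

This is a JSON document. JSON is a recursive format, and regular expressions are notoriously poor at parsing recursive data.

Use a dedicated parser instead, such as Newtonsoft Json.NET: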

// requires: using Newtonsoft.Json.Linq;
var rawData = @"converter.rates = { ... }"; // original string
var rawJson = rawData.Substring("converter.rates = ".Length); // remove the prefix
var json = JObject.Parse(rawJson); // convert to a JSON data structure

You can then use it like a dictionary:

foreach (var codeEntry in json)
{
    // Cast to JObject so the inner loop yields key/value pairs rather than bare tokens.
    foreach (var currencyEntry in (JObject)codeEntry.Value)
    {
        var code = codeEntry.Key;
        var currency = currencyEntry.Key;
        var buy = currencyEntry.Value["buy"].Value<double>();
        var sell = currencyEntry.Value["sell"].Value<double>(); // read "sell", not "buy"
        Console.WriteLine($"code of bank - {code} and {currency} rate - {buy}, {sell} ");
    }
}
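Since the question asks only for the USD rates per bank, the loop above can be narrowed to a direct lookup. The following is a minimal sketch, assuming the Newtonsoft.Json package is referenced and that the input ends with the closing brace of the JSON (no trailing `;`). The names `UsdRates` and `Extract` are hypothetical, and banks without a USD entry are simply skipped:

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

public static class UsdRates
{
    // Hypothetical helper: returns (bank code, buy, sell) for every bank that lists USD.
    public static List<(string Code, double Buy, double Sell)> Extract(string rawData)
    {
        // Drop everything before the first '{' (the "converter.rates =" prefix).
        var rawJson = rawData.Substring(rawData.IndexOf('{'));
        var json = JObject.Parse(rawJson);

        var result = new List<(string Code, double Buy, double Sell)>();
        foreach (var bank in json.Properties())
        {
            // The indexer returns null when the bank has no "USD" entry.
            var usd = bank.Value["USD"];
            if (usd == null) continue;
            result.Add((bank.Name, usd.Value<double>("buy"), usd.Value<double>("sell")));
        }
        return result;
    }
}
```

Because the lookup goes by key, the position of "USD" inside each bank object no longer matters, which is the case the original expression could not handle.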

Answer 1: (score: 0)

If you still want to use a regular expression, you can do it like this:

@"""(?<code>\d+)"":\{.*?(?<=""USD""):\{""buy"":(?<buy>\d+\.\d+),""sell"":(?<sell>\d+\.\d+)\}"

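It is built from your example. Basically, it creates three named groups: 'code', 'buy' and 'sell'. Apart from that it matches literal characters, and uses only the lookbehind '(?<=""USD"")' to find 'USD' so that it picks up the desired rates.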
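To show how the named groups are read back, here is a minimal sketch that applies the pattern from this answer, with the decimal point escaped in both numbers. The class and method names (`UsdRateRegex`, `Extract`) are hypothetical, and the numbers are parsed with the invariant culture so that '.' is always treated as the decimal separator:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class UsdRateRegex
{
    // Pattern from the answer; each "" inside the verbatim string is one literal quote.
    private static readonly Regex Pattern = new Regex(
        @"""(?<code>\d+)"":\{.*?(?<=""USD""):\{""buy"":(?<buy>\d+\.\d+),""sell"":(?<sell>\d+\.\d+)\}");

    // Hypothetical helper: one tuple per match, read from the three named groups.
    public static List<(string Code, double Buy, double Sell)> Extract(string text)
    {
        var result = new List<(string Code, double Buy, double Sell)>();
        foreach (Match m in Pattern.Matches(text))
        {
            result.Add((
                m.Groups["code"].Value,
                double.Parse(m.Groups["buy"].Value, CultureInfo.InvariantCulture),
                double.Parse(m.Groups["sell"].Value, CultureInfo.InvariantCulture)));
        }
        return result;
    }
}
```

One caveat: if a bank has no USD entry, the lazy `.*?` runs on into the next bank and pairs the wrong code with that bank's USD rate. Every bank in the sample lists USD, so the problem does not show up there.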

Edit

If you have an HTML document and want to extract the 'converter.rates' variable as text, you can use this regex:

@"converter\.rates\s*=.*\}\}\}"

It looks for the three '}' at the end of the string.
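As a sketch of how that extraction step might feed a parser, the variant below captures only the JSON object in a named group. It is not the exact pattern above: it uses a lazy quantifier and `RegexOptions.Singleline`, and it assumes that the first `}}}` after the assignment closes the object, which holds for the sample data. The names `RatesScript` and `ExtractJson` are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RatesScript
{
    // Hypothetical helper: returns the JSON text assigned to converter.rates,
    // or null when the assignment is not found in the page.
    public static string ExtractJson(string html)
    {
        var m = Regex.Match(
            html,
            @"converter\.rates\s*=\s*(?<json>\{.*?\}\}\})",
            RegexOptions.Singleline); // let '.' match newlines in case the script is wrapped
        return m.Success ? m.Groups["json"].Value : null;
    }
}
```

The returned string starts at the opening brace, so it can be handed to a JSON parser without stripping the prefix first.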