Question

I first tried using IndexOf and Substring.

In the HTML file I downloaded, I have this section of text:

var arrayImageTimes = [];
arrayImageTimes.push('201702130145');arrayImageTimes.push('201702130200');arrayImageTimes.push('201702130215');arrayImageTimes.push('201702130230');arrayImageTimes.push('201702130245');arrayImageTimes.push('201702130300');arrayImageTimes.push('201702130315');arrayImageTimes.push('201702130330');arrayImageTimes.push('201702130345');arrayImageTimes.push('201702130400');

I want to extract only the numbers into a List or an array, so that I end up with a list of strings:

201702130145
201702130200
201702130215

That is, all the digits between each pair of single quotes (').

I tried:

public void ExtractDateAndTimes(string f)
        {
            string startTag = "var arrayImageTimes = [];";
            string endTag = "</script>";
            int startTagWidth = startTag.Length;
            int endTagWidth = endTag.Length;
            int index = 0;
            while (true)
            {
                index = f.IndexOf(startTag, index);
                if (index == -1)
                {
                    break;
                }
                // else more to do - index now is positioned at first character of startTag
                int start = index + startTagWidth;
                index = f.IndexOf(endTag, start + 1);
                if (index == -1)
                {
                    break;
                }
                // found the endTag
                string g = f.Substring(start, index - start);
            }
        }

In the constructor:

string text = File.ReadAllText(@"c:\Temp\testinghtml.html");
ExtractDateAndTimes(text);

But all I get is the whole block of text that follows var arrayImageTimes, as shown above.

Answer 1

Use a Regex with a named matched subexpression (a named capturing group).

Find all matches and read the named capturing group from each one:

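// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
// Don't forget to escape full stops!
// Capture the quoted value inside the parentheses into the imageTime capturing group
Regex regex = new Regex(@"arrayImageTimes\.push\('(?<imageTime>\d+)'\)", RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase | RegexOptions.Singleline);

// myString is the downloaded HTML text, e.g. the result of File.ReadAllText
MatchCollection matches = regex.Matches(myString);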

List<string> timestamps = new List<string>();

foreach (Match m in matches)
{
    timestamps.Add(m.Groups["imageTime"].Value);
}
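
If you would rather stay with IndexOf and Substring, the same values can probably be collected without a regex. The attempt in the question returns the whole block because its start tag is "var arrayImageTimes = [];" and its end tag is "</script>", so Substring spans everything in between. The sketch below searches for each push call instead. The class name ImageTimeExtractor and the return type are assumptions made for illustration, and it assumes every call is written exactly as arrayImageTimes.push('...') with single quotes and no extra whitespace.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, not part of the original post.
public static class ImageTimeExtractor
{
    // Collects every value passed to arrayImageTimes.push('...')
    // using only IndexOf and Substring.
    public static List<string> ExtractDateAndTimes(string html)
    {
        const string startTag = "arrayImageTimes.push('";
        const string endTag = "')";
        var timestamps = new List<string>();
        int index = 0;

        while (true)
        {
            // Find the next push call, starting where the previous one ended.
            index = html.IndexOf(startTag, index, StringComparison.Ordinal);
            if (index == -1)
            {
                break;
            }

            // The value starts right after the opening quote.
            int start = index + startTag.Length;
            int end = html.IndexOf(endTag, start, StringComparison.Ordinal);
            if (end == -1)
            {
                break;
            }

            timestamps.Add(html.Substring(start, end - start));

            // Continue after the closing "')" so each call is visited once.
            index = end + endTag.Length;
        }

        return timestamps;
    }
}
```

It could be called from the constructor in the question as List<string> times = ImageTimeExtractor.ExtractDateAndTimes(text);. Unlike the regex, this version does not check that the quoted value consists only of digits.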

How can I use IndexOf and Substring, or HtmlAgilityPack, to get the numbers from a section of text?
