I'm trying to do this with IndexOf and Substring first.
In the HTML file I downloaded, I have this section of text:
var arrayImageTimes = [];
arrayImageTimes.push('201702130145');arrayImageTimes.push('201702130200');arrayImageTimes.push('201702130215');arrayImageTimes.push('201702130230');arrayImageTimes.push('201702130245');arrayImageTimes.push('201702130300');arrayImageTimes.push('201702130315');arrayImageTimes.push('201702130330');arrayImageTimes.push('201702130345');arrayImageTimes.push('201702130400');
I want to extract only the numbers into a List or array, so that I end up with a list of strings:
201702130145
201702130200
201702130215
That is, all of the digits between each pair of single quotes (').
I tried:
public void ExtractDateAndTimes(string f)
{
    string startTag = "var arrayImageTimes = [];";
    string endTag = "</script>";
    int startTagWidth = startTag.Length;
    int endTagWidth = endTag.Length;
    int index = 0;
    while (true)
    {
        index = f.IndexOf(startTag, index);
        if (index == -1)
        {
            break;
        }
        // else more to do - index now is positioned at first character of startTag
        int start = index + startTagWidth;
        index = f.IndexOf(endTag, start + 1);
        if (index == -1)
        {
            break;
        }
        // found the endTag
        string g = f.Substring(start, index - start);
    }
}
In the constructor:
string text = File.ReadAllText(@"c:\Temp\testinghtml.html");
ExtractDateAndTimes(text);
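But all I get is the whole block of text that follows var arrayImageTimes, as shown above, rather than the individual numbers.
Answer 0 (score: 1)
Use Regex with a named matched subexpression (a named capturing group):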
// Don't forget to escape full stops!
// Capture quoted values inside round braces into imageTime capturing group
Regex regex = new Regex(@"arrayImageTimes\.push\('(?<imageTime>\d+)'\)", RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase | RegexOptions.Singleline);
MatchCollection matches = regex.Matches(myString);
List<string> timestamps = new List<string>();
foreach (Match m in matches)
{
    timestamps.Add(m.Groups["imageTime"].Value);
}
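If you would rather stay with IndexOf and Substring as in the question, something along these lines might work. This is only a minimal sketch: the class name ImageTimeExtractor and the method name Extract are made up for illustration, and it assumes every value appears in exactly the form arrayImageTimes.push('...') with single quotes, as in the sample above. Instead of cutting out the whole block up to </script>, it searches for each push(' marker and takes the text up to the next single quote.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, not part of the original question or answer.
public static class ImageTimeExtractor
{
    // Returns every value found between arrayImageTimes.push(' and the next single quote.
    public static List<string> Extract(string html)
    {
        // Assumption: the page always uses this exact spelling and single quotes.
        const string marker = "arrayImageTimes.push('";
        var result = new List<string>();
        int index = 0;

        while (true)
        {
            // Find the next push(' call, starting where the previous one ended.
            int markerPos = html.IndexOf(marker, index, StringComparison.Ordinal);
            if (markerPos == -1)
            {
                break; // no more calls
            }

            // The value starts right after the opening quote.
            int start = markerPos + marker.Length;

            // The value ends at the closing quote.
            int end = html.IndexOf('\'', start);
            if (end == -1)
            {
                break; // unterminated value, stop scanning
            }

            result.Add(html.Substring(start, end - start));

            // Continue searching after the closing quote.
            index = end + 1;
        }

        return result;
    }
}
```

With the file from the question, you would call it as ImageTimeExtractor.Extract(File.ReadAllText(@"c:\Temp\testinghtml.html")). The regex version is shorter and copes better with small variations in the markup, so this is mainly useful if you want to avoid regular expressions.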