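I'm developing an application that extracts content from a game page (example), displays it to the user in a text box, and, if the user wants, saves it as a .txt or .xls (Excel spreadsheet format) file.
The main problem I'm facing right now is that I have to change the code manually to "extract" data about the other units in the game.
If you open the link, you'll see that I'm extracting "Weapons", "Used", "Survived" and "Casualties" from the Defender side (for now), but only one type of unit (really just one row of that table) is extracted. I'm looking for a way to search from "tr[1]/td[2]/span[1]" through "tr[45]/td[2]/span[1]" (even though the example page only goes up to tr[16]), or perhaps a way to keep searching automatically until no more data is found and then stop.
Sorry for any mistakes in the text; I'm not a native speaker.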
private void btnStart_Click(object sender, RoutedEventArgs e)
{
    HtmlDocument brPage = new HtmlWeb().Load("http://us.desert-operations.com/world2/battleReport.php?code=f8d77b1328c8ce09ec398a78505fc465");
    HtmlNodeCollection nodes = brPage.DocumentNode.SelectNodes("/html[1]/body[1]/div[1]/div[1]/div[3]/div[1]/div[1]/div[1]/div[2]/table[2]");
    string result = "";
    List<brContentSaver> ContentList = new List<brContentSaver>();
    foreach (var item in nodes)
    {
        brContentSaver cL = new brContentSaver();
        /* Here comes the junk handler, replaces all junk for nothing, essentially deleting it
           I wish I knew a way to do this efficiently */
        cL.Weapons = item.SelectSingleNode("tr[16]/td[1]").InnerText
            .Replace(" * ", " ")
            .Replace("  ; *  ;", " ");
        cL.Used = item.SelectSingleNode("tr[16]/td[2]/span[1]").InnerText
            .Replace(" * ", " ")
            .Replace("  ; *  ;", " ");
        cL.Survived = item.SelectSingleNode("tr[16]/td[3]").InnerText
            .Replace(" * ", " ")
            .Replace("  ; *  ;", " ");
        if (cL.Survived == "0")
        {
            cL.Casualties = cL.Used;
        }
        else
        {
            /* int Casualties = int.Parse(cL.Casualties);
             * int Used = int.Parse(cL.Used);
             * int Survived = int.Parse(cL.Survived);
             * Casualties = Used - Survived; */
            cL.Casualties = item.SelectSingleNode("tr[16]/td[4]").InnerText
                .Replace(" * ", " ")
                .Replace("  ; *  ;", " ");
        }
        ContentList.Add(cL);
    }
    foreach (var item in ContentList)
    {
        result += item.Weapons + " " + item.Used + " " + item.Survived + " " + item.Casualties + Environment.NewLine;
    }
    brContent.Text = result;
}
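Sorry if this sounds silly, but I'm new to programming, especially in C#.
Edit 1: I noticed the if (cL.Survived == "0") check. I was just testing something earlier and forgot to change it, but hey, it works.
Edit 2: In case you're wondering, I'm also using this: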
public class brContentSaver
{
    public string Weapons { get; set; }
    public string Used { get; set; }
    public string Survived { get; set; }
    public string Casualties { get; set; }
}
Answer 0: (score: 0)
I didn't have much time to write this, but hopefully it helps if you still need it. I find LINQ more convenient:
// Requires: using System; using System.Collections.Generic; using System.Linq; using HtmlAgilityPack;
private static void Run()
{
    HtmlDocument brPage = new HtmlWeb().Load("http://us.desert-operations.com/world2/battleReport.php?code=f8d77b1328c8ce09ec398a78505fc465");
    // Pick every table whose class attribute contains "battleReport"
    var nodes = brPage.DocumentNode.Descendants("table")
        .Where(_ => _.Attributes["class"] != null
            && _.Attributes["class"].Value != null
            && _.Attributes["class"].Value.Contains("battleReport"));
    string result = "";
    List<brContentSaver> ContentList = new List<brContentSaver>();
    foreach (var item in nodes)
    {
        // Only process tables that have a "Weapons" header cell
        if (item.Descendants("th").Any(_ => _.InnerText.Equals("Weapons")))
        {
            //get all tr nodes except first one (header)
            var trNodes = item.Descendants("tr").Skip(1);
            foreach (var node in trNodes)
            {
                brContentSaver cL = new brContentSaver();
                var tds = node.Descendants("td").ToArray();
                // Skip rows that do not have all four columns, so the indexing below cannot throw
                if (tds.Length < 4)
                {
                    continue;
                }
                /* Here comes the junk handler, replaces all junk for nothing, essentially deleting it
                   I wish I knew a way to do this efficiently */
                cL.Weapons = tds[0].InnerText
                    .Replace(" * ", " ")
                    .Replace("  ; *  ;", " ");
                // Prefer the text of the first <span>; fall back to the whole cell if there is none
                cL.Used = tds[1].Descendants("span").FirstOrDefault()?.InnerText
                    .Replace(" * ", " ")
                    .Replace("  ; *  ;", " ");
                if (string.IsNullOrEmpty(cL.Used))
                {
                    cL.Used = tds[1].InnerText;
                }
                cL.Survived = tds[2].Descendants("span").FirstOrDefault()?.InnerText
                    .Replace(" * ", " ")
                    .Replace("  ; *  ;", " ");
                if (string.IsNullOrEmpty(cL.Survived))
                {
                    // No survivor value found: treat every used unit as a casualty
                    cL.Casualties = cL.Used;
                }
                else
                {
                    /* int Casualties = int.Parse(cL.Casualties);
                     * int Used = int.Parse(cL.Used);
                     * int Survived = int.Parse(cL.Survived);
                     * Casualties = Used - Survived; */
                    cL.Casualties = tds[3].Descendants("span").FirstOrDefault()?.InnerText
                        .Replace(" * ", " ")
                        .Replace("  ; *  ;", " ");
                    if (string.IsNullOrEmpty(cL.Casualties))
                    {
                        cL.Casualties = tds[3].InnerText;
                    }
                }
                ContentList.Add(cL);
            }
        }
    }
    foreach (var item in ContentList)
    {
        result += item.Weapons + " " + item.Used + " " + item.Survived + " " + item.Casualties + Environment.NewLine;
    }
    // result now holds one line per unit row
    var text = result;
}
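On the "junk handler" comment in the code: the repeated Replace chains could probably be folded into one helper. Below is a minimal sketch, assuming the junk in those cells is HTML entities such as &nbsp; plus stray '*' markers, and assuming the numbers use ',' as a thousands separator. CellCleaner, Clean and Casualties are hypothetical names of mine, and the sample strings in Main are made up for illustration, not taken from the page.

```csharp
using System;
using System.Globalization;
using System.Net;
using System.Text.RegularExpressions;

public static class CellCleaner
{
    // Hypothetical helper: decodes HTML entities, replaces '*' markers with spaces,
    // collapses whitespace runs (including non-breaking spaces) and trims the result.
    public static string Clean(string raw)
    {
        if (string.IsNullOrEmpty(raw))
        {
            return string.Empty;
        }
        string decoded = WebUtility.HtmlDecode(raw);   // "&nbsp;" becomes U+00A0
        string noStars = decoded.Replace("*", " ");
        // In .NET, \s also matches U+00A0, so this collapses non-breaking spaces too
        return Regex.Replace(noStars, @"\s+", " ").Trim();
    }

    // Hypothetical helper: returns used - survived as text when both values parse,
    // or null so the caller can fall back to the scraped cell text.
    // Assumption: ',' is the thousands separator (invariant culture).
    public static string Casualties(string used, string survived)
    {
        const NumberStyles style = NumberStyles.AllowThousands;
        if (long.TryParse(used, style, CultureInfo.InvariantCulture, out long u) &&
            long.TryParse(survived, style, CultureInfo.InvariantCulture, out long s))
        {
            return (u - s).ToString(CultureInfo.InvariantCulture);
        }
        return null;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Made-up sample inputs, for illustration only
        Console.WriteLine(CellCleaner.Clean("&nbsp;*&nbsp;Tank&nbsp;"));   // prints "Tank"
        Console.WriteLine(CellCleaner.Casualties("1,200", "200"));         // prints "1000"
    }
}
```

With something like this, each assignment in the loop would shrink to a single call such as cL.Weapons = CellCleaner.Clean(tds[0].InnerText). If the page formats numbers differently (for example '.' as the thousands separator), the parsing part would need adjusting.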