Parsing HTML table InnerText: stripping punctuation (commas)

Time: 2013-12-10 08:04:51

Tags: c# html html-table punctuation innertext

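I am parsing an HTML table into a text file; my code sample is below. In cols6, i.e. the sixth <td></td>, the InnerText is, for example, 70,430. I can't work out how to drop the comma when writing the InnerText to the text file: I want it to write 70430 rather than 70,430. What should I do to cols6[j].InnerText to get rid of the , in the number? Any help would be much appreciated. Thanks! :)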

        // Load HTML
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.Load(fileName);
        // Get all tables in the document
        HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");

        using (FileStream fs = new FileStream(@"..\..\bin\Debug\Pages\" + "Director.txt", FileMode.Append))
        using (StreamWriter sw = new StreamWriter(fs))
        {
            // Iterate all rows in the relevant table
            HtmlNodeCollection rows = tables[2].SelectNodes(".//tr[position() >2]");
            for (int i = 0; i < rows.Count; ++i)
            {
                // Iterate all columns in this row
                HtmlNodeCollection cols = rows[i].SelectNodes(".//td[1]");
                HtmlNodeCollection cols2 = rows[i].SelectNodes(".//td[2]");
                HtmlNodeCollection cols3 = rows[i].SelectNodes(".//td[3]");
                HtmlNodeCollection cols4 = rows[i].SelectNodes(".//td[4]");
                HtmlNodeCollection cols5 = rows[i].SelectNodes(".//td[5]");
                HtmlNodeCollection cols6 = rows[i].SelectNodes(".//td[6]");
                HtmlNodeCollection cols7 = rows[i].SelectNodes(".//td[7]");
                for (int j = 0; j < cols.Count; ++j)
                    // Get the value of the column and print it
                    sw.WriteLine(cols[j].InnerText + "," + cols2[j].InnerText + "," + cols3[j].InnerText + "," +
                                 cols4[j].InnerText + "," + cols5[j].InnerText + "," + cols6[j].InnerText + "," + cols7[j].InnerText + ",822");
            }
            // No explicit Flush/Close calls are needed here:
            // the using blocks dispose the writer and the stream.
        }

1 Answer:

Answer 0 (score: 2)

You can use Replace() to remove the commas.

cols6[j].InnerText = cols6[j].InnerText.Replace(",", "");
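As a minimal sketch of the same idea on a plain string: .NET strings are immutable, so Replace() returns a new string and the result has to be captured somewhere. Storing it in a local variable also leaves the parsed document untouched. The names `raw` and `cleaned` are hypothetical stand-ins, not part of the original code.

```csharp
using System;

class ReplaceDemo
{
    static void Main()
    {
        // Hypothetical stand-in for cols6[j].InnerText
        string raw = "70,430";

        // Replace returns a new string; raw itself is unchanged
        string cleaned = raw.Replace(",", "");

        Console.WriteLine(cleaned); // prints 70430
    }
}
```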

You can also do it directly in the WriteLine() call:

sw.WriteLine(cols[j].InnerText + "," + cols2[j].InnerText + "," + cols3[j].InnerText + "," +
                             cols4[j].InnerText + "," + cols5[j].InnerText + "," + cols6[j].InnerText.Replace(",", "") + "," + cols7[j].InnerText + ",822");
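If more than one column may contain thousands separators, the cleanup could be pulled into a small helper. The sketch below is only an illustration under stated assumptions: `CellText`, `StripCommas`, and `TryParseGrouped` are hypothetical names, the cell values are made-up placeholders rather than data from the question, and the comma is assumed to be a thousands separator (as in 70,430), not a decimal mark. It uses plain strings so it runs without HtmlAgilityPack.

```csharp
using System;
using System.Globalization;

static class CellText
{
    // Hypothetical helper: trims whitespace and removes comma separators.
    public static string StripCommas(string text)
    {
        return (text ?? string.Empty).Trim().Replace(",", "");
    }

    // Alternative: parse the text as an integer, accepting comma
    // thousands separators and surrounding whitespace.
    public static bool TryParseGrouped(string text, out int value)
    {
        const NumberStyles styles = NumberStyles.AllowThousands
                                  | NumberStyles.AllowLeadingWhite
                                  | NumberStyles.AllowTrailingWhite;
        return int.TryParse(text, styles, CultureInfo.InvariantCulture, out value);
    }
}

class Program
{
    static void Main()
    {
        // Placeholder values standing in for the seven InnerText strings of one row
        string[] cells = { "A", "B", "C", "D", "E", "70,430", "G" };

        // Clean only the sixth column, as in the question
        cells[5] = CellText.StripCommas(cells[5]);

        // Build the same comma-separated line as the original WriteLine call
        Console.WriteLine(string.Join(",", cells) + ",822");

        int n;
        if (CellText.TryParseGrouped(" 70,430 ", out n))
        {
            Console.WriteLine(n);
        }
    }
}
```

Parsing with NumberStyles.AllowThousands rejects text that is not a number, which a plain Replace() would silently pass through; which behaviour is preferable depends on how clean the table data is.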