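I have an HTML document that, after parsing, contains only formatted text. Is it possible to get its text exactly as I would if I selected it with the mouse, copied it, and pasted it into a new text document?
I know that with Microsoft.Office.Interop I can use the .ActiveSelection property to select the contents of an open Word document.
I need a way to load the HTML (perhaps in a browser object), copy all of its contents, and assign them to a string.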
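One way to approximate browser copy/paste without a browser control is to strip the markup yourself. The sketch below is a crude, regex-based illustration, not a real HTML parser and not what the code in this question does. It drops `<script>`, `<style>` and `<title>` elements, collapses source whitespace the way a browser does, turns `<br>` and the ends of a few block elements into line breaks, removes the remaining tags, and decodes entities with `WebUtility.HtmlDecode`. The class name `HtmlText` and the chosen list of block tags are assumptions; `<pre>` blocks, tables and nested lists will not come out the way a browser renders them.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Crude approximation of "select all, copy, paste" for simple HTML.
    // Regex-based, so it is NOT a substitute for a real HTML parser.
    public static string ToPlainText(string html)
    {
        const RegexOptions Opts = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        // Drop elements whose content is not shown as page text.
        string s = Regex.Replace(html, @"<(script|style|title)\b[^>]*>.*?</\1\s*>", "", Opts);

        // Browsers collapse whitespace (including source line breaks) to one space.
        s = Regex.Replace(s, @"\s+", " ");

        // Line breaks and the ends of common block elements become newlines.
        s = Regex.Replace(s, @"<br\s*/?>", "\n", Opts);
        s = Regex.Replace(s, @"</(p|div|li|tr|h[1-6])\s*>", "\n", Opts);

        // Remove the remaining tags, then decode entities such as &amp; and &nbsp;.
        s = Regex.Replace(s, @"<[^>]+>", "");
        s = WebUtility.HtmlDecode(s);

        // Trim each line (this also removes decoded &nbsp; at line edges).
        string[] lines = s.Split('\n');
        for (int i = 0; i < lines.Length; i++)
            lines[i] = lines[i].Trim();
        return string.Join("\n", lines).Trim();
    }
}
```

For example, `HtmlText.ToPlainText(File.ReadAllText(path, Encoding.GetEncoding(1251)))` would return one plain-text string for a file like the one in the question (`path` is a placeholder).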
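```csharp
using System.IO;
using System.Text;

var doc = new HtmlAgilityPack.HtmlDocument();
// The file is stored in the Windows-1251 (Cyrillic) code page
var documentText = File.ReadAllText("myhtmlfile.html", Encoding.GetEncoding(1251));
// PerformSomeChangesOverDocument is my own preprocessing step (not shown)
documentText = this.PerformSomeChangesOverDocument(documentText);
doc.LoadHtml(documentText);

// Convert the DOM to text with a helper (not shown) that writes into stringWriter
var stringWriter = new StringWriter();
AgilityPackEntities.AgilityPack.ConvertTo(doc.DocumentNode, stringWriter);
stringWriter.Flush();
var content = stringWriter.ToString();

// SelectSingleNode returns null when there is no <title> element.
// string.Replace throws on an empty search string, so empty titles are skipped.
// Note: Replace removes *every* occurrence of the title text, including any in the body.
var titleNode = doc.DocumentNode.SelectSingleNode("//title");
if (titleNode != null && titleNode.InnerText.Length > 0)
{
    content = content.Replace(titleNode.InnerText, string.Empty);
}
document.DocumentContent = content;
```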
Then I return the document object. The problem is that the resulting string is not always formatted the way I want.
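If the formatting problem is stray whitespace (a browser collapses runs of spaces and blank lines, while an HTML-to-text converter may pass them through), a post-processing pass can bring the string closer to what copy/paste produces. This is a hedged sketch that assumes the issue is whitespace rather than missing text; the class `TextCleanup` is hypothetical. It collapses spaces, tabs and non-breaking spaces inside each line, trims lines, keeps at most one blank line between paragraphs, and joins with `Environment.NewLine` for a Windows text file.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TextCleanup
{
    // Hypothetical post-processing step for text produced by an HTML-to-text converter.
    public static string Normalize(string text)
    {
        var result = new List<string>();
        bool previousBlank = false;
        foreach (string rawLine in text.Replace("\r\n", "\n").Split('\n'))
        {
            // Collapse runs of spaces, tabs and non-breaking spaces, then trim the line.
            string line = Regex.Replace(rawLine, @"[ \t\u00A0]+", " ").Trim();
            bool blank = line.Length == 0;

            // Skip leading blank lines and any blank line that follows another one.
            if (blank && (previousBlank || result.Count == 0))
                continue;

            result.Add(line);
            previousBlank = blank;
        }

        // At most one trailing blank line can remain; drop it.
        if (result.Count > 0 && result[result.Count - 1].Length == 0)
            result.RemoveAt(result.Count - 1);

        return string.Join(Environment.NewLine, result);
    }
}
```

In the question's code it could be applied as `document.DocumentContent = TextCleanup.Normalize(content);` right before returning the document.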
Answer (score: 0)
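You should be able to use a StreamReader and, as you read each line, write it out with a StreamWriter.
Something like the following reads the file to the end and saves it to a new file. If you need extra logic while processing the file, I've added a comment showing where to put it.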
```csharp
using System;
using System.IO;

private void button4_Click(object sender, EventArgs e)
{
    // The using blocks flush and close both streams, even if an exception is thrown
    using (var reader = new StreamReader("C:\\XXX\\XXX\\XXX\\test.html"))
    using (var writer = new StreamWriter("C:\\XXX\\XXX\\XXX\\test2.html"))
    {
        string line;
        // Read until the end of the file
        while ((line = reader.ReadLine()) != null)
        {
            // You can insert extra logic here if you need to omit lines or change them
            writer.WriteLine(line);
        }
    }
}
```
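One caveat about the copy above: the question reads its file with `Encoding.GetEncoding(1251)`, while `StreamReader(path)` and `StreamWriter(path)` default to UTF-8, so Cyrillic text stored in Windows-1251 would likely come out garbled. The sketch below passes the encoding explicitly and turns the "extra logic" comment into a filter delegate; the names `LineCopier`, `CopyLines` and `keepLine` are hypothetical. On .NET Core and .NET 5+, code page 1251 is only available after calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineCopier
{
    // Copies sourcePath to destPath line by line using an explicit encoding.
    // keepLine is a hypothetical filter hook: return false to omit a line.
    public static void CopyLines(string sourcePath, string destPath, Encoding encoding,
                                 Func<string, bool> keepLine)
    {
        using (var reader = new StreamReader(sourcePath, encoding))
        using (var writer = new StreamWriter(destPath, false, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (keepLine(line))
                    writer.WriteLine(line);
            }
        }
    }
}
```

A call such as `LineCopier.CopyLines(source, dest, Encoding.GetEncoding(1251), line => true);` would then reproduce the original copy without changing the code page.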
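You could also save it to a string instead and do whatever you want with it. Keeping the newlines between lines preserves the same formatting.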
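A compact way to get the string version is `File.ReadLines` plus `string.Join`, which keeps the line structure by re-inserting a newline between the kept lines. This is a sketch, not the answerer's code; `LineReader`, `ReadFilteredText` and the `keepLine` filter are hypothetical names, and the encoding is passed in for the same Windows-1251 reason as in the question.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class LineReader
{
    // Reads the file into one string, keeping only lines that pass keepLine and
    // re-joining them with the platform newline so the line structure survives.
    public static string ReadFilteredText(string path, Encoding encoding, Func<string, bool> keepLine)
    {
        return string.Join(Environment.NewLine, File.ReadLines(path, encoding).Where(keepLine));
    }
}
```

`File.ReadLines` streams the file lazily, so this does not load the whole file as one array before filtering.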
EDIT: the following takes your code into account:
```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;

private void button4_Click(object sender, EventArgs e)
{
    var filetext = new StringBuilder();
    bool firstKeptLine = true;
    using (var reader = new StreamReader("C:\\XXXX\\XXXX\\XXXX\\test.html"))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Skip lines that start with a tag (formatting markup). This is a crude
            // heuristic: any text on the same line after the tag is dropped too.
            if (line.StartsWith("<"))
                continue;

            // Separate kept lines with "\n", but don't start the text with a newline
            if (!firstKeptLine)
                filetext.Append('\n');
            filetext.Append(line);
            firstKeptLine = false;
        }
    }
    Trace.WriteLine(filetext.ToString());
}
```