Word document example
A 1. Name of House: Aasleagh Lodge
Townland: Srahatloe
Near: Killary Harbour, Leenane
Status/Public Access: maintained, private fishing lodge
Date Built: 1838-1850, burnt 1923, rebuilt 1928
Description: Large Victorian country house. Original house 6-bay, 2-storey, 3-bay section on right is higher; after fire house was reduced in size giving current three parallel-hipped roof bays.
Associated Families: Lord Sligo; rented - Hon David Plunkett; Capt W.E. and Constance Mary Phillips; James Leslie Wanklyn M.P. for Bradford; Walter H. Maudslay; Ernest Richard Hartley; Alice Marsh, Lord and Lady Brabourne; Western Fisheries Board; Inland Fisheries Ireland.
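Is there a way to insert the data that follows each heading? For example, wherever "Townland" appears in the Word document, I want to insert the data after it into a column in a database, in this case "Srahatloe". I want to extract all of this data from the Word document. It is for a website I'm building: all the information is stored in a Word document, but I need to get the text into a database without copying and pasting, because the document is very large (70,000+ words). Is there a script available for doing this?
Source code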
var wordApp = new Microsoft.Office.Interop.Word.Application();
var wordDoc = wordApp.Documents.Open(@"C:\Users\mhoban\Documents\Book.docx");
var txt = wordDoc.Content.Text;
// Capture everything after "Townland: " up to the end of the paragraph
var regex = new Regex(@"(Townland\: )(.+?)[\r\n]");
var allMatches = regex.Matches(txt);
// Open one connection for all inserts and dispose of it when done
using (SqlConnection con = new SqlConnection(ConfigurationManager.ConnectionStrings["ConnectionString"].ToString()))
{
    con.Open();
    foreach (Match match in allMatches)
    {
        var townValue = match.Groups[2].Value;
        // Insert values into database
        using (SqlCommand com = new SqlCommand("INSERT INTO Houses (Townland) VALUES (@town)", con))
        {
            com.Parameters.Add("@town", SqlDbType.NVarChar).Value = townValue;
            com.ExecuteNonQuery();
        }
    }
}
// Release the document and the Word process
wordDoc.Close();
wordApp.Quit();
Answer 0 (score: 0)
This screams for RegEx. Something like this should get you going:
var wordApp = new Microsoft.Office.Interop.Word.Application();
var wordDoc = wordApp.Documents.Open(pathToYourDocument);
var txt = wordDoc.Content.Text;
var regex = new Regex(@"(Townland\: )(.+?)[\r\n]");
var allMatches = regex.Matches(txt);
foreach (Match match in allMatches)
{
    var townValue = match.Groups[2].Value;
    //townValue now holds "Srahatloe"
    //do your magic
}
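The same idea can be extended from the single "Townland" field to every label in the sample record. The following is only a sketch under assumptions: the class name `HouseParser`, the label list (copied from the sample entry in the question), and the optional "A 1. " style prefix before "Name of House" are all guesses about how the real document is laid out, so adjust them to match it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HouseParser
{
    // Labels copied from the sample record; an assumption about the full document.
    private static readonly string[] Labels =
    {
        "Name of House", "Townland", "Near", "Status/Public Access",
        "Date Built", "Description", "Associated Families"
    };

    // Optional entry prefix such as "A 1. ", then a known label, a colon, and the value.
    private static readonly Regex FieldLine = new Regex(
        @"^(?:[A-Z]\s+\d+\.\s+)?(" +
        string.Join("|", Array.ConvertAll(Labels, l => Regex.Escape(l))) +
        @"):\s*(.*)$");

    // Returns one dictionary (label -> value) per house entry.
    public static List<Dictionary<string, string>> Parse(string text)
    {
        var records = new List<Dictionary<string, string>>();
        Dictionary<string, string> current = null;

        // Word's Content.Text separates paragraphs with '\r'; '\n' is handled too.
        var lines = text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var line in lines)
        {
            var m = FieldLine.Match(line.Trim());
            if (!m.Success) continue;   // not a "Label: value" paragraph

            var label = m.Groups[1].Value;
            // "Name of House" is assumed to open a new entry.
            if (label == "Name of House" || current == null)
            {
                current = new Dictionary<string, string>();
                records.Add(current);
            }
            current[label] = m.Groups[2].Value.Trim();
        }
        return records;
    }
}

public static class Program
{
    public static void Main()
    {
        var sample = "A 1. Name of House: Aasleagh Lodge\rTownland: Srahatloe\rNear: Killary Harbour, Leenane";
        foreach (var record in HouseParser.Parse(sample))
            Console.WriteLine(record["Name of House"] + " / " + record["Townland"]);
    }
}
```

Passing `wordDoc.Content.Text` to `HouseParser.Parse` would give one dictionary per house, and each dictionary could then be mapped to the columns of a single parameterized INSERT instead of one statement per field.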
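Answer 1 (score: 0)
Below is the code I used to extract specific text from Word documents.
I eventually switched to regular expressions, which was much faster, but I no longer have that code. In any case, here is how to extract text from Word and put it into a CSV file.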
Please note that the Office Primary Interop Assemblies (PIA) need to be installed on the development PC for Office automation.
To add a reference to Microsoft.Office.Interop.Word, go to Visual Studio -> right-click References -> COM -> Microsoft.Word 14.0 (sorry, I don't have access to my work PC, so I can't attach a screenshot).
using System;
using System.IO;
using System.Text;
using Microsoft.Office.Interop.Word;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            string month = "July2014";
            string delimiter = ",";
            string csvPath = "C:\\Temp\\" + month + ".csv";
            string[] files = Directory.GetFiles("C:\\temp\\" + month);

            // Write the CSV header row once
            string[] header = { "School Name", "Student Name", "Id", "ReportDate" };
            StringBuilder sb = new StringBuilder();
            sb.AppendLine(string.Join(delimiter, header));
            File.AppendAllText(csvPath, sb.ToString());

            foreach (var file in files)
            {
                var id = string.Empty;
                var studentName = string.Empty;
                var school = string.Empty;
                var reportDate = string.Empty;
                if (file.ToLower().EndsWith(".doc"))
                {
                    var word = new Microsoft.Office.Interop.Word.Application();
                    var sourceFile = new FileInfo(file);
                    var doc = word.Documents.Open(sourceFile.FullName);
                    Console.WriteLine("Processing: " + file.ToLower());
                    // Word collections are 1-based
                    for (int i = 1; i <= doc.Paragraphs.Count; i++)
                    {
                        try
                        {
                            string text = doc.Paragraphs[i].Range.Text;
                            if (text.StartsWith("School:"))
                            {
                                school = text.Replace("\r\a", "").Replace("School:", "").Trim();
                            }
                            if (text.StartsWith("Student Names:"))
                            {
                                studentName = text.Replace("\r\a", "").Replace("Student Names:", "").Trim();
                            }
                            if (text.StartsWith("xx Id:"))
                            {
                                id = text.Replace("\r\a", "").Replace("xx Id:", "").Trim();
                            }
                            if (text.StartsWith("Date of Report:"))
                            {
                                reportDate = text.Replace("\r\a", "").Replace("Date of Report:", "").Trim();
                            }
                        }
                        catch (Exception)
                        {
                            Console.WriteLine("Error occurred: " + file.ToLower());
                        }
                    }
                    // Clear the buffer so earlier rows are not written again,
                    // then append exactly one CSV row for this document
                    sb.Clear();
                    sb.AppendLine(string.Join(delimiter, new[] { school, studentName, id, reportDate }));
                    File.AppendAllText(csvPath, sb.ToString());
                    doc.Close();
                    word.Quit();
                }
            }
            Console.WriteLine("Finished");
            Console.ReadLine();
        }
    }
}
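One thing to watch with the listing above is that the fields are joined with a plain comma, so a value that itself contains a comma (a school name, say) would shift the columns. Below is a minimal sketch of RFC 4180 style quoting; the `Csv` class and its `Escape` and `Row` methods are hypothetical helper names, not part of the original answer.

```csharp
using System;
using System.Linq;

public static class Csv
{
    // Quote a field when it contains a comma, a double quote, or a line break;
    // embedded double quotes are doubled.
    public static string Escape(string field)
    {
        if (field == null) return "";
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Build one CSV line (without the trailing newline) from the given fields.
    public static string Row(params string[] fields)
    {
        return string.Join(",", fields.Select(f => Escape(f)));
    }
}

public static class CsvDemo
{
    public static void Main()
    {
        // The first field contains a comma, so it is wrapped in quotes.
        Console.WriteLine(Csv.Row("St. Mary's, Dublin", "Ann Byrne", "42", "July 2014"));
    }
}
```

With a helper like this, the row for each document could be built as `Csv.Row(school, studentName, id, reportDate)` in place of the bare `string.Join`.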