Need advice on how to extract data from .docx / .doc files and then load it into SQL Server

Posted: 2011-06-24 07:34:33

Tags: c# interop ms-word openxml

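For my project I want to develop an application that loads past-year exam/practice papers (Word files), detects the sections in each paper, extracts the questions and images in each section, and then stores the questions and images in a database. (A preview of a question paper is at the bottom of this post.)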

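So I need some advice on how to extract the data from the Word files and then insert it into the database. I currently have a couple of approaches, but I don't know how to make them work when a file contains text boxes with background images. Each question must stay linked to its image.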

Approach 1 (using MS Office Interop)

  • Load the Word file -> extract the images and save them to a folder -> extract the text and save it as .txt -> read the text from the .txt and store it in the DB

Problems:

  • How do I detect the sections and the questions?
  • How do I link each image to its question?
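For the first problem, one option is to read the .txt file line by line and recognise headings and question numbers with regular expressions. The sketch below is a minimal illustration under assumed conventions: it supposes (hypothetically) that section headings look like "Section A" and that each question starts with "1." or "1)" at the beginning of a line; `ParsedQuestion` and `QuestionParser` are made-up names. The patterns would need adjusting to the real paper layout.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical record for one extracted question.
public sealed class ParsedQuestion
{
    public string Section { get; set; }
    public int Number { get; set; }
    public string Text { get; set; }
}

public static class QuestionParser
{
    // ASSUMPTION: section headings look like "Section A" or "SECTION B".
    private static readonly Regex SectionHeading =
        new Regex(@"^\s*section\s+([A-Z])\b", RegexOptions.IgnoreCase);

    // ASSUMPTION: a question starts with "1." or "1)" at the start of a line.
    private static readonly Regex QuestionStart =
        new Regex(@"^\s*([0-9]+)[.)]\s+(.*)$");

    public static List<ParsedQuestion> Parse(IEnumerable<string> lines)
    {
        var result = new List<ParsedQuestion>();
        string section = null;
        ParsedQuestion current = null;

        foreach (string line in lines)
        {
            Match s = SectionHeading.Match(line);
            if (s.Success)
            {
                section = s.Groups[1].Value.ToUpperInvariant();
                current = null; // a new section closes the previous question
                continue;
            }

            Match q = QuestionStart.Match(line);
            if (q.Success)
            {
                current = new ParsedQuestion
                {
                    Section = section,
                    Number = int.Parse(q.Groups[1].Value),
                    Text = q.Groups[2].Value.Trim()
                };
                result.Add(current);
            }
            else if (current != null && line.Trim().Length > 0)
            {
                // Non-empty line that is neither a heading nor a new question:
                // treat it as a continuation of the current question.
                current.Text += " " + line.Trim();
            }
        }
        return result;
    }
}
```

Usage would be something like `QuestionParser.Parse(File.ReadAllLines(@"C:\temp\temp.txt"))`; depending on the encoding Word used when saving the .txt, the `File.ReadAllLines(path, encoding)` overload may be needed. Note that the plain-text export carries no image positions, so this only addresses section/question detection, not the image linking.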

Extracting the text from the Word file (this works):

// Requires a COM reference to the Microsoft Word Object Library.
using System;
using Word = Microsoft.Office.Interop.Word;

private object missing = Type.Missing;
private object sFilename = @"C:\temp\questionpaper.docx";
private object sFilename2 = @"C:\temp\temp.txt";
private object readOnly = true;
private object fileFormat = Word.WdSaveFormat.wdFormatText;

private void button1_Click(object sender, EventArgs e)
{
   Word.Application wWordApp = new Word.Application();
   wWordApp.DisplayAlerts = Word.WdAlertLevel.wdAlertsNone;
   try
   {
      Word.Document dFile = wWordApp.Documents.Open(ref sFilename,
                               ref missing, ref readOnly, ref missing, ref missing,
                               ref missing, ref missing, ref missing, ref missing,
                               ref missing, ref missing, ref missing, ref missing,
                               ref missing, ref missing, ref missing);

      // Save a plain-text copy; pictures are not kept in this format.
      dFile.SaveAs(ref sFilename2, ref fileFormat, ref missing, ref missing,
               ref missing, ref missing, ref missing, ref missing, ref missing,
               ref missing, ref missing, ref missing, ref missing, ref missing,
               ref missing, ref missing);
      dFile.Close(ref missing, ref missing, ref missing);
   }
   finally
   {
      // Without Quit(), every click leaves a WINWORD.EXE process running.
      object save = false;
      wWordApp.Quit(ref save, ref missing, ref missing);
   }
}

Extracting the images from the Word file (this does not work for images inside text boxes):

// Requires a COM reference to Word and references to System.Drawing and System.Windows.Forms.
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Windows.Forms;
using Word = Microsoft.Office.Interop.Word;

private Word.Application wWordApp;
private object missing = Type.Missing;
private object filename = @"C:\temp\questionpaper.docx";
private object readOnly = true;

// Copies inline shape number `index` (1-based) via the clipboard and saves it as a PNG.
private void CopyFromClipbordInlineShape(int index)
{
   Word.InlineShape inlineShape = wWordApp.ActiveDocument.InlineShapes[index];
   inlineShape.Select();
   wWordApp.Selection.Copy();

   // Clipboard access needs an STA thread; the WinForms UI thread is one.
   IDataObject data = Clipboard.GetDataObject();
   if (data != null && data.GetDataPresent(DataFormats.Bitmap))
   {
      using (Image image = (Image)data.GetData(DataFormats.Bitmap, true))
      {
         image.Save(@"C:\temp\DoCremoveImage" + index + ".png", ImageFormat.Png);
      }
   }
}

private void button1_Click(object sender, EventArgs e)
{
   wWordApp = new Word.Application();
   try
   {
      wWordApp.Documents.Open(ref filename,
                              ref missing, ref readOnly, ref missing, ref missing,
                              ref missing, ref missing, ref missing, ref missing,
                              ref missing, ref missing, ref missing, ref missing,
                              ref missing, ref missing, ref missing);

      // InlineShapes only covers pictures in the text flow. Floating pictures and
      // text boxes belong to ActiveDocument.Shapes, so this loop never sees them.
      for (int i = 1; i <= wWordApp.ActiveDocument.InlineShapes.Count; i++)
      {
         CopyFromClipbordInlineShape(i);
      }
   }
   finally
   {
      object save = false;
      wWordApp.Quit(ref save, ref missing, ref missing);
      wWordApp = null;
   }
}

Approach 2 (parsing the .docx package directly)

  • Unzip the Word file (.docx) -> copy the media (images) folder and store it somewhere -> parse the XML files -> store the text in the DB
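Approach 2 (which only applies to .docx) can also help with linking images to questions, because in word/document.xml each picture is referenced from the paragraph it is anchored in. The sketch below uses only System.IO.Compression and System.Xml.Linq, so Word does not need to be installed. It walks the top-level w:p elements, concatenates their w:t text, and resolves each a:blip r:embed id to its word/media/... part through word/_rels/document.xml.rels. Assumptions, all worth checking against real files: text boxes are stored as mc:AlternateContent with an mc:Fallback copy (so the fallback is skipped to avoid duplicate text); a text box's text and any picture fill (a background image, stored as an a:blip in the shape's fill) are folded into the paragraph the box is anchored in; relationship targets are relative paths; legacy VML-only text boxes use different elements and are not handled. `DocxParagraph` and `DocxReader` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical result type: one top-level paragraph plus the image parts it references.
public sealed class DocxParagraph
{
    public string Text { get; set; } = "";
    public List<string> ImagePartNames { get; } = new List<string>();
}

public static class DocxReader
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace A = "http://schemas.openxmlformats.org/drawingml/2006/main";
    static readonly XNamespace R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    static readonly XNamespace Mc = "http://schemas.openxmlformats.org/markup-compatibility/2006";
    static readonly XNamespace Rels = "http://schemas.openxmlformats.org/package/2006/relationships";

    public static List<DocxParagraph> Read(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Map relationship ids (e.g. rId5) to zip entry names (e.g. word/media/image1.png).
            // ASSUMPTION: targets are relative to the word/ folder.
            var relMap = new Dictionary<string, string>();
            ZipArchiveEntry relsEntry = zip.GetEntry("word/_rels/document.xml.rels");
            if (relsEntry != null)
            {
                using (Stream s = relsEntry.Open())
                {
                    foreach (XElement rel in XDocument.Load(s).Root.Elements(Rels + "Relationship"))
                    {
                        if ((string)rel.Attribute("TargetMode") == "External") continue;
                        relMap[(string)rel.Attribute("Id")] = "word/" + (string)rel.Attribute("Target");
                    }
                }
            }

            XDocument doc;
            using (Stream s = zip.GetEntry("word/document.xml").Open())
                doc = XDocument.Load(s);

            var result = new List<DocxParagraph>();
            // Top-level paragraphs only (table cells included); paragraphs inside a
            // text box (w:txbxContent) are nested in their anchor paragraph.
            foreach (XElement p in doc.Root.Descendants(W + "p")
                                     .Where(x => !x.Ancestors(W + "p").Any()))
            {
                // Skip mc:Fallback, which repeats the text-box content in legacy form.
                var content = p.Descendants()
                               .Where(e => !e.AncestorsAndSelf(Mc + "Fallback").Any())
                               .ToList();
                var para = new DocxParagraph
                {
                    Text = string.Concat(content.Where(e => e.Name == W + "t").Select(e => e.Value))
                };
                foreach (XElement blip in content.Where(e => e.Name == A + "blip"))
                {
                    string id = (string)blip.Attribute(R + "embed");
                    if (id != null && relMap.TryGetValue(id, out string part))
                        para.ImagePartNames.Add(part);
                }
                result.Add(para);
            }
            return result;
        }
    }
}
```

Each `DocxParagraph` could then be matched against a question-numbering pattern, and the bytes of each referenced image read by reopening the package and calling `GetEntry(partName)` on the `ZipArchive`, for example to store them in a varbinary(max) column next to the question text.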

Any advice/help would be greatly appreciated :D

Preview of the Word file (backup link: http://i.stack.imgur.com/YF1Ap.png)

1 answer:

Answer 0 (score: 0)