Trying to extract text from a PDF in C# in UWP

Posted: 2017-08-15 10:05:05

Tags: c# pdf c#-4.0 uwp uwp-xaml

private void FvPDF_SelectionChanged(object sender, SelectionChangedEventArgs e)
{
    int index = fvPDF.SelectedIndex;
    if (ResultList.Count > 0 && ResultList.Count > index)
    {
        this.DisplayText.Text = ResultList[index];
    }
}

private async void openpdf_Click(object sender, RoutedEventArgs e)
{
    // this.openpdf.IsEnabled = false;
    FileOpenPicker picker = new FileOpenPicker();
    picker.FileTypeFilter.Add(".pdf");
    StorageFile pdfFile = await picker.PickSingleFileAsync();

    if (pdfFile != null)
    {
        // Load pdf from file.
        PdfDocument pdfDoc = await PdfDocument.LoadFromFileAsync(pdfFile);
        uint pageCount = pdfDoc.PageCount;
        progressbar.Maximum = pageCount;
        fvPDF.Items.Clear();
        ResultList.Clear();

        for (uint i = 0; i < pageCount; i++)
        {
            using (PdfPage page = pdfDoc.GetPage(i))
            {
                InMemoryRandomAccessStream stream = new InMemoryRandomAccessStream();

                // Default is actual size. Render pdf page to stream
                await page.RenderToStreamAsync(stream);

                // Create bitmapImage for Image source
                BitmapImage bitmap = new BitmapImage();

                // Set stream as bitmapImage's source
                await bitmap.SetSourceAsync(stream);

                // Create image as FlipView item's source
                Image img = new Image();
                img.Source = bitmap;

                // Add image item to flipview.
                fvPDF.Items.Add(img);

                // Update progressbar
                progressbar.Value++;

                // New OcrEngine with default language
                OcrEngine ocrEngine = OcrEngine.TryCreateFromUserProfileLanguages();
                BitmapDecoder decoder = await BitmapDecoder.CreateAsync(stream);
                SoftwareBitmap softwareBitmap = await decoder.GetSoftwareBitmapAsync(BitmapPixelFormat.Bgra8, BitmapAlphaMode.Premultiplied);

                // Get recognition result
                OcrResult result = await ocrEngine.RecognizeAsync(softwareBitmap);

                // Add to result list
                ResultList.Add(result.Text);
            }
        }

        // Show first page recognition result
        FvPDF_SelectionChanged(null, null);
    }
}

The first time I read a PDF file it works fine, but when I try to read a second PDF it shows an error and throws an exception:

System.ArgumentOutOfRangeException

on this line:

this.DisplayText.Text = ResultList[index];

How do I clear the "list" so that it works again the second time, and every time after that?

1 Answer:

Answer 0 (score: 1)

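I tested your code snippet and it throws the exception you mentioned:

System.ArgumentOutOfRangeException: 'Index was out of range. Must be non-negative and less than the size of the collection.'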

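As mkl said, this exception is caused by SelectedIndex having the value -1, which makes ResultList[index] throw. You call FvPDF_SelectionChanged(null, null); to show the first page's result, and that method reads SelectedIndex, but at that point SelectedIndex is still at its default of -1 because no item is selected. The guard ResultList.Count > 0 && ResultList.Count > index does not catch this, because any positive count is greater than -1. For details, see the documentation for the SelectedIndex property.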
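To make the off-by-sign problem concrete, here is a minimal sketch that pulls the bounds check out of the handler into a plain helper so it can run outside UWP. The helper name GetPageText and the sample page strings are hypothetical, not part of the original code. It uses C# 9 top-level statements for brevity, although the check itself works in any C# version.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not in the original code): returns the OCR text for a
// page index, or null when the index does not point at a stored result.
// Unlike the original guard, it also rejects negative indices such as the -1
// that SelectedIndex reports when nothing is selected.
static string? GetPageText(IReadOnlyList<string> results, int index)
{
    if (index >= 0 && index < results.Count)
    {
        return results[index];
    }
    return null;
}

// Sample data standing in for ResultList after two pages were recognized.
var results = new List<string> { "page 1 text", "page 2 text" };

// The original condition (Count > 0 && Count > index) is true for index == -1,
// so it would go on to evaluate results[-1] and throw.
int noSelection = -1;
Console.WriteLine(results.Count > 0 && results.Count > noSelection); // True

// The stricter check returns null instead of throwing.
Console.WriteLine(GetPageText(results, noSelection) ?? "(no selection)"); // (no selection)
Console.WriteLine(GetPageText(results, 0)); // page 1 text
```

In the page itself, the same idea would mean assigning DisplayText.Text only when the index is non-negative, which keeps the handler safe even if it runs before anything is selected.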

So if you want to show the recognition result for the first page, set SelectedIndex to 0 after the PDF file has been loaded. The updated code is:

// Show first page recognition result
//FvPDF_SelectionChanged(null, null);
fvPDF.SelectedIndex = 0;