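I want to convert PDF data into Excel data. I have already converted the PDF to a text file and removed the unnecessary text from the .txt file, but the data is currently laid out in rows and I want it arranged in columns.
PDF file: chemistry-chemists.com/chemister/Spravochniki/handbook-of-aqueous-solubility-data-2010.pdf
Current state of the Excel file:
Required state of the Excel file: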
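Answer 0 (score: 1)
PDFtables.com specializes in extracting tables from PDFs into Excel. It should be able to do what you want :)
Answer 1 (score: 0)
In ASP.NET, you can use the following code: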
<div>
Upload PDF File :<asp:FileUpload ID="fuPdfUpload" runat="server" />
<asp:Button ID="btnExportToExcel" Text="Export To Excel" OnClick="ExportToExcel" runat="server" />
</div>
Note: you must install iTextSharp from NuGet.
using System;
using System.IO;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

protected void ExportToExcel(object sender, EventArgs e)
{
    if (this.fuPdfUpload.HasFile)
    {
        // Read the upload from its stream; PostedFile.FileName is only the
        // client-side name and does not point to a file on the server.
        this.ExportPDFToExcel(this.fuPdfUpload.PostedFile.InputStream);
    }
}

private void ExportPDFToExcel(Stream pdfStream)
{
    StringBuilder text = new StringBuilder();
    PdfReader pdfReader = new PdfReader(pdfStream);
    try
    {
        for (int page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            // Use a fresh strategy per page so earlier pages are not repeated.
            ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
            // GetTextFromPage already returns a .NET string; no re-encoding is needed.
            text.AppendLine(PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy));
        }
    }
    finally
    {
        pdfReader.Close();
    }

    // Send the extracted text as a download that Excel will open.
    // This is plain text, not a real workbook, so Excel will not split it
    // into columns unless the values are tab-separated.
    Response.Clear();
    Response.Buffer = true;
    Response.AddHeader("content-disposition", "attachment;filename=ReceiptExport.xls");
    Response.Charset = "";
    Response.ContentType = "application/vnd.ms-excel";
    Response.Write(text.ToString());
    Response.Flush();
    Response.End();
}
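The handler above sends the extracted text unchanged, so the values on each line stay together rather than landing in separate columns. As a rough illustration (not part of the original answer), the Python sketch below splits each line of extracted text into cells and writes CSV, which Excel opens directly. It assumes that cells are separated by tabs or by runs of two or more spaces, which may not hold for every page of this PDF. The function names `split_line` and `text_to_csv` and the sample text are made up for the example.

```python
import csv
import io
import re


def split_line(line):
    """Split one line of text into cells on tabs or runs of 2+ whitespace characters."""
    parts = re.split(r"\s{2,}|\t", line.strip())
    return [part.strip() for part in parts if part.strip()]


def text_to_csv(text):
    """Convert extracted text into CSV, one row per non-empty line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        cells = split_line(line)
        if cells:  # skip blank lines
            writer.writerow(cells)
    return out.getvalue()


# Illustrative input only; real extracted text will look different.
sample = "Name    Formula    Solubility\nAcetone  C3H6O   miscible\n"
csv_text = text_to_csv(sample)
```

Saving `csv_text` to a file with a `.csv` extension gives one column per cell. Values that contain single spaces stay in one cell, but if the PDF separates columns with only one space, a different splitting rule would be needed.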
Answer 2 (score: 0)
Take a look at Tabula, a very effective tool for extracting tables from PDFs: https://github.com/tabulapdf/tabula