My PDF content is as follows:
Page 1:
Date Item IN OUT
17-Oct Electrical Fan - 38
with RF895 cable
model XO-8745
56148
17-Oct Electrical Iron 77 -
with ring
model X12358
78418
newline
:
:
:
17-Oct Electrical Fan 77 -
Note: This receipt is computer generated and no signature is required
Page 2:
Date Item IN OUT
with RF895 cable
model XO-8745
56148
17-Oct Electrical Iron - 100
with ring
model 54789
XP-859
newline
:
:
:
17-Oct Electrical Iron 17 -
with ring
Note: This receipt is computer generated and no signature is required
Page 3:
Date Item IN OUT
model X12358
56148
17-Oct Electrical Fan - 38
with RF895 cable
model XO-8745
56148
:
:
:
17-Oct Electrical Fan 108 -
with RF895 cable
model XO-8745
56148
Note: This receipt is computer generated and no signature is required
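I am using iTextSharp to merge the lines of each item into a single row and write that row to Excel. Because my code can only read the PDF page by page, an item whose remaining lines fall on the next page gets split, so I cannot get the rows I want. The code is as follows: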
// Requires: System, System.Data, System.IO, System.Linq,
// iTextSharp.text.pdf, iTextSharp.text.pdf.parser
if (File.Exists(theFile.FullName))
{
    Console.Write(++count + " " + theFile.FullName);
    PdfReader pdfReader = new PdfReader(theFile.FullName);
    try
    {
        DataTable finalTbl = GetTable();
        for (int page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            // Convert the current page (and only this page) from PDF to text
            ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
            string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
            using (StringReader reader = new StringReader(currentText))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    string[] splittedTxt = line.Split(new[] { " " },
                        StringSplitOptions.RemoveEmptyEntries);
                    if (splittedTxt.Any())
                    {
                        // create a table
                    }
                    // finalTbl.Rows.Add(...); // add the desired row to the DataTable
                }
            }
        }
    }
    finally
    {
        // Always release the reader, even if extraction throws
        pdfReader.Close();
    }
}
The result I get:
17-Oct Electrical Fan with RF895 cable model XO-8745 56148
17-Oct Electrical Iron with ring model X12358 78418 newline
17-Oct Electrical Fan
17-Oct Electrical Iron with ring model 54789 XP-859 newline
17-Oct Electrical Iron with ring
17-Oct Electrical Fan with RF895 cable model XO-8745 56148
17-Oct Electrical Iron with ring model X12358
17-Oct Electrical Fan with RF895 cable model XO-8745 56148
17-Oct Electrical Fan with RF895 cable model XO-8745 56148
Is there a way to read and merge all of the pages first, before creating the DataTable?
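To make the goal concrete, here is a minimal sketch of the kind of merging I have in mind. It is not a tested solution. It assumes that the text of every page has already been extracted into one string per page (for example with PdfTextExtractor.GetTextFromPage, as in the loop above), that each item starts with a date such as "17-Oct", and that the "Date Item IN OUT" header and the "Note:" footer repeat on every page and can be dropped. The names RecordMerger, MergeRecords and IsPageFurniture are hypothetical and not part of iTextSharp.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RecordMerger
{
    // Assumption: a new item always begins with a date like "17-Oct".
    private static readonly Regex RecordStart = new Regex(@"^\d{1,2}-[A-Za-z]{3}\b");

    // Assumption: these lines repeat on every page and carry no item data.
    private static bool IsPageFurniture(string line)
    {
        return line.StartsWith("Date Item") || line.StartsWith("Note:");
    }

    // pageTexts: the extracted text of each page, in page order.
    public static List<string> MergeRecords(IEnumerable<string> pageTexts)
    {
        var records = new List<string>();
        foreach (string pageText in pageTexts)
        {
            string[] lines = pageText.Split(new[] { "\r\n", "\n" },
                StringSplitOptions.RemoveEmptyEntries);
            foreach (string raw in lines)
            {
                string line = raw.Trim();
                if (line.Length == 0 || IsPageFurniture(line))
                    continue;

                if (RecordStart.IsMatch(line) || records.Count == 0)
                {
                    // A date at the start of the line opens a new item.
                    records.Add(line);
                }
                else
                {
                    // Anything else continues the previous item, even when
                    // that item started on the previous page.
                    records[records.Count - 1] += " " + line;
                }
            }
        }
        return records;
    }
}
```

Because the list of records lives outside the per-page loop, an item that starts at the bottom of one page keeps collecting its remaining lines from the top of the next page. Each merged string could then be split and added to the DataTable after all pages have been processed.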