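I used OCR to convert several images into text files. Some of the text files contain multiple tables. The files differ in the number of columns, the delimiters, and the line on which the data starts. Here are two sample files: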
file1.txt: contains two tables in a single text file
```
Receipt
Date: 12/05/2015 Page: 1
Status: Active
Location: Florida, USA
Prod ID Category ID Product Name Received Date Quantity Price
1 201 ABC 02/01/2015 5 200
2 02/01/2015 1 100
3 204 XYZ 05/02/2015 10 2000
Total 16 2300
Date: 01/02/2016 Page: 2
Status: Complete
Location: Florida, USA
Prod ID Category ID Product Name Received Date Quantity Price
1 202 ABC 02/01/2015 5 200
2 203 MNO 02/01/2015 1 100
3 204 XYZ 05/02/2015 10 2000
Total 16 2300
```
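This layout follows fixed textual patterns, so a rule-based parser may be enough before trying ML/NLP. The following is a minimal Python sketch using only the standard library. The question targets R, where the same steps could be written with `readLines()` and `regmatches()`. The sketch assumes the following, based only on this sample:

* Every table begins with a header line starting with `Prod ID` and ends with a `Total` line.
* Other non-row lines are `Key: value` metadata.
* Product names are a single token.

Rows such as `2 02/01/2015 1 100`, where the Category ID and Product Name are missing, are kept with `None` in those columns. The function name `parse_multi_table` is hypothetical.

```python
import re

# Metadata such as "Date: 12/05/2015 Page: 1": a value runs until the next "Word:" or end of line.
META_RE = re.compile(r"(\w+):\s*(.*?)(?=\s+\w+:|$)")

# Assumed row layout: Prod ID, optional (Category ID, Product Name), Received Date,
# Quantity, Price. The optional group absorbs rows where those two fields are missing.
ROW_RE = re.compile(r"^(\d+)\s+(?:(\d+)\s+(\S+)\s+)?(\d{2}/\d{2}/\d{4})\s+(\d+)\s+(\d+)$")
COLUMNS = ["Prod ID", "Category ID", "Product Name", "Received Date", "Quantity", "Price"]


def parse_multi_table(text):
    """Return one {"meta": ..., "rows": [...]} dict per table found in the text."""
    tables, meta, rows = [], {}, None
    for line in (raw.strip() for raw in text.splitlines()):
        if not line:
            continue
        if line.startswith("Prod ID"):        # assumed header: opens a new table
            rows = []
        elif line.startswith("Total"):        # assumed footer: closes the current table
            if rows is not None:
                tables.append({"meta": meta, "rows": rows})
            meta, rows = {}, None
        elif rows is not None and (m := ROW_RE.match(line)):
            rows.append(dict(zip(COLUMNS, m.groups())))
        else:                                 # anything else is treated as metadata
            meta.update(META_RE.findall(line))
    return tables


SAMPLE = """Receipt
Date: 12/05/2015 Page: 1
Status: Active
Location: Florida, USA
Prod ID Category ID Product Name Received Date Quantity Price
1 201 ABC 02/01/2015 5 200
2 02/01/2015 1 100
3 204 XYZ 05/02/2015 10 2000
Total 16 2300
Date: 01/02/2016 Page: 2
Status: Complete
Location: Florida, USA
Prod ID Category ID Product Name Received Date Quantity Price
1 202 ABC 02/01/2015 5 200
2 203 MNO 02/01/2015 1 100
3 204 XYZ 05/02/2015 10 2000
Total 16 2300"""

tables = parse_multi_table(SAMPLE)
for t in tables:
    print(t["meta"], len(t["rows"]), "rows")
```

All values stay as strings. Each table's `rows` list can be passed to `pandas.DataFrame` (or rebuilt as an R `data.frame`), and the numeric columns converted afterwards.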
file2.txt: contains a single table, but in a different format from the one above
```
Receipt Date: 12/05/2015 Page: 1 Location: California, USA Status: Complete
Prod ID Product Received Sent Quantity Price
Name Date Date
1 ABC 02/01/2015 03/01/2015 5 200
2 PQR 02/01/2015 03/01/2015 1 100
3 XYZ 05/02/2015 03/02/2015 10 2000
```
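For this layout, one possible rule-based approach is to skip the split two-line header and recognise data rows by their shape. Each row is an ID, a one-word product name, two dates, a quantity and a price. The following is a Python sketch using only the standard library; in R, the same row pattern could be applied with `regmatches()`.

It assumes the column names can be fixed in advance. They are read off the header by hand because the column alignment is not preserved in the text as shown. The single metadata line is split into `Key: value` pairs. With this pattern, `Receipt Date:` is stored under the key `Date`. The function name `parse_single_table` is hypothetical.

```python
import re

# A metadata value runs until the next "Word:" or the end of the line.
META_RE = re.compile(r"(\w+):\s*(.*?)(?=\s+\w+:|$)")

# Column names hard-coded from the two-line header
# "Prod ID Product Received Sent Quantity Price" / "Name Date Date" (an assumption, not auto-detected).
COLUMNS = ["Prod ID", "Product Name", "Received Date", "Sent Date", "Quantity", "Price"]
ROW_RE = re.compile(r"^(\d+)\s+(\S+)\s+(\d{2}/\d{2}/\d{4})\s+(\d{2}/\d{2}/\d{4})\s+(\d+)\s+(\d+)$")


def parse_single_table(text):
    """Return {"meta": ..., "rows": [...]} for a receipt laid out like file2.txt."""
    meta, rows = {}, []
    for line in (raw.strip() for raw in text.splitlines()):
        if m := ROW_RE.match(line):           # a data row: keep its six fields
            rows.append(dict(zip(COLUMNS, m.groups())))
        elif ":" in line:                     # a metadata line
            meta.update(META_RE.findall(line))
        # remaining lines (the header fragments) are ignored
    return {"meta": meta, "rows": rows}


SAMPLE = """Receipt Date: 12/05/2015 Page: 1 Location: California, USA Status: Complete
Prod ID Product Received Sent Quantity Price
Name Date Date
1 ABC 02/01/2015 03/01/2015 5 200
2 PQR 02/01/2015 03/01/2015 1 100
3 XYZ 05/02/2015 03/02/2015 10 2000"""

table = parse_single_table(SAMPLE)
print(table["meta"])
for row in table["rows"]:
    print(row)
```

Ignoring the header and matching rows by pattern avoids having to reassemble column names that OCR split across lines. The trade-off is that each new layout needs its own column list and row pattern.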
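I would like to read these files and create a data frame for each file/table. Is there a way to apply machine learning/NLP to convert these text files into data frames in R?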