What is the best way to manipulate messy CSV data extracted from a PDF?

Asked: 2017-03-15 19:59:31

Tags: python-3.x csv pdf tabula

I load this CSV file into a list, like this:

dataFromFile = list(csv.reader(f))

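Each value should be a single list element that I can easily manipulate later. But as you can see, some values have been merged into one element, as in row 3. Others are missing from their position entirely, as in row 2. Right now I do all of the manipulation by hard-coding: shifting, merging, splitting, and deleting values row by row. Is there a nice, convenient tool that could help me with this? All of the data comes from tabula-py. The rest of the pages come out fine, but this page is broken.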

All
0 ['', 'Mar 2017', 'Dec 2016', 'Sept 2016', 'Jun 2016', 'Mar 2016']
1 ['Earnings growth*', '6.9%', '5.6% (2.1%)', '7.2%', '11.1%', '-0.3%']
2 ['', '12.7%', '', '', '']
3 ['Capital spending', '4.1% Median = 7.5% 3.2% (6.8%)', '5.5%', '6.1%', '']
4 ['Advertising and', '', '', '', '']
5 ['marketing spending', '', '', '', '']
6 ['', '2.0% 1.9% (1.7%)', '2.2%', '6.1%', '4.8%']
7 ['Technology spending', '5.9% 5.3% (6.7%)', '6.8%', '8.2%', '5.6%']
8 ['R&D spending', '3.9% 4.4% (5.1%)', '3.9%', '7.4%', '4.5%']
9 ['Employment û full-time', '5.1% 6.0% (2.7%)', '1.0%', '0.5%', '-0.2%']
10 ['Wages and Salaries', '5.3% 6.8% (6.7%)', '3.8%', '6.3%', '4.2%']
11 ['Inflation (Chg in prices of 0.1% own-firm products)', '3.4% 2.6% 1.2% (1.3%)', '-1.2% Median=0%', 'Median=2%', '']
12 ['Health Care Costs', '4.3% 2.7% (3.9%)', '2.0%', '4.1%', '4.0%']
13 ['Revenue', '3.7% 6.3% (7.3%)', '4.2%', '5.4%', '0.4%']
14 ['', 'Compared to last qtr.', 'Compared to last qtr.', 'Compared to last qtr.', 'Compared to last qtr.', 'Compared to last qtr.']
15 ['Optimism about', 'More opt: 33.9%', 'More opt: 29.1%', 'More opt: 32.5%', 'More opt: 36.2%', 'More opt: 32.5%']
16 ['the countryÆs', 'Less opt: 35.7%', 'Less opt: 37.6%', 'Less opt: 35.3%', 'Less opt: 44.3%', 'Less opt: 47.3%']
17 ['economy', 'No chg: 30.4%', 'No chg: 33.3%', 'No chg: 32.2%', 'No chg: 19.5%', 'No chg: 20.1%']
18 ['Country optimism', '', '', '', '', '']
19 ['level', '57.6', '58.6', '65.0', '56.8', '56.9']
20 ['Optimism about', 'More opt: 40.7%', 'More opt: 35.6%', 'More opt: 36.7%', 'More opt: 39.7%', 'More opt: 41.3%']
21 ['own company', 'Less opt: 25.8% No chg: 33.6%', 'Less opt: 35.3% No chg: 29.2%', 'Less opt: 30.4% No chg: 32.9%', 'Less opt: 33.2% No chg: 27.2%', 'Less opt: 37.1% No chg: 21.6%']
22 ['Own company', '', '', '', '', '']
23 ['optimism level', '59.9', '60.8', '66.7', '61.6', '60.7']
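
For the merged cells in rows such as 3 and 7 above, one possible approach is to pull the individual percentages back out with a regular expression. This is only a minimal sketch, not a tabula-py feature: the names `VALUE`, `MEDIAN`, `split_cell`, and `split_row` are hypothetical, and it assumes that a value like `5.6% (2.1%)` belongs together as one cell (as it does in row 1) and that `Median = ...` annotations can be discarded.

```python
import re

# Assumption: a value is a signed percentage such as 4.1% or -0.3%,
# optionally followed by a parenthesised percentage such as (6.8%).
VALUE = re.compile(r"-?\d+(?:\.\d+)?%(?:\s*\(-?\d+(?:\.\d+)?%\))?")

# Assumption: "Median = 7.5%" style annotations are noise, not cell values.
MEDIAN = re.compile(r"Median\s*=\s*-?\d+(?:\.\d+)?%")


def split_cell(cell):
    """Return the percentage values found in one cell, in order."""
    cleaned = MEDIAN.sub("", cell)
    return VALUE.findall(cleaned)


def split_row(row):
    """Keep the label (first element) and flatten the values in the rest."""
    label, cells = row[0], row[1:]
    values = [v for cell in cells for v in split_cell(cell)]
    return [label] + values


row = ['Technology spending', '5.9% 5.3% (6.7%)', '6.8%', '8.2%', '5.6%']
print(split_row(row))
```

Rows that do not hold percentages (the `More opt: ...` rows, the optimism levels) would need their own pattern, and a row like 11, where a value has leaked into the label, still needs manual attention.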

After processing, I want a CSV file in the following format, containing this information.

enter image description here
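
The labels that wrap across several rows (rows 4 to 6, or rows 18 and 19) could be folded together before anything else is done. The following is a rough sketch under the assumption that a row with a label and no other non-empty cell is a fragment of the next row's label; `join_wrapped_labels` is a hypothetical helper name, not part of tabula-py or the `csv` module.

```python
def join_wrapped_labels(rows):
    """Fold label-only rows into the next row that carries values."""
    merged, pending = [], []
    for row in rows:
        if not row:                      # csv.reader yields [] for blank lines
            continue
        label, cells = row[0].strip(), row[1:]
        if label and not any(c.strip() for c in cells):
            pending.append(label)        # label fragment, no data yet
            continue
        parts = pending + ([label] if label else [])
        merged.append([" ".join(parts)] + cells)
        pending = []
    if pending:                          # trailing fragments with no data row
        merged.append([" ".join(pending)])
    return merged


rows = [
    ['Advertising and', '', '', '', ''],
    ['marketing spending', '', '', '', ''],
    ['', '2.0% 1.9% (1.7%)', '2.2%', '6.1%', '4.8%'],
    ['Country optimism', '', '', '', '', ''],
    ['level', '57.6', '58.6', '65.0', '56.8', '56.9'],
]
for row in join_wrapped_labels(rows):
    print(row)
```

This does not cover blocks like rows 15 to 17, where every fragment of the label also carries its own values; those would have to be grouped by a different rule.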

0 answers:

No answers