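I have a CSV file with two sets of data on the same sheet. I did my research, and the closest thing I could find is what I've linked below. The problem I'm having is that neither set is a proper table; they are separate datasets, separated from each other by several rows. I want to save each dataset as a separate CSV. Is this possible in Python? Any help is appreciated.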
Python CSV module: How can I account for multiple tables within the same file?
Set 1:
Presented_By: Source: City:
Chris Realtor Knoxville
John Engineer Lantana
Wade Doctor Birmingham
Set 2:
DriveBy 15
BillBoard 45
Social Media 85
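My source is an Excel file, which I convert to CSV:
import pandas as pd
# Raw string so the backslashes in the Windows path are not treated as escape sequences
data_xls = pd.read_excel(r'T:\DataDump\Matthews\REPORT 11.13.16.xlsm', 'InfoCenterTracker', index_col=None)
# index=False keeps pandas from writing its row index as an extra first column
data_xls.to_csv('your_csv.csv', encoding='utf-8', index=False)
# skiprows takes a list-like of 0-based line numbers; this skips lines 10 through 23
second_set = pd.read_csv('your_csv.csv', skiprows=range(10, 24))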
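If the two sets are separated by completely empty rows (an assumption; the question only says they are separated by several rows), a minimal sketch could skip the intermediate CSV and the hard-coded skiprows list: read the sheet with header=None, cut it at every fully empty row, and treat the first row of each block as that block's header. The function names split_on_blank_rows and export_sets, and the set_N.csv output names, are hypothetical.

```python
import pandas as pd


def split_on_blank_rows(sheet: pd.DataFrame) -> list[pd.DataFrame]:
    """Split a header-less sheet into blocks separated by fully empty rows.

    Assumes the first row of each block is that block's header.
    """
    blank = sheet.isna().all(axis=1)
    # Each blank row increments the running count, so consecutive
    # non-blank rows end up sharing the same block id.
    block_id = blank.cumsum()
    blocks = []
    for _, rows in sheet[~blank].groupby(block_id[~blank]):
        # Drop columns that are empty throughout this block (e.g. unused trailing columns).
        rows = rows.dropna(axis=1, how="all")
        header, body = rows.iloc[0], rows.iloc[1:]
        blocks.append(pd.DataFrame(body.to_numpy(), columns=header.tolist()))
    return blocks


def export_sets(xlsx_path: str, sheet_name: str, prefix: str = "set") -> None:
    """Hypothetical helper: write every block of one sheet to prefix_1.csv, prefix_2.csv, ..."""
    # header=None keeps every row as data, so each block's own header row survives.
    sheet = pd.read_excel(xlsx_path, sheet_name=sheet_name, header=None)
    for i, block in enumerate(split_on_blank_rows(sheet), start=1):
        block.to_csv(f"{prefix}_{i}.csv", index=False)
```

For example, export_sets(r'T:\DataDump\Matthews\REPORT 11.13.16.xlsm', 'InfoCenterTracker') would write set_1.csv and set_2.csv if the sheet holds exactly two blocks. Reading .xlsm files with pandas requires the openpyxl package.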
Answer (score: 1):
Use the skiprows argument of pandas' read_csv:
$ cat d.dat
Presented_By: Source: City:
Chris Realtor Knoxville
John Engineer Lantana
Wade Doctor Birmingham
DriveBy 15
BillBoard 45
Social Media 85
In [1]: import pandas as pd
In [2]: pd.read_csv('d.dat',skiprows=[0,1,2,3])
Out[2]:
DriveBy 15
0 BillBoard 45
1 Social Media 85
In [3]: pd.read_csv('d.dat',skiprows=[4,5,6])
Out[3]:
Presented_By: Source: City:
0 Chris Realtor Knoxv...
1 John Engineer Lantana
2 Wade Doctor Birmi...
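As an alternative to listing every line number, read_csv also accepts an integer skiprows (skip that many lines from the top of the file) and nrows (the number of data rows to read after the header). The sketch below assumes a comma-separated version of the sample data and that the size of the first set (3 rows) is already known; SAMPLE and read_both_sets are hypothetical names.

```python
from io import StringIO

import pandas as pd

# Hypothetical comma-separated version of the sample data: the first set
# (header + 3 rows) occupies lines 0-3, and the second set starts on line 4.
SAMPLE = """Presented_By,Source,City
Chris,Realtor,Knoxville
John,Engineer,Lantana
Wade,Doctor,Birmingham
DriveBy,15
BillBoard,45
Social Media,85
"""


def read_both_sets(text: str, first_set_rows: int) -> tuple[pd.DataFrame, pd.DataFrame]:
    # nrows stops after the header plus `first_set_rows` data rows.
    first = pd.read_csv(StringIO(text), nrows=first_set_rows)
    # An integer skiprows drops that many lines from the top, so the first
    # line after set 1 (header + data rows) becomes the header of set 2.
    second = pd.read_csv(StringIO(text), skiprows=first_set_rows + 1)
    return first, second
```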
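You can detect which rows to skip by looking for the first line that does not have exactly 3 whitespace-separated entries. That line is the header of the second set ("DriveBy 15" has 2 entries). Only the first mismatch is reliable, because "Social Media 85" also splits into 3 entries:
In [25]: with open('d.dat') as f:
   ...:     lines = f.readlines()
   ...: split_at = next(n for n, line in enumerate(lines) if len(line.split()) != 3)
   ...:
In [26]: pd.read_csv('d.dat', skiprows=range(split_at))
Out[26]:
DriveBy 15
0 BillBoard 45
1 Social Media 85
In [27]: pd.read_csv('d.dat', skiprows=range(split_at, len(lines)))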
Out[27]:
Presented_By: Source: City:
0 Chris Realtor Knoxv...
1 John Engineer Lantana
2 Wade Doctor Birmi...
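The same field-count idea can be generalized to a file with any number of sets using only the standard csv module: trim the empty trailing cells that Excel pads rows with, then group consecutive rows of equal length with itertools.groupby, dropping the groups of empty rows. This is a heuristic sketch that assumes the CSV was written without the pandas index column (index=False), that adjacent sets either differ in column count or are separated by empty rows, and that no set has rows with missing trailing values; split_by_field_count, trim_trailing_empty and the set_N.csv names are hypothetical.

```python
import csv
import itertools
from pathlib import Path


def trim_trailing_empty(row: list[str]) -> list[str]:
    """Drop empty cells at the end of a row (Excel pads rows to the sheet width)."""
    end = len(row)
    while end and not row[end - 1].strip():
        end -= 1
    return row[:end]


def split_by_field_count(src: Path, out_dir: Path, prefix: str = "set") -> list[Path]:
    """Write each run of consecutive rows with the same number of cells to its own CSV."""
    with open(src, newline="", encoding="utf-8") as f:
        rows = [trim_trailing_empty(row) for row in csv.reader(f)]
    written = []
    for width, group in itertools.groupby(rows, key=len):
        if width == 0:  # empty separator rows between sets
            continue
        out = out_dir / f"{prefix}_{len(written) + 1}.csv"
        with open(out, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(group)
        written.append(out)
    return written
```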