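I want to create a data frame with separate columns from a csv file that has no delimiter. The column entries seem to be separated only by a varying number of spaces.
In addition, the top of the csv has a few header lines containing readme-style information with no columns at all.
I have not been able to do this with pd.read_csv(). Thanks!
The file looks like this: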
This is a header of the textfile.The header has no columns.
This is a header of the textfile.The header has no columns.
This is a header of the textfile.The header has no columns.
...
P-X1-6030-07-A01 368963
P-X1-6030-08-A01 368964
P-X1-6030-09-A01 368965
P-A-1-1011-14-G-01 368967
P-A-1-1014-01-G-05 368968
P-A-1-1017-02-D-01 368969
...
Answer 0 (score: 3)
Assume you have the following data file:
This is a header of the textfile.The header has no columns.
This is a header of the textfile.The header has no columns.
This is a header of the textfile.The header has no columns.
P X1 6030-07-A01 368963
P-X1-6030-07-A01 368963
P-X1-6030-08-A01 368964
P-X1-6030-09-A01 368965
P-A-1-1011-14-G-01 368967
P-A-1-1014-01-G-05 368968
P-A-1-1017-02-D-01 368969
Solution: let's use the read_fwf() method, which reads fixed-width columns:
In [192]: fn = r'D:\temp\.data\data.fwf'
In [193]: pd.read_fwf(fn, widths=[19, 7], skiprows=4, header=None)
Out[193]:
                    0       1
0    P X1 6030-07-A01  368963  # NOTE: first column has spaces ...
1    P-X1-6030-07-A01  368963
2    P-X1-6030-08-A01  368964
3    P-X1-6030-09-A01  368965
4  P-A-1-1011-14-G-01  368967
5  P-A-1-1014-01-G-05  368968
6  P-A-1-1017-02-D-01  368969
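For anyone who wants to try this without the file on disk, here is a minimal self-contained sketch of the same read_fwf() idea. It assumes the first column is padded to 19 characters, as the widths above imply. The in-memory sample, the column names `tag` and `number`, and `skiprows=3` (this sample has exactly three header lines) are illustrative assumptions, not part of the original data.

```python
import io

import pandas as pd

# Three free-form header lines, as in the file shown above.
header = "This is a header of the textfile.The header has no columns.\n" * 3

# Hypothetical sample rows; the first tag contains spaces on purpose.
records = [
    ("P X1 6030-07-A01", 368963),
    ("P-X1-6030-07-A01", 368963),
    ("P-A-1-1011-14-G-01", 368967),
]

# Pad the first column to 19 characters so that the layout is truly fixed-width.
body = "".join(f"{tag:<19}{num}\n" for tag, num in records)
raw = header + body

# widths=[19, 7]: characters 0-18 form column 0, the next 7 form column 1.
# skiprows=3 skips the three header lines of this sample.
df = pd.read_fwf(
    io.StringIO(raw),
    widths=[19, 7],
    skiprows=3,
    header=None,
    names=["tag", "number"],  # assumed column names, for readability only
)
print(df)
```

Because the split happens at character positions rather than on whitespace, the tag that contains spaces stays in one column. The trade-off is that the widths have to match the real file.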
Answer 1 (score: 0)
pd.read_csv(filename, delim_whitespace=True, skiprows=n_header_rows, header=None)  # n_header_rows = number of header rows to skip
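A runnable sketch of the whitespace-delimited approach, under a few assumptions: the sample text, `n_header_rows = 3`, and the column names `tag` and `number` are made up for illustration. It uses `sep=r"\s+"`, the regex form of "any run of whitespace", because newer pandas releases deprecate `delim_whitespace`. The second part is a plain-Python fallback, not something from the answer above, for rows whose first column itself contains spaces; it splits each line once from the right.

```python
import io

import pandas as pd

raw = (
    "This is a header of the textfile.The header has no columns.\n"
    "This is a header of the textfile.The header has no columns.\n"
    "This is a header of the textfile.The header has no columns.\n"
    "P-X1-6030-07-A01     368963\n"
    "P-X1-6030-08-A01  368964\n"
    "P-A-1-1011-14-G-01 368967\n"
)
n_header_rows = 3  # assumed number of free-form header lines

# Split on runs of whitespace; works when no field contains a space.
df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",
    skiprows=n_header_rows,
    header=None,
    names=["tag", "number"],  # assumed column names
)
print(df)

# Fallback for tags that contain spaces: split only at the LAST whitespace run.
raw_with_spaces = raw + "P X1 6030-07-A01 368963\n"
rows = [
    line.rsplit(None, 1)
    for line in raw_with_spaces.splitlines()[n_header_rows:]
    if line.strip()  # ignore blank lines
]
df2 = pd.DataFrame(rows, columns=["tag", "number"]).astype({"number": int})
print(df2)
```

The first variant is the simplest when every row has exactly two whitespace-free fields. The right-split variant only assumes that the last field is the number.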