I am working with NCEI marine data in Python. The data come as .dat files with no header (files from https://www.ncei.noaa.gov/data/marine/icoads3.0/). They look like this:
166210151200 4962 35378 1306 101134 NL 1585 26 165 17796730133 5 0 2FF11FF11AAAAAAAAAAAA 98150000003002199 0 NAN NATIONAAL ARCHIEF OF THE NETHERLANDS DEN HAAG NEDERLAND 1.11.01.01 1229 AANW 112 AAN_1229_112 DUTCH 0493800N 102600E493700N 2 1TENERIFE 0 21662101512 3 VM 8UNKNOWN MAARSEVEEN DUTCH VOC M. GERRITSZ. BOOS OPPERSTUURMAN ROTTERDAM BATAVIA 0 0977.216621015WZW 7.00 UNKNOWN UNKNOWN UNKNOWN360 DEGREES ZZO MOU (?) KOELTE 00000000CLIWOC VERSION 1.0
166210161300 4907 35215 1306 101134 NL 1585 26 165 17797730133 5 0 2FF11FF11AAAAAAAAAAAA 98150000013002199 0 NAN NATIONAAL ARCHIEF OF THE NETHERLANDS DEN HAAG NEDERLAND 1.11.01.01 1229 AANW 112 AAN_1229_112 DUTCH 0490400N 84800E 1 1TENERIFE 0 21662101612 3 VM 8UNKNOWN MAARSEVEEN DUTCH VOC M. GERRITSZ. BOOS OPPERSTUURMAN ROTTERDAM BATAVIA 0 0977.216621015ZW 1/2 N 18.00 UNKNOWN UNKNOWN UNKNOWN360 DEGREES ZZO MOU KOELTE 00000000CLIWOC VERSION 1.0
166210171300 4812 35000 1306 101134 NL 1695 26 165 17680730133 5 0 2FF11FF11AAAAAAAAAAAA 98150000023002199 0 NAN NATIONAAL ARCHIEF OF THE NETHERLANDS DEN HAAG NEDERLAND 1.11.01.01 1229 AANW 112 AAN_1229_112 DUTCH 0483000N 63900E480700N 2 1TENERIFE 0 21662101712 3 VM 8UNKNOWN MAARSEVEEN DUTCH VOC M. GERRITSZ. BOOS OPPERSTUURMAN ROTTERDAM BATAVIA 0 0977.216621015ZWTW 15.00 UNKNOWN UNKNOWN UNKNOWN360 DEGREES ZTO MOU KOELTE MOOI WEER 00000000CLIWOC VERSION 1.0
166210181300 4758 34925 1306 101134 NL 1695 26 165 17670730133 5 0 2FF11FF11AAAAAAAAAAAA 98150000033002199 0 NAN NATIONAAL ARCHIEF OF THE NETHERLANDS DEN HAAG NEDERLAND 1.11.01.01 1229 AANW 112 AAN_1229_112 DUTCH 0474100N 55400E473500N 2 1TENERIFE 0 21662101812 3 VM 8UNKNOWN MAARSEVEEN DUTCH VOC M. GERRITSZ. BOOS OPPERSTUURMAN ROTTERDAM BATAVIA 0 0977.216621015ZWTW 11.00 UNKNOWN UNKNOWN UNKNOWN360 DEGREES ZTO MOU KOELTE 'ENN MOUT'? REGEN 01000000CLIWOC VERSION 1.0
166210191300 4757 34795 1306 101134 NL 1805 67 165 17672730133 5 0 2FF11FF11AAAAAAAAAAAA 98150000043002199 0 NAN NATIONAAL ARCHIEF OF THE NETHERLANDS DEN HAAG NEDERLAND 1.11.01.01 1229 AANW 112 AAN_1229_112 DUTCH 0473400N 43600E 1 1TENERIFE 0 21662101912 3 VM 8UNKNOWN MAARSEVEEN DUTCH VOC M. GERRITSZ. BOOS OPPERSTUURMAN ROTTERDAM BATAVIA 0 0977.216621015W/Z 14.00 UNKNOWN UNKNOWN UNKNOWN360 DEGREES Z MARSZEILSKOELTE, TOUPKOULTE REGEN 01000000CLIWOC VERSION 1.0
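These are tab-delimited files, which I import using
data = pd.read_table('file.dat', header=None)
This imports the data as x rows with a single column containing all the data. Within that single column, the values are separated by spaces.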
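Is there a way to import this data into columns, or to read the data into a variable and split each row into multiple columns based on the whitespace? I thought that was what I was doing with the read_table function. The full dataset is large, so I would prefer a method that imports it without having to process it afterwards.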
Answer 0: (score: 1)
I think what you need is the fixed-width formatted reader, pd.read_fwf:
Code:
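import pandas as pd
# With no colspecs given, read_fwf infers the column boundaries from the first rows of the file
df = pd.read_fwf('IMMA.dat', header=None)
print(df.dtypes)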
Result:
[17 rows x 66 columns]
0 int64
1 int64
2 int64
3 int64
...
61 object
62 object
63 object
64 object
65 float64
dtype: object
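If the inferred boundaries split a field in the wrong place, read_fwf also accepts explicit boundaries through its colspecs parameter. Here is a minimal sketch of that, run on two made-up records held in memory. The offsets and column names below are hypothetical and chosen only for illustration; they are not the real IMMA layout, which you would need to take from the format documentation.

```python
import io
import pandas as pd

# Two made-up fixed-width records (NOT the real IMMA layout):
# chars 0-3 year, 4-5 month, 6-7 day, 8-11 time, 12-23 a free-text field.
sample = (
    "166210151200TENERIFE    \n"
    "166210161300DEN HAAG    \n"
)

# Hypothetical column boundaries as half-open [start, end) character offsets.
colspecs = [(0, 4), (4, 6), (6, 8), (8, 12), (12, 24)]
names = ["year", "month", "day", "time", "place"]

df = pd.read_fwf(io.StringIO(sample), colspecs=colspecs, names=names, header=None)
print(df)
```

Because the fields are cut by position, a text value that contains a space ("DEN HAAG") stays in one column. For a very large file, read_fwf also takes a chunksize argument, so the file can be processed piece by piece instead of all at once.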
Answer 1: (score: 0)
You could try:
pd.read_csv('test.dat', delim_whitespace=True, engine = 'python', names = range(66))
Here 66 is the number of columns, which you may need to adjust.
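One thing to watch with whitespace splitting: free-text fields that contain spaces are broken into several columns, so rows can end up with different field counts. The sketch below shows this on two made-up lines (not real ICOADS records). It uses sep=r"\s+", which is the spelling newer pandas versions recommend in place of delim_whitespace=True; the column count of 5 is an assumption that fits only this toy input.

```python
import io
import pandas as pd

# Two made-up whitespace-separated lines; the last field of the second
# line is free text containing a space.
sample = (
    "1662 10 15 TENERIFE\n"
    "1662 10 16 DEN HAAG\n"
)

# Split on any run of whitespace and give the columns integer names 0..4.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None, names=list(range(5)))
print(df)
```

The first row has only four fields, so its last column is filled with NaN, while "DEN HAAG" in the second row is split across columns 3 and 4. If your records contain text like this, the fixed-width approach may be the safer choice.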