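I want to read time strings and data from a file, but loadtxt cannot read strings and numbers together, because the strings are not floats. So I tried genfromtxt with delimiter=[]+[]+[] to describe the columns I have, but the strings are read as nan. I would like to read the time directly as a time array (date2num, datetime, or similar) so that I can plot it properly in matplotlib. What should I do? Below is a short excerpt of the file (the real file obviously has more data):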
GOES data for time interval: 20-Feb-2014 00:00:00.000 to 27-Feb-2014 00:00:00.000
Current time: 23-Mar-2014 21:52:00.00
Time at center of bin 1.0 - 8.0 A 0.5 - 4.0 A Emission Meas Temp
watts m^-2 watts m^-2 10^49 cm^-3 MK
20-Feb-2014 00:00:00.959 4.3439e-006 3.9946e-007 0.30841 10.793
20-Feb-2014 00:00:02.959 4.3361e-006 3.9835e-007 0.30801 10.789
20-Feb-2014 00:00:04.959 4.3413e-006 3.9501e-007 0.30994 10.743
20-Feb-2014 00:00:06.959 4.3361e-006 3.9389e-007 0.30983 10.735
20-Feb-2014 00:00:08.959 4.3361e-006 3.9278e-007 0.31029 10.722
20-Feb-2014 00:00:10.959 4.3387e-006 3.9278e-007 0.31058 10.719
20-Feb-2014 00:00:12.959 4.3361e-006 3.9278e-007 0.31029 10.722
20-Feb-2014 00:00:14.959 4.3361e-006 3.9055e-007 0.31122 10.695
20-Feb-2014 00:00:16.959 4.3334e-006 3.8721e-007 0.31234 10.657
Following the suggestion, I read the data with:
pd.read_csv('/filename',sep='\s\s+',header=5,
names=['time','band1','band2','emeas','temp'])
The data is read, but there is one problem. When I print it, I get:
time band1 band2 emeas temp
0 20-Feb-2014 00:00:03.005 0.000004 0 0.31000 10.866
1 20-Feb-2014 00:00:05.052 0.000004 0 0.31199 10.819
2 20-Feb-2014 00:00:07.102 0.000004 0 0.31190 10.811
3 20-Feb-2014 00:00:09.149 0.000004 0 0.31237 10.798
4 20-Feb-2014 00:00:11.199 0.000004 0 0.31266 10.795
5 20-Feb-2014 00:00:13.245 0.000004 0 0.31237 10.798
6 20-Feb-2014 00:00:15.292 0.000004 0 0.31334 10.770
7 20-Feb-2014 00:00:17.342 0.000004 0 0.31451 10.732
8 20-Feb-2014 00:00:19.389 0.000004 0 0.31451 10.732
9 20-Feb-2014 00:00:21.439 0.000004 0 0.31421 10.735
So the band1 and band2 values apparently get rounded. When I plot them they look correct (not rounded), so why do they look like that in the DataFrame?
Answer 0 (score: 1)
There may be a more elegant solution using regular expressions, but this works too.
from datetime import datetime

# The with block closes the file automatically when the loop ends
with open("path/filename") as input_file:
    for line in input_file:
        line_parts = line.split()
        if len(line_parts) > 1:
            try:
                # This is now a datetime object
                timestamp = datetime.strptime(line_parts[0] + " " + line_parts[1], "%d-%b-%Y %H:%M:%S.%f")
                # Do stuff with the data here (each value is stored separately in the line_parts list)
                # For instance, print everything.
                print("DateTime Object: " + str(timestamp))
                print("Data: " + str(line_parts[2:]))
                # Cast data to floats for use in arithmetic
                data_point_one = float(line_parts[2])
                print("data_point_one * 2 = " + str(data_point_one * 2))
            except ValueError:
                # Lines that don't start with a timestamp take this route...
                continue
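To get from the loop above to something you can plot, one option is to collect the timestamps and values into lists instead of printing them. The following is only a minimal sketch: the function name parse_goes_lines and the inline SAMPLE string are made up for illustration, and it assumes every data line has a date, a time, and four numeric columns, as in the excerpt from the question.

```python
from datetime import datetime

# Hypothetical inline sample in the same layout as the file from the question
SAMPLE = """\
GOES data for time interval: 20-Feb-2014 00:00:00.000 to 27-Feb-2014 00:00:00.000
Current time: 23-Mar-2014 21:52:00.00
20-Feb-2014 00:00:00.959 4.3439e-006 3.9946e-007 0.30841 10.793
20-Feb-2014 00:00:02.959 4.3361e-006 3.9835e-007 0.30801 10.789
"""

TIME_FORMAT = "%d-%b-%Y %H:%M:%S.%f"


def parse_goes_lines(lines):
    """Return (times, rows): datetime objects and lists of four floats."""
    times, rows = [], []
    for line in lines:
        parts = line.split()
        # Assumption: a data line has date, time and four numeric columns
        if len(parts) < 6:
            continue
        try:
            stamp = datetime.strptime(parts[0] + " " + parts[1], TIME_FORMAT)
            values = [float(p) for p in parts[2:6]]
        except ValueError:
            # Header lines that do not start with a timestamp end up here
            continue
        times.append(stamp)
        rows.append(values)
    return times, rows


times, rows = parse_goes_lines(SAMPLE.splitlines())
print(times[0], rows[0])
```

matplotlib accepts a list of datetime objects as x values, so times can be passed straight to a plot call together with one column taken from rows.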
Answer 1 (score: 1)
You can use pandas.read_csv(), because its sep parameter (the equivalent of delimiter in numpy.genfromtxt) accepts regular expressions. Then, with:
import pandas as pd
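# A regex separator needs the Python parsing engine; the raw string keeps \s intact
pd.read_csv('test.txt', sep=r'\s\s+', header=4, engine='python')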
you will get the desired output.
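Regarding the follow-up in the question about band1 and band2 looking rounded: that is most likely only how pandas prints small floats, while the stored values keep their precision (which would also explain why the plot looks right). Here is a hedged sketch that reads the columns, converts the time column to real datetimes, and switches the printed float format to scientific notation. It assumes the columns in the real file are separated by two or more spaces (the excerpt in the question may have lost its spacing), that there are exactly four header lines, and it uses an inline SAMPLE string instead of your file.

```python
import io

import pandas as pd

# Hypothetical inline sample; the double spaces between columns are an assumption
SAMPLE = """\
GOES data for time interval: 20-Feb-2014 00:00:00.000 to 27-Feb-2014 00:00:00.000
Current time: 23-Mar-2014 21:52:00.00
Time at center of bin  1.0 - 8.0 A  0.5 - 4.0 A  Emission Meas  Temp
  watts m^-2  watts m^-2  10^49 cm^-3  MK
20-Feb-2014 00:00:00.959  4.3439e-006  3.9946e-007  0.30841  10.793
20-Feb-2014 00:00:02.959  4.3361e-006  3.9835e-007  0.30801  10.789
"""

df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=r"\s\s+",      # two or more whitespace characters separate the columns
    engine="python",   # regex separators are handled by the Python engine
    skiprows=4,        # assumption: the file starts with four header lines
    header=None,
    names=["time", "band1", "band2", "emeas", "temp"],
)

# Turn the time strings into real timestamps that matplotlib can plot
df["time"] = pd.to_datetime(df["time"], format="%d-%b-%Y %H:%M:%S.%f")

# Only the printed representation changes; the stored values are untouched
pd.set_option("display.float_format", "{:.4e}".format)
print(df)
```

Note that display.float_format applies to every float column, so emeas and temp are printed in scientific notation as well. pd.reset_option("display.float_format") restores the default.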