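I want to write a function that reads a series of time values from a file (the sampling has gaps, which is the problem), takes exactly 200 days at a time, and lets me step through the entire length of the data, say 10,000 days, as a kind of rolling window.
I'm not sure how to code this. Could I add a statement that accumulates the difference between successive values of the time variable (the x axis) until it reaches exactly 200 days?
Or could I write a function that finds a starting value, say t0, and then finds the array element closest to t0 + (interval =) 200 days?
So far I have: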
f = open(reading the file from directory)
lines = f.readlines()
print(len(lines))
tx = np.array([]) # times
y= np.array([])
interval = 200 # days
for li in lines:
    col = li.split()
    t0 = np.array([])
    t1 = np.array([])
    tx = np.append(tx, float(col[0]))
    y = np.append(y, float(col[1]))
    t0 = np.append(t0, np.max(tx))
    t1 = np.append(t1, tx[np.argmin(tx)])
print(t0,t1)
days = [t1 + dt.timedelta(days = float(x)) for x in days]
#y = np.random.randn(len(days))
# use pandas for convenient rolling function:
df = pd.DataFrame({"day":tx, "value": y}).set_index("day")
def closest_value(s):
    if s.shape[0]<2:
        return np.nan
    X = np.empty((s.shape[0]-1, 2))
    X[:, 0] = s[:-1]
    X[:, 1] = np.fabs(s[:-1]-s[-1])
    min_diff = np.min(X[:, 1])
    return X[X[:, 1]==min_diff, 0][0]
df['closest_value'] = df.rolling(window=dt.timedelta(days=200))['value'].apply(closest_value, raw=True)
print(df.tail(5))
Output error:
TypeError: float() argument must be a string or a number, not 'datetime.datetime'
Also, the first 10 tx and y values are:
0 0.003372722575018
0.015239999629557 0.003366515509113
0.045829999726266 0.003385171061055
0.075369999743998 0.003385171061055
0.993219999596477 0.003366515509113
1.022699999623 0.003378941085299
1.05217999964952 0.003369617612836
1.08166999975219 0.003397665493594
3.0025899996981 0.003378941085299
3.04120999993756 0.003394537568711
Answer 0 (score: 1)
import numpy as np
import pandas as pd
import datetime as dt
# load data in days and y arrays
# ... or generate them:
N = 1000 # number of random day offsets to draw (duplicates are removed below)
day_min = dt.datetime.strptime('2000-01-01', '%Y-%m-%d')
day_max = 2000
days = np.sort(np.unique(np.random.uniform(low=0, high=day_max, size=N).astype(int)))
days = [day_min + dt.timedelta(days = int(x)) for x in days]
y = np.random.randn(len(days))
# use pandas for convenient rolling function:
df = pd.DataFrame({"day":days, "value": y}).set_index("day")
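def closest_value(s):
    # s holds the values inside the current 200-day window; s[-1] is the newest one
    if s.shape[0] < 2:
        return np.nan
    # column 0: the earlier values, column 1: their absolute difference from the newest value
    X = np.empty((s.shape[0]-1, 2))
    X[:, 0] = s[:-1]
    X[:, 1] = np.fabs(s[:-1]-s[-1])
    min_diff = np.min(X[:, 1])
    # return the earlier value that is closest to the newest one
    return X[X[:, 1]==min_diff, 0][0]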
df['closest_value'] = df.rolling(window=dt.timedelta(days=200))['value'].apply(closest_value, raw=True)
print(df.tail(5))
Output:
               value  closest_value
day
2005-06-15  1.668638       1.591505
2005-06-16  0.316645       0.304382
2005-06-17  0.458580       0.445592
2005-06-18 -0.846174      -0.847854
2005-06-22 -0.151687      -0.166404
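The times in the question are plain floats measured in days rather than datetimes. A minimal sketch of one way to use the same time-based rolling window on such data, assuming the floats are elapsed days and already sorted: convert them to a TimedeltaIndex with pd.to_timedelta. The sample arrays and the column names n_in_window and mean_200d are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: times in days (floats, unevenly spaced) and measurements.
tx = np.array([0.0, 0.5, 1.0, 150.0, 199.0, 200.5, 350.0, 420.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

# Turn float days into a TimedeltaIndex so rolling() accepts a time-based window.
df = pd.DataFrame({"value": y}, index=pd.to_timedelta(tx, unit="D"))
df.index.name = "elapsed"

# Each window covers the 200 days up to and including the current sample,
# however many samples happen to fall inside it.
df["n_in_window"] = df["value"].rolling("200D").count()
df["mean_200d"] = df["value"].rolling("200D").mean()
print(df)
```

Because the window is defined in time rather than in rows, gaps in the sampling only change how many points each window contains, not how long it is.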
Answer 1 (score: 0)
You can use pandas: set up a datetime range and create a while loop to process the data in batches.
import pandas as pd
from datetime import datetime, timedelta
# Load data into pandas dataframe
df = pd.read_csv(filepath)
# Name columns
df.columns = ['dates', 'num_value']
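# Convert strings to datetime (the format string must match how dates are written in the file)
df.dates = pd.to_datetime(df['dates'], format='%d/%m/%Y')
# Print dates within a 200 day interval and move on to the next interval
# (assumes the dates are sorted in ascending order)
i = 0
while i < len(df.dates):
    start = df.dates[i]
    end = start + timedelta(days=200)
    in_window = (df.dates >= start) & (df.dates < end)
    print(df.dates[in_window])
    # advance past the rows just printed, rather than by a fixed 200 rows
    i += int(in_window.sum())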
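If your file is whitespace-separated with a header line, like the sample below, load it with pd.read_table and skiprows=1; if the columns have no header, omit skiprows: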
dates num_value
2004-7-1 1
2004-7-2 5
2004-7-4 8
2004-7-5 11
2004-7-6 17
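# skiprows=1 drops the header line; header=None and names= keep the first data row from being used as the header
df = pd.read_table(filepath, sep=r"\s+", skiprows=1, header=None, names=['dates', 'num_value'])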
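For times stored as plain floats in days, as in the question, the same batching can also be sketched with NumPy alone. This is only an illustration under the assumption that the time array is sorted; iter_windows and closest_index are hypothetical helper names, and the sample array is made up. np.searchsorted finds where t0 + 200 would be inserted, which gives both the end of the current window and the element closest to t0 + 200.

```python
import numpy as np

def iter_windows(t, interval=200.0):
    """Yield (start, stop) index pairs so that t[start:stop] spans less than `interval` days.

    Assumes `t` is a sorted 1-D array of times in days.
    """
    start = 0
    while start < len(t):
        # First index whose time is >= t[start] + interval.
        stop = int(np.searchsorted(t, t[start] + interval, side="left"))
        yield start, stop
        start = stop

def closest_index(t, target):
    """Index of the element of the sorted array `t` nearest to `target`."""
    i = int(np.searchsorted(t, target))
    if i == 0:
        return 0
    if i == len(t):
        return len(t) - 1
    # Pick whichever neighbour is closer to the target.
    return i if t[i] - target < target - t[i - 1] else i - 1

# Hypothetical, unevenly sampled times in days.
tx = np.array([0.0, 0.5, 1.0, 150.0, 199.0, 200.5, 350.0, 420.0])

windows = list(iter_windows(tx, interval=200.0))
nearest = closest_index(tx, tx[0] + 200.0)
print(windows)              # index ranges of consecutive 200-day batches
print(nearest, tx[nearest])  # element closest to t0 + 200 days
```

Each window starts at the first sample after the previous window ended, so a long gap in the data simply starts the next batch later instead of producing an empty one.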