I am trying to forecast with auto_arima while excluding specific dates.
Code:
import pandas as pd
from pmdarima.arima import auto_arima

df = pd.read_csv('devices_transactions_count.csv')

def remove_holidays(date, transactions):
    # Blank out the transaction count on the dates to exclude
    if date in ['2019-06-03', '2019-06-04', '2019-06-05', '2019-06-06', '2019-06-07', '2019-06-08', '2019-06-09',
                '2019-08-08', '2019-08-09', '2019-08-10', '2019-08-11', '2019-08-12', '2019-08-13', '2019-08-14',
                '2019-08-15', '2019-08-16']:
        return None
    else:
        return transactions

df['transactions'] = df.index.map(lambda i: remove_holidays(df.date.iloc[i], df.transactions.iloc[i]))
df.head()

# Keep everything before 2019-09-20 as the training set
train = df[df.date < '2019-09-20']
train.to_csv('train.csv')
train = pd.read_csv('train.csv')
del train['Unnamed: 0']
train.head()

train['transactions'] = train['transactions'].astype('float32')
train['date'].replace(regex=True, inplace=True, to_replace='M', value='')
train['date'] = pd.to_datetime(train['date'], format='%Y%m', errors='ignore', infer_datetime_format=True)
train = train.set_index(['date'])

decomposition = auto_arima(train.transactions, start_p=1, start_q=1,
                           max_p=3, max_q=3, m=12,
                           start_P=0, seasonal=True,
                           d=1, D=1, trace=True,
                           error_action='ignore',
                           suppress_warnings=True,
                           stepwise=True)
This raises the following error: ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Answer 0 (score: 1)
I would rewrite your cleaning function as a list lookup:
skip_days = ['2019-06-03', '2019-06-04', '2019-06-05', '2019-06-06', '2019-06-07', '2019-06-08', '2019-06-09',
             '2019-08-08', '2019-08-09', '2019-08-10', '2019-08-11', '2019-08-12', '2019-08-13', '2019-08-14',
             '2019-08-15', '2019-08-16']
# Exclude these days: keep only rows whose date is not in skip_days
df_filtered = df[~df['date'].isin(skip_days)]
This drops those rows from the DataFrame, leaving your dataset free of NaN/null values.
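To see the difference between the two approaches, here is a minimal sketch on made-up data. The four sample rows and the shortened skip_days list are assumptions for illustration only; they are not taken from devices_transactions_count.csv. Returning None for a date leaves the row in place with a NaN value, which is presumably what the ValueError is complaining about, whereas the isin filter removes the row entirely.

```python
import pandas as pd

# Hypothetical sample data standing in for devices_transactions_count.csv
df = pd.DataFrame({
    "date": ["2019-06-02", "2019-06-03", "2019-06-04", "2019-06-10"],
    "transactions": [120, 15, 9, 130],
})
skip_days = ["2019-06-03", "2019-06-04"]  # shortened list, for illustration

# Approach from the question: replace excluded values with None.
# pandas stores None in a numeric column as NaN, so the rows survive.
masked = pd.Series(
    [None if d in skip_days else t
     for d, t in zip(df["date"], df["transactions"])]
)
print(masked.isna().sum())  # number of NaN values left in the series

# Approach from this answer: drop the excluded rows altogether.
df_filtered = df[~df["date"].isin(skip_days)]
print(df_filtered)
```

Note that isin compares exact values, so this assumes the date column holds strings in the same YYYY-MM-DD format as skip_days. Apply the filter before you build the training set, so the NaN values never reach auto_arima.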