I have the following time series data:
1998-01-02 09:30:00,0.4298,0.4337,0.4258,0.4317,6426369
1999-01-02 09:45:00,0.4317,0.4337,0.4258,0.4298,10589080
2000-01-02 10:00:00,0.4298,0.4337,0.4278,0.4337,9507980
2001-01-02 10:15:00,0.4337,0.4416,0.4298,0.4416,13639022
What I want is a list of the years it contains:
years = ['1998', '1999', '2000', '2001']
I can then use that list to know which years can be queried in that dataframe. Not all dataframes cover the same years.
I have tried many things without success. Can someone explain how to solve a problem like this?
Edit 1: Following some suggestions, I am doing this:
data = pd.read_csv(str(inFileName), index_col=0, parse_dates=True, header=None)
#data.iloc[:, 0]
print(pd.DatetimeIndex(data.iloc[:, 0]).year)
#print(data.iloc[:, 0])
#years = list(data.index)
#print(years)
for x in years:
Then I get the list:
[2017, 2018, 2019]
But I cannot create a new dataframe from the list. I want to create a new dataframe for each value in the list. I am using this:
data = pd.read_csv(str(inFileName), parse_dates=[0], header=None)
data.iloc[:, 0] = pd.to_datetime(data.iloc[:, 0])
data['year'] = data.iloc[:, 0].apply(lambda x: x.year)
year_list = data['year'].unique().tolist()
print(year_list)
for x in year_list:
    newDF = data[x]
    newDF.head()
    print(newDF.head(5))
I get this error message:
[2017, 2018, 2019]
Traceback (most recent call last):
File "/home/jason/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 2017
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./massageSM.py", line 123, in <module>
main(sys.argv[1:])
File "./massageSM.py", line 33, in main
newDF = data[x]
File "/home/jason/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2688, in __getitem__
return self._getitem_column(key)
File "/home/jason/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
return self._get_item_cache(key)
File "/home/jason/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
values = self._data.get(item)
File "/home/jason/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "/home/jason/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 2017
I also tried:
data = pd.read_csv("RHE.SM", parse_dates=[0], header=None)
data.iloc[:, 0] = pd.to_datetime(data.iloc[:, 0])
data['year'] = data.iloc[:, 0].apply(lambda x: x.year)
year_list = data['year'].unique().tolist()
print(year_list)
for x in year_list:
    df = pd.DataFrame({'years':year_list})
    print(df.head(5))
This produces the output:
[2017, 2018, 2019]
   years
0   2017
1   2018
2   2019
   years
0   2017
1   2018
2   2019
   years
0   2017
1   2018
2   2019
But what I want to create is:
a dataframe for 2017 only
a dataframe for 2018 only
a dataframe for 2019 only
However, I cannot hard-code this, because other files will not contain the same years. I need to list the available years and iterate over them.
I also tried the following. The output is fine at first, but I get an error when newDF is created:
data = pd.read_csv("RHE.SM", header=None, parse_dates=[0])
year_list = data[0].dt.year.unique().tolist()
print(year_list)
data.index = pd.DatetimeIndex(data[0])
print(type(data.index))
print(data.index)
for x in year_list:
    print(x)
    newDF = data[x]
    #newDF.head()
    #print(newDF.head(5))
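Answer 0 (score: 2)
I haven't tested this, but I think it will work for you.
data.iloc[:, 0] = pd.to_datetime(data.iloc[:, 0])
data['year'] = data.iloc[:, 0].apply(lambda x: x.year)
year_list = data['year'].unique().tolist()
It first converts the first column to datetime format. Then it creates a new column containing only the year component of each datetime. Finally, it outputs a list of the unique values in that column.
If you also want to turn the resulting list into a new dataframe, just add this line afterwards:
df = pd.DataFrame({'years':year_list})
Edit: If you want to turn each item in the list into its own dataframe, you can select the rows whose year column matches each value in year_list.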
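A minimal sketch of the per-year split, assuming the sample rows from the question and a year column built as above. The names sample and frames_by_year are illustrative and not part of the original code.

```python
import io
import pandas as pd

# Sample rows from the question, inlined so the sketch is self-contained.
sample = io.StringIO(
    "1998-01-02 09:30:00,0.4298,0.4337,0.4258,0.4317,6426369\n"
    "1999-01-02 09:45:00,0.4317,0.4337,0.4258,0.4298,10589080\n"
    "2000-01-02 10:00:00,0.4298,0.4337,0.4278,0.4337,9507980\n"
    "2001-01-02 10:15:00,0.4337,0.4416,0.4298,0.4416,13639022\n"
)
data = pd.read_csv(sample, header=None, parse_dates=[0])
data['year'] = data[0].dt.year
year_list = data['year'].unique().tolist()

# data[x] looks up a *column* labelled x, which is why the question's loop
# raises KeyError. A boolean mask on the year column selects rows instead.
frames_by_year = {x: data[data['year'] == x] for x in year_list}

for x, frame in frames_by_year.items():
    print(x)
    print(frame.head(5))
```

Keeping the frames in a dict keyed by year avoids hard-coding any year, so the same loop should work for files that cover different years.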
Answer 1 (score: 1)
If you want to split one dataframe into separate dataframes by year, you can do the following:
dfs = {
year: sub_df.drop(columns=["year"])
for year, sub_df in data.assign(year=lambda df: df[0].dt.year)\
.groupby("year")
}
Output:
{1998: 0 1 2 3 4 5
0 1998-01-02 09:30:00 0.4298 0.4337 0.4258 0.4317 6426369,
1999: 0 1 2 3 4 5
1 1999-01-02 09:45:00 0.4317 0.4337 0.4258 0.4298 10589080,
2000: 0 1 2 3 4 5
2 2000-01-02 10:00:00 0.4298 0.4337 0.4278 0.4337 9507980,
2001: 0 1 2 3 4 5
3 2001-01-02 10:15:00 0.4337 0.4416 0.4298 0.4416 13639022}
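If you want to iterate over the separate dfs and write each one to its own CSV, you can do the following:
for year, df in dfs.items():
    filename = "base_name_{}.csv".format(year)
    df.to_csv(filename, index=False)
Ideally, you would derive the base name from the original file name.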
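One way to derive that base name, as a rough sketch: strip the directory and extension from the input path with os.path. The helper yearly_filename and the "<base>_<year>.csv" naming scheme are assumptions made for illustration. RHE.SM is the file name used in the question.

```python
import os

def yearly_filename(in_file_name, year):
    """Build a per-year CSV name from the input file name (hypothetical scheme)."""
    # basename drops any directory part; splitext drops the extension.
    base, _ext = os.path.splitext(os.path.basename(in_file_name))
    return "{}_{}.csv".format(base, year)

print(yearly_filename("RHE.SM", 2017))
```

Inside the loop above, filename = yearly_filename(inFileName, year) would then replace the hard-coded "base_name" string.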
Answer 2 (score: 0)
The simplest approach is:
data = pd.read_csv(inFileName, header=None, parse_dates=[0])
data[0].dt.year.unique().tolist()
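This uses the datetime accessor, which is fast and vectorized.
Answer 3 (score: 0)
First, you need to make sure that you are extracting the year from a datetime type. Assuming you know the name of the column where the dates are stored, do the following: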
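To illustrate, here is a small sketch comparing the .dt accessor with a row-by-row apply. The timestamps are made up for the example. Both give the same years, but .dt.year works on the whole column at once.

```python
import pandas as pd

# Illustrative timestamps (not taken from the question's file).
stamps = pd.Series(pd.to_datetime([
    "2017-03-01 09:30:00",
    "2017-03-01 09:45:00",
    "2018-01-02 10:00:00",
    "2019-06-03 10:15:00",
]))

vectorized = stamps.dt.year                  # whole-column operation
per_row = stamps.apply(lambda ts: ts.year)   # Python-level call per element

# Both approaches agree on the values.
assert vectorized.tolist() == per_row.tolist()

years = vectorized.unique().tolist()
print(years)
```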
df['datetime'] = pd.to_datetime(df['datetime'])
df['year'] = df['datetime'].apply(lambda x: x.year)
If the dates are in the index, do the following:
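df = df.reset_index()
df['datetime'] = pd.to_datetime(df['index'])
df['year'] = df['datetime'].apply(lambda x: x.year)
The first line takes the values out of the index and, by default, puts them in a column named "index" (this applies when the index has no name). The second converts that data to datetime format.
Once this is done, extract the unique years: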
years = df['year'].unique().tolist()
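If the index is already a DatetimeIndex, the years can also be read from it directly without adding columns. This is a minimal sketch with made-up values. The close column and the per_year dict are illustrative names.

```python
import pandas as pd

# Illustrative frame with the dates in the index (values are made up).
df = pd.DataFrame(
    {"close": [0.43, 0.44, 0.45]},
    index=pd.to_datetime(["2017-01-02", "2018-01-02", "2019-01-02"]),
)

# DatetimeIndex exposes .year directly; no .dt accessor is needed on an index.
years = df.index.year.unique().tolist()

# One dataframe per year, selected with a boolean mask on the index.
per_year = {year: df[df.index.year == year] for year in years}
print(years)
```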