I have a time series that does not contain every calendar date (it only has trading dates). The series can be reproduced with:
dates=pd.Series(np.random.randint(100,size=30),index=pd.to_datetime(['2010-01-04', '2010-01-05', '2010-01-06', '2010-01-07',
'2010-01-08', '2010-01-11', '2010-01-12', '2010-01-13',
'2010-01-14', '2010-01-15', '2010-01-19', '2010-01-20',
'2010-01-21', '2010-01-22', '2010-01-25', '2010-01-26',
'2010-01-27', '2010-01-28', '2010-01-29', '2010-02-01',
'2010-02-02', '2010-02-03', '2010-02-04', '2010-02-05',
'2010-02-08', '2010-02-09', '2010-02-10', '2010-02-11',
'2010-02-12', '2010-02-16']))
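I want the last day of each month that actually appears in my list of dates, i.e. '2010-01-29' and '2010-02-16'.
I have looked at "Get the last date of each month in a list of dates in Python",
more specifically at this snippet...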
import pandas as pd
import numpy as np
df = pd.read_csv('/path/to/file/') # Load a dataframe with your file
df.index = df['my_date_field'] # set the dataframe index with your date
dfg = df.groupby(pd.Grouper(freq='M')) # group by month (pd.TimeGrouper was removed in pandas 1.0; pd.Grouper replaces it) / alternatively use MS for Month Start
# Finally, find the max date in each month
dfg.agg({'my_date_field': np.max})
# To specifically coerce the results of the groupby to a list:
dfg.agg({'my_date_field': np.max})['my_date_field'].tolist()
...but I can't figure out how to adapt it to my case. Thanks in advance.
Answer 0 (score: 2)
You can try the following to get the desired output:
import numpy as np
import pandas as pd
dates=pd.Series(np.random.randint(100,size=30),index=pd.to_datetime(['2010-01-04', '2010-01-05', '2010-01-06', '2010-01-07',
'2010-01-08', '2010-01-11', '2010-01-12', '2010-01-13',
'2010-01-14', '2010-01-15', '2010-01-19', '2010-01-20',
'2010-01-21', '2010-01-22', '2010-01-25', '2010-01-26',
'2010-01-27', '2010-01-28', '2010-01-29', '2010-02-01',
'2010-02-02', '2010-02-03', '2010-02-04', '2010-02-05',
'2010-02-08', '2010-02-09', '2010-02-10', '2010-02-11',
'2010-02-12', '2010-02-16']))
This:
dates.groupby(dates.index.month).apply(pd.Series.tail,1).reset_index(level=0, drop=True)
Or this:
dates[dates.groupby(dates.index.month).apply(lambda s: np.max(s.index))]
Both should produce something like the following (the values are random, so yours will differ):
#2010-01-29 43
#2010-02-16 48
To convert the result to a list:
dates.groupby(dates.index.month).apply(pd.Series.tail,1).reset_index(level=0, drop=True).tolist()
Or:
dates[dates.groupby(dates.index.month).apply(lambda s: np.max(s.index))].tolist()
Both produce something like:
#[43, 48]
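The question asks for the dates themselves rather than the values stored on them. A minimal sketch of one way to get both, assuming a small hand-made series (the names `prices`, `last_days` and the sample numbers are illustrative, not from the original post) that lies within a single year:

```python
import pandas as pd

# Hypothetical sample: a few trading days in two months of the same year.
idx = pd.to_datetime(['2010-01-04', '2010-01-28', '2010-01-29',
                      '2010-02-01', '2010-02-12', '2010-02-16'])
prices = pd.Series([10, 20, 30, 40, 50, 60], index=idx)

# Turn the index into a Series of timestamps, group it by month number,
# and keep the latest timestamp in each group.
last_days = idx.to_series().groupby(idx.month).max()

# The dates as strings, and the values stored on those dates.
last_day_strings = last_days.dt.strftime('%Y-%m-%d').tolist()
last_day_values = prices.loc[last_days.tolist()].tolist()

print(last_day_strings)
print(last_day_values)
```

Taking `max()` of the timestamps does not depend on the series being sorted, whereas `tail(1)` returns whichever row happens to come last in each group.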
If you are working with a dataset that spans more than one year, you need to group by both year and month. The following should help:
import numpy as np
import pandas as pd
z = ['2010-01-04', '2010-01-05', '2010-01-06', '2010-01-07',
'2010-01-08', '2010-01-11', '2010-01-12', '2010-01-13',
'2010-01-14', '2010-01-15', '2010-01-19', '2010-01-20',
'2010-01-21', '2010-01-22', '2010-01-25', '2010-01-26',
'2010-01-27', '2010-01-28', '2010-01-29', '2010-02-01',
'2010-02-02', '2010-02-03', '2010-02-04', '2010-02-05',
'2010-02-08', '2010-02-09', '2010-02-10', '2010-02-11',
'2010-02-12', '2010-02-16', '2011-01-04', '2011-01-05',
'2011-01-06', '2011-01-07', '2011-01-08', '2011-01-11',
'2011-01-12', '2011-01-13', '2011-01-14', '2011-01-15',
'2011-01-19', '2011-01-20', '2011-01-21', '2011-01-22',
'2011-01-25', '2011-01-26', '2011-01-27', '2011-01-28',
'2011-01-29', '2011-02-01', '2011-02-02', '2011-02-03',
'2011-02-04', '2011-02-05', '2011-02-08', '2011-02-09',
'2011-02-10', '2011-02-11', '2011-02-12', '2011-02-16']
dates1 = pd.Series(np.random.randint(100,size=60),index=pd.to_datetime(z))
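This (note that the keys are passed as a list; recent pandas versions treat a tuple as a single key):
dates1.groupby([dates1.index.year, dates1.index.month]).apply(pd.Series.tail,1).reset_index(level=[0,1], drop=True)
Or:
dates1[dates1.groupby([dates1.index.year, dates1.index.month]).apply(lambda s: np.max(s.index))]
Both produce something like: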
# 2010-01-29 66
# 2010-02-16 80
# 2011-01-29 13
# 2011-02-16 10
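To see why the year has to be part of the key, here is a small sketch under the assumption of a sorted, hand-made series (`s` and its values are illustrative only). It uses `GroupBy.tail(1)` directly, which keeps the original index, so no `reset_index` is needed:

```python
import pandas as pd

# Hypothetical sample spanning two years, sorted by date.
idx = pd.to_datetime(['2010-01-28', '2010-01-29', '2010-02-16',
                      '2011-01-27', '2011-01-28', '2011-02-15'])
s = pd.Series(range(6), index=idx)

# Grouping by month number alone merges January 2010 with January 2011,
# so only the 2011 rows survive.
by_month = s.groupby(s.index.month).tail(1)

# A list of keys (year, month) keeps every calendar month separate.
by_year_month = s.groupby([s.index.year, s.index.month]).tail(1)

print(by_month)
print(by_year_month)
```

Because `tail(1)` picks the last row in each group as stored, this sketch relies on the index being sorted in ascending order.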
I hope this proves useful.
Answer 1 (score: 1)
You can use groupby by month and apply to take the last value of the index:
print (dates.groupby(dates.index.month).apply(lambda x: x.index[-1]))
1 2010-01-29
2 2010-02-16
dtype: datetime64[ns]
Another solution:
print (dates.groupby(dates.index.month).apply(lambda x: x.index.max()))
1 2010-01-29
2 2010-02-16
dtype: datetime64[ns]
For a list, first convert to strings with strftime:
print (dates.groupby(dates.index.month)
            .apply(lambda x: x.index[-1]).dt.strftime('%Y-%m-%d').tolist())
['2010-01-29', '2010-02-16']
If you need the value on the last date of each month, use iloc:
print (dates.groupby(dates.index.month).apply(lambda x: x.iloc[-1]))
1 55
2 48
dtype: int64
print (dates.groupby(dates.index.month).apply(lambda x: x.iloc[-1]).tolist())
[55, 48]
EDIT:
To group by both year and month, convert the index to monthly periods with to_period:
dates=pd.Series(np.random.randint(100,size=30),index=pd.to_datetime(
['2010-01-04', '2010-01-05', '2010-01-06', '2010-01-07',
'2010-01-08', '2011-01-11', '2011-01-12', '2011-01-13',
'2012-01-14', '2012-01-15', '2012-01-19', '2012-01-20',
'2013-01-21', '2013-01-22', '2013-01-25', '2013-01-26',
'2013-01-27', '2013-01-28', '2013-01-29', '2013-02-01',
'2014-02-02', '2014-02-03', '2014-02-04', '2014-02-05',
'2015-02-08', '2015-02-09', '2015-02-10', '2015-02-11',
'2016-02-12', '2016-02-16']))
#print (dates)
print (dates.groupby(dates.index.to_period('m')).apply(lambda x: x.index[-1]))
2010-01 2010-01-08
2011-01 2011-01-13
2012-01 2012-01-20
2013-01 2013-01-29
2013-02 2013-02-01
2014-02 2014-02-05
2015-02 2015-02-11
2016-02 2016-02-16
Freq: M, dtype: datetime64[ns]
print (dates.groupby(dates.index.to_period('m'))
.apply(lambda x: x.index[-1]).dt.strftime('%Y-%m-%d').tolist())
['2010-01-08', '2011-01-13', '2012-01-20', '2013-01-29',
'2013-02-01', '2014-02-05', '2015-02-11', '2016-02-16']
print (dates.groupby(dates.index.to_period('m')).apply(lambda x: x.iloc[-1]))
2010-01 68
2011-01 96
2012-01 53
2013-01 4
2013-02 16
2014-02 18
2015-02 41
2016-02 90
Freq: M, dtype: int64
print (dates.groupby(dates.index.to_period('m')).apply(lambda x: x.iloc[-1]).tolist())
[68, 96, 53, 4, 16, 18, 41, 90]
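The period-based grouping can also be sketched on a small deterministic series, which makes the result easier to check than with random values. The data and the names `s`, `periods` and `last_dates` below are assumptions for illustration:

```python
import pandas as pd

# Hypothetical sample spanning two years.
idx = pd.to_datetime(['2010-01-28', '2010-01-29', '2010-02-16',
                      '2011-01-27', '2011-01-28', '2011-02-15'])
s = pd.Series(range(6), index=idx)

# One monthly period per row; the year is part of each period,
# so January 2010 and January 2011 fall into different groups.
periods = s.index.to_period('M')

# Latest timestamp within each monthly period.
last_dates = s.index.to_series().groupby(periods).max()

period_labels = last_dates.index.astype(str).tolist()
date_strings = last_dates.dt.strftime('%Y-%m-%d').tolist()

print(period_labels)
print(date_strings)
```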
EDIT1: If you need to convert the period index to end-of-month datetimes:
df = dates.groupby(dates.index.to_period('m')).apply(lambda x: x.index[-1])
df.index = df.index.to_timestamp('m')
print (df)
2010-01-31 2010-01-08
2011-01-31 2011-01-13
2012-01-31 2012-01-20
2013-01-31 2013-01-29
2013-02-28 2013-02-01
2014-02-28 2014-02-05
2015-02-28 2015-02-11
2016-02-29 2016-02-16
dtype: datetime64[ns]
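The output above pairs each calendar month end (the index) with the last date actually present in the data (the values). A sketch of the same pairing that spells out the conversion with `how='end'` followed by `normalize()`, assuming a small hand-made series (all names and values here are illustrative):

```python
import pandas as pd

# Hypothetical sample: the last trading day is not always the calendar month end.
idx = pd.to_datetime(['2010-01-28', '2010-01-29', '2010-02-12', '2010-02-16'])
s = pd.Series([1, 2, 3, 4], index=idx)

# Last date present in each monthly period.
last_dates = s.index.to_series().groupby(s.index.to_period('M')).max()

# Convert the period index to the end of each period, then drop the
# time-of-day part so only the calendar date remains.
last_dates.index = last_dates.index.to_timestamp(how='end').normalize()

month_ends = last_dates.index.strftime('%Y-%m-%d').tolist()
last_present = last_dates.dt.strftime('%Y-%m-%d').tolist()

print(month_ends)
print(last_present)
```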