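Using a pandas Datetime index, I count events per week and plot them. Each object is currently a pandas.core.series.Series. Because the data is downloaded one year at a time, some weeks are split across two series. Here is an example: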
Datetime
2005-12-18 1840
2005-12-25 1959
2006-01-01 1695
Datetime
2006-01-01 285
2006-01-08 1917
2006-01-15 1821
Freq: W-SUN, dtype: int64
The week of 2006-01-01 should have 285 + 1695 = 1980 events in total.
If I concatenate the two series,
import pandas as pd
pd.concat([weeks2005, weeks2006])
this does not happen. These discontinuities produce a lot of spikes in the data and in the plot. How can I fix this?
Answer 0 (score: 1)
You can use add with the parameter fill_value=0:
print(weeks2005.add(weeks2006, fill_value=0))
2005-12-18 1840
2005-12-25 1959
2006-01-01 1980
2006-01-08 1917
2006-01-15 1821
Freq: W-SUN, dtype: float64
You can then cast to int with astype:
print(weeks2005.add(weeks2006, fill_value=0).astype(int))
2005-12-18 1840
2005-12-25 1959
2006-01-01 1980
2006-01-08 1917
2006-01-15 1821
Freq: W-SUN, dtype: int32
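To make the role of fill_value concrete, here is a minimal self-contained sketch. It rebuilds the two sample series from the question (the construction with date_range is an assumption about how they were made) and compares plain + with add(..., fill_value=0).

```python
import pandas as pd

# Sample data copied from the question; how the series were built is an assumption.
weeks2005 = pd.Series([1840, 1959, 1695],
                      index=pd.date_range("2005-12-18", periods=3, freq="W-SUN"))
weeks2006 = pd.Series([285, 1917, 1821],
                      index=pd.date_range("2006-01-01", periods=3, freq="W-SUN"))

# Plain addition aligns on the index, but any week present in only one
# series becomes NaN.
plain = weeks2005 + weeks2006

# fill_value=0 treats a week missing from one series as 0 before adding,
# so only the shared week (2006-01-01) is actually summed.
filled = weeks2005.add(weeks2006, fill_value=0).astype(int)

print(plain)
print(filled)
```

Only the overlapping week survives in plain; filled keeps all five weeks, with the split week added up.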
Edit:
If you have 50 Series, you can use concat, then groupby on the index with sum:
import pandas as pd
dt1 = pd.to_datetime('2005-12-18')
idx1 = pd.date_range(dt1, periods=3, freq='W-SUN')
weeks2005 = pd.Series( [1840, 1959, 1695], index=idx1)
dt2 = pd.to_datetime('2006-01-01')
idx2 = pd.date_range(dt2, periods=3, freq='W-SUN')
weeks2006 = pd.Series( [285, 1917, 1821], index=idx2)
dt3 = pd.to_datetime('2006-01-15')
idx3 = pd.date_range(dt3, periods=3, freq='W-SUN')
weeks2006a = pd.Series( [100, 200, 500], index=idx3)
weeks = [weeks2005, weeks2006, weeks2006a ]
print(weeks)
[2005-12-18 1840
2005-12-25 1959
2006-01-01 1695
Freq: W-SUN, dtype: int64, 2006-01-01 285
2006-01-08 1917
2006-01-15 1821
Freq: W-SUN, dtype: int64, 2006-01-15 100
2006-01-22 200
2006-01-29 500
Freq: W-SUN, dtype: int64]
# concatenate the list of Series
# some index values are duplicated in the output Series
concated_series = pd.concat([weeks2005, weeks2006, weeks2006a])
# concated_series = pd.concat(weeks)
print(concated_series)
#2005-12-18 1840
#2005-12-25 1959
#2006-01-01 1695
#2006-01-01 285
#2006-01-08 1917
#2006-01-15 1821
#2006-01-15 100
#2006-01-22 200
#2006-01-29 500
#dtype: int64
# group by index and aggregate with sum
output = concated_series.groupby(by=concated_series.index).sum()
# level=0 is the first level of a MultiIndex, but it also works with a plain index
# output = concated_series.groupby(level=0).sum()
print(output)
#2005-12-18 1840
#2005-12-25 1959
#2006-01-01 1980
#2006-01-08 1917
#2006-01-15 1921
#2006-01-22 200
#2006-01-29 500
#dtype: int64
More information about groupby, with examples, is here.
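For many yearly series, the concat and groupby steps can be wrapped in a small helper. This is only a sketch: the function name merge_weekly_counts is hypothetical, and the sample data repeats the three series from this answer.

```python
import pandas as pd


def merge_weekly_counts(series_list):
    """Sum weekly count Series, adding up weeks that appear more than once."""
    stacked = pd.concat(series_list)
    # level=0 groups by the index labels, so duplicated weeks collapse
    # into a single row holding their sum.
    return stacked.groupby(level=0).sum()


weeks2005 = pd.Series([1840, 1959, 1695],
                      index=pd.date_range("2005-12-18", periods=3, freq="W-SUN"))
weeks2006 = pd.Series([285, 1917, 1821],
                      index=pd.date_range("2006-01-01", periods=3, freq="W-SUN"))
weeks2006a = pd.Series([100, 200, 500],
                       index=pd.date_range("2006-01-15", periods=3, freq="W-SUN"))

merged = merge_weekly_counts([weeks2005, weeks2006, weeks2006a])
print(merged)
```

The result has one row per week, so it can be plotted directly without the discontinuities.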
Answer 1 (score: 0)
You can convert the series to DataFrames and then merge them, using the date as the key:
import pandas as pd
# Turn each Series into a two-column DataFrame:
# the index becomes "Datetime" and the values become "Number"
df2005 = weeks2005.rename_axis("Datetime").reset_index(name="Number")
df2006 = weeks2006.rename_axis("Datetime").reset_index(name="Number")
# An outer merge keeps weeks that appear in only one frame; missing counts become 0
df_merge = pd.merge(df2005, df2006, on="Datetime", how="outer").fillna(0)
df_merge["Sum"] = df_merge["Number_x"] + df_merge["Number_y"]
# drop returns a new DataFrame, so the result has to be assigned
df_merge = df_merge.drop(["Number_x", "Number_y"], axis=1)
print(df_merge)