How do I correctly concatenate pandas Series when the data is split across time periods?

Time: 2015-12-18 07:48:32

Tags: python pandas concatenation series

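Using a pandas Datetime index, I count events per week and plot them. Each object is currently a pandas.core.series.Series. Because the data is downloaded one year at a time, some weeks get split between two Series. Here is an example: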

Datetime
2005-12-18    1840
2005-12-25    1959
2006-01-01    1695

Datetime
2006-01-01     285
2006-01-08    1917
2006-01-15    1821
Freq: W-SUN, dtype: int64

The week of 2006-01-01 should have 285 + 1695 = 1980 events in total.

If I concatenate the two Series,

import pandas as pd
pd.concat([weeks2005, weeks2006])

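that does not happen: the split week stays as two separate rows instead of being summed. These discontinuities cause a lot of "spikes" in the data and in the plot. How can I fix this?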
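To make the problem concrete, here is a minimal sketch that rebuilds the two Series from the sample values above and lists the weeks that end up duplicated after `pd.concat`. The variable names `joined` and `split_weeks` are only illustrative.

```python
import pandas as pd

# Rebuild the two yearly Series from the sample data in the question
weeks2005 = pd.Series([1840, 1959, 1695],
                      index=pd.date_range("2005-12-18", periods=3, freq="W-SUN"))
weeks2006 = pd.Series([285, 1917, 1821],
                      index=pd.date_range("2006-01-01", periods=3, freq="W-SUN"))

# concat only stacks the rows; it does not merge rows that share a date
joined = pd.concat([weeks2005, weeks2006])

# keep=False marks every occurrence of a repeated index label
split_weeks = joined[joined.index.duplicated(keep=False)]
print(split_weeks)
```

Each duplicated date holds only part of that week's count, which is what shows up as a spike when plotted.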

2 answers:

Answer 0: (score: 1)

You can use add with the parameter fill_value=0:

print(weeks2005.add(weeks2006, fill_value=0))
2005-12-18    1840
2005-12-25    1959
2006-01-01    1980
2006-01-08    1917
2006-01-15    1821
Freq: W-SUN, dtype: float64

The result is float64, so you can then cast it back to int with astype:

print(weeks2005.add(weeks2006, fill_value=0).astype(int))
2005-12-18    1840
2005-12-25    1959
2006-01-01    1980
2006-01-08    1917
2006-01-15    1821
Freq: W-SUN, dtype: int32
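As a self-contained sketch of why fill_value matters, the snippet below compares plain `+` with `add(..., fill_value=0)`. The Series are rebuilt from the question's sample values, and the names `plain` and `combined` are only illustrative.

```python
import pandas as pd

weeks2005 = pd.Series([1840, 1959, 1695],
                      index=pd.date_range("2005-12-18", periods=3, freq="W-SUN"))
weeks2006 = pd.Series([285, 1917, 1821],
                      index=pd.date_range("2006-01-01", periods=3, freq="W-SUN"))

# Plain addition aligns on the index; a date present in only one Series becomes NaN
plain = weeks2005 + weeks2006

# fill_value=0 treats a missing date as zero events, so only the shared week is summed
combined = weeks2005.add(weeks2006, fill_value=0).astype(int)

print(plain)
print(combined)
```

The cast back to int is safe here because, with fill_value=0, no NaN is left in the result.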

Edit:

If you have many Series (say 50), you can use concat, then groupby on the index and sum:

import pandas as pd

dt1 = pd.to_datetime('2005-12-18')
idx1 = pd.date_range(dt1, periods=3, freq='W-SUN')
weeks2005 = pd.Series( [1840, 1959, 1695], index=idx1)

dt2 = pd.to_datetime('2006-01-01')
idx2 = pd.date_range(dt2, periods=3, freq='W-SUN')
weeks2006 = pd.Series( [285, 1917, 1821], index=idx2)

dt3 = pd.to_datetime('2006-01-15')
idx3 = pd.date_range(dt3, periods=3, freq='W-SUN')
weeks2006a = pd.Series( [100, 200, 500], index=idx3)

weeks = [weeks2005, weeks2006, weeks2006a ] 
print(weeks)
[2005-12-18    1840
2005-12-25    1959
2006-01-01    1695
Freq: W-SUN, dtype: int64, 2006-01-01     285
2006-01-08    1917
2006-01-15    1821
Freq: W-SUN, dtype: int64, 2006-01-15    100
2006-01-22    200
2006-01-29    500
Freq: W-SUN, dtype: int64]
# concatenate the list of Series
# some index values are duplicated in the output Series
concated_series = pd.concat([weeks2005, weeks2006, weeks2006a])
# concated_series = pd.concat(weeks)
print(concated_series)
#2005-12-18    1840
#2005-12-25    1959
#2006-01-01    1695
#2006-01-01     285
#2006-01-08    1917
#2006-01-15    1821
#2006-01-15     100
#2006-01-22     200
#2006-01-29     500
#dtype: int64

# group by the index and aggregate with sum
output = concated_series.groupby(by=concated_series.index).sum()
# level=0 refers to the first level of a MultiIndex, but it also works with a plain index
# output = concated_series.groupby(level=0).sum()
print(output)

#2005-12-18    1840
#2005-12-25    1959
#2006-01-01    1980
#2006-01-08    1917
#2006-01-15    1921
#2006-01-22     200
#2006-01-29     500
#dtype: int64
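The concat/groupby/sum steps can be wrapped in a small helper so the same call works for any number of yearly Series. This is only a sketch: `combine_weekly_counts` is a hypothetical name, not a pandas function, and the sample values are the ones used above.

```python
import pandas as pd

def combine_weekly_counts(series_list):
    """Stack the Series, then sum all values that share an index label."""
    stacked = pd.concat(series_list)
    # level=0 groups on the (single-level) index itself
    return stacked.groupby(level=0).sum()

parts = [
    pd.Series([1840, 1959, 1695],
              index=pd.date_range("2005-12-18", periods=3, freq="W-SUN")),
    pd.Series([285, 1917, 1821],
              index=pd.date_range("2006-01-01", periods=3, freq="W-SUN")),
    pd.Series([100, 200, 500],
              index=pd.date_range("2006-01-15", periods=3, freq="W-SUN")),
]

result = combine_weekly_counts(parts)
print(result)
```

Unlike chaining add calls, this keeps the integer dtype, because no NaN placeholders are created along the way.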

More information on groupby, with examples, is here

Answer 1: (score: 0)

You can convert the Series to DataFrames and then merge them, using the date as the key:

import pandas as pd

# reset_index turns the date index into a regular column,
# so no string parsing of the printed Series is needed
df2005 = weeks2005.reset_index()
df2005.columns = ["Datetime", "Number"]
df2006 = weeks2006.reset_index()
df2006.columns = ["Datetime", "Number"]

# the outer merge keeps weeks found in only one year; fill the gaps with 0
df_merge = pd.merge(df2005, df2006, on="Datetime", how="outer").fillna(0)
df_merge["Sum"] = df_merge["Number_x"] + df_merge["Number_y"]
# drop returns a new DataFrame, so assign the result back
df_merge = df_merge.drop(["Number_x", "Number_y"], axis=1)

print(df_merge)
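For comparison with the first answer, here is a self-contained sketch of the merge approach that ends by turning the result back into a Series. The column names `n2005` and `n2006` and the variable `total` are assumptions chosen for illustration.

```python
import pandas as pd

weeks2005 = pd.Series([1840, 1959, 1695],
                      index=pd.date_range("2005-12-18", periods=3, freq="W-SUN"))
weeks2006 = pd.Series([285, 1917, 1821],
                      index=pd.date_range("2006-01-01", periods=3, freq="W-SUN"))

# Move the dates into a column so they can serve as the merge key
left = weeks2005.reset_index()
left.columns = ["Datetime", "n2005"]
right = weeks2006.reset_index()
right.columns = ["Datetime", "n2006"]

# The outer merge keeps every week; weeks missing on one side count as 0
merged = pd.merge(left, right, on="Datetime", how="outer").fillna(0)
merged["Sum"] = merged["n2005"] + merged["n2006"]

# Back to a date-indexed integer Series, sorted by week
total = merged.set_index("Datetime")["Sum"].sort_index().astype(int)
print(total)
```

This gives the same totals as add with fill_value=0, at the cost of a few extra steps.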