I have been using the pandas package for a while now, and I got an unexpected result when creating a pandas Series from a dict. Simplified data attached:
import numpy
import pandas as pd

d = {numpy.datetime64('2015-01-07T02:00:00.000000000+0200'): 42544017.198965244,
     numpy.datetime64('2015-01-08T02:00:00.000000000+0200'): 40512335.181958228,
     numpy.datetime64('2015-01-09T02:00:00.000000000+0200'): 39712952.781494237,
     numpy.datetime64('2015-01-12T02:00:00.000000000+0200'): 39002721.453793451}
s = pd.Series(d)
s
This gives me:
2015-01-07 NaN
2015-01-08 NaN
2015-01-09 NaN
2015-01-12 NaN
dtype: float64
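This was completely unexpected to me, since I was fairly sure that Series, when passed a dict, creates a series sorted by key. I will still check whether earlier versions of pandas behave differently. I am using 0.15.2 here. Any suggestions?
I just tested on pandas 0.10.0 and got the same result. Am I missing something, or does it have to do with the types I am passing?
Further testing shows that it is the datetimes that cause the problem. They originate from the pandas read_csv method with parse_dates applied. It is strange that this should be a problem. Could it be a pandas bug?
As requested by Jeff, here is the code that generates the dict from the pandas DataFrame: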
def _calculate_notional_cash(self):
    '''Calculate the notional cash in portfolio
    Done by getting difference between NAV and sum of positions
    self.PMSposition_dict is dict of dataframes with Position information
    '''
    sumpos = {}
    for FundID, FundName in self.fund_number_name.iteritems():
        sumpos[FundID] = {}
        # self.PMSposition_dict[FundID]['MarketValueInZAR'].sum()
        for date in self.PMSposition_dict[FundID].Date.unique():
            s = self.PMSposition_dict[FundID][self.PMSposition_dict[FundID]['Date'] == date]['MarketValueInZAR'].sum()
            sumpos[FundID][date] = s
    self.sumpos = sumpos
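For reference, the per-date sum computed by the inner loop above can probably also be obtained with a groupby, which returns a Series indexed by date directly and so avoids building a dict keyed by numpy.datetime64 values. This is only a sketch: the `positions` frame below is made-up sample data standing in for one entry of self.PMSposition_dict, and only the column names Date and MarketValueInZAR come from the code above.

```python
import pandas as pd

# Hypothetical sample data standing in for one fund's position DataFrame.
positions = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-07", "2015-01-07", "2015-01-08"]),
    "MarketValueInZAR": [100.0, 50.0, 75.0],
})

# Sum the market value per date; the result is a Series indexed by Date.
daily_sum = positions.groupby("Date")["MarketValueInZAR"].sum()
print(daily_sum)
```

Whether this fits depends on how the rest of the class consumes self.sumpos, which is not shown here.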
Answer 0: (score: 0)
One way to work around this problem:
pd.Series(index=d.keys(), data=d.values())
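A slightly fuller, self-contained sketch of the same workaround follows. It is an illustration under assumptions rather than a tested fix for 0.15.2: the keys are written without the +0200 offset (newer numpy versions deprecate time zone offsets in datetime64 strings), the keys and values are wrapped in list() so that it also runs on Python 3, and the index is converted explicitly to a DatetimeIndex.

```python
import numpy as np
import pandas as pd

# Simplified data from the question, with timezone-naive keys (an assumption).
d = {
    np.datetime64("2015-01-07T00:00:00"): 42544017.198965244,
    np.datetime64("2015-01-08T00:00:00"): 40512335.181958228,
}

# Pass keys and values separately instead of handing the dict to Series,
# then sort by date to get the ordering the question expected.
index = pd.DatetimeIndex(list(d.keys()))
s = pd.Series(list(d.values()), index=index).sort_index()
print(s)
```

Because the index and the data are supplied as two parallel sequences, pandas does not have to look up the datetime64 keys in the dict, which appears to be where the NaN values come from.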