I have a list of lists containing [yyyy, value] items, with each sub-list sorted by increasing year. Here is an example:
A = [
[[2008, 5], [2009, 5], [2010, 2], [2011, 5], [2013, 17]],
[[2008, 6], [2009, 3], [2011, 1], [2013, 6]], [[2013, 9]],
[[2008, 4], [2011, 1], [2013, 4]],
[[2010, 3], [2011, 3], [2013, 1]],
[[2008, 2], [2011, 4], [2013, 1]],
[[2009, 1], [2010, 1], [2011, 3], [2013, 3]],
[[2010, 1], [2011, 1], [2013, 5]],
[[2011, 1], [2013, 4]],
[[2009, 1], [2013, 4]],
[[2008, 1], [2013, 3]],
[[2009, 1], [2013, 2]],
[[2013, 2]],
[[2011, 1], [2013, 1]],
[[2013, 1]],
[[2013, 1]],
[[2011, 1]],
[[2011, 1]]
]
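What I need is to insert all missing years between min(year) and max(year), making sure the order is preserved. So, for example, the first sub-list of A:
[2008, 5], [2009, 5], [2010, 2], [2011, 5], [2013, 17]
should end up like this: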
[min_year, 0]...[2008, 5], [2009, 5], [2010, 2], [2011, 5], [2012, 0],[2013, 17],..[max_year, 0]
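Also, if any sub-list contains only a single item, the same process should be applied to it, so that the original value keeps its proper position and the remaining min-to-max (year, value) items are inserted correctly.
Any ideas?
Thanks.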
Answer 0 (score: 3)
How about:
import numpy as np

def np_fill(data, min_year, max_year):
    # Set up an array of [year, 0] rows, repeated once per sub-list
    year_range = np.arange(min_year, max_year + 1)
    unit = np.dstack((year_range, np.zeros(max_year - min_year + 1)))
    # Plain int here: the np.int alias was removed in newer NumPy releases
    overall = np.tile(unit, (len(data), 1, 1)).astype(int)
    # Change the list to a list of ndarrays
    data = [np.array(line) for line in data]
    for num, line in enumerate(data):
        # Find the correct indices and update the overall array
        index = np.searchsorted(year_range, line[:, 0])
        overall[num, index, 1] = line[:, 1]
    return overall
Running the code:
print(np_fill(A, 2008, 2013)[:2])
[[[2008 5]
[2009 5]
[2010 2]
[2011 5]
[2012 0]
[2013 17]]
[[2008 6]
[2009 3]
[2010 0]
[2011 1]
[2012 0]
[2013 6]]]
print(np_fill(A, 2008, 2013).shape)
(18, 6, 2)
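You have a duplicate 2013 entry in the second row of A; not sure whether that is on purpose.
A few timings, since I was curious; the source code can be found here. Let me know if you spot an error.
Start year/end year - (2008, 2013):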
np_fill took 0.0454630851746 seconds.
tehsockz_fill took 0.00737619400024 seconds.
zeke_fill_fill took 0.0146050453186 seconds.
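Somewhat expected: converting to numpy arrays takes up a lot of the time. To break even, it looks like the span of years needs to be around 30:
Start year/end year - (1985, 2013):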
np_fill took 0.049400806427 seconds.
tehsockz_fill took 0.0425939559937 seconds.
zeke_fill_fill took 0.0748357772827 seconds.
Numpy of course gets progressively better from there. If you need a numpy array returned for any reason, the numpy algorithm is always faster.
Answer 1 (score: 3)
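minyear = 2008
maxyear = 2013
new_a = []
for group in A:
    group = list(group)  # copy, so the sub-lists of A are not modified in place
    years = set(point[0] for point in group)
    for year in range(minyear, maxyear + 1):
        if year not in years:
            group.append([year, 0])
    # sorting the [year, value] pairs puts the added years in the right place
    new_a.append(sorted(group))
print(new_a)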
This produces:
[ [[2008, 5], [2009, 5], [2010, 2], [2011, 5], [2012, 0], [2013, 17]],
[[2008, 6], [2009, 3], [2010, 0], [2011, 1], [2012, 0], [2013, 6]],
[[2008, 0], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 9]],
[[2008, 4], [2009, 0], [2010, 0], [2011, 1], [2012, 0], [2013, 4]],
[[2008, 0], [2009, 0], [2010, 3], [2011, 3], [2012, 0], [2013, 1]],
[[2008, 2], [2009, 0], [2010, 0], [2011, 4], [2012, 0], [2013, 1]],
[[2008, 0], [2009, 1], [2010, 1], [2011, 3], [2012, 0], [2013, 3]],
[[2008, 0], [2009, 0], [2010, 1], [2011, 1], [2012, 0], [2013, 5]],
[[2008, 0], [2009, 0], [2010, 0], [2011, 1], [2012, 0], [2013, 4]],
[[2008, 0], [2009, 1], [2010, 0], [2011, 0], [2012, 0], [2013, 4]],
[[2008, 1], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 3]],
[[2008, 0], [2009, 1], [2010, 0], [2011, 0], [2012, 0], [2013, 2]],
[[2008, 0], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 2]],
[[2008, 0], [2009, 0], [2010, 0], [2011, 1], [2012, 0], [2013, 1]],
[[2008, 0], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 1]],
[[2008, 0], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 1]],
[[2008, 0], [2009, 0], [2010, 0], [2011, 1], [2012, 0], [2013, 0]],
[[2008, 0], [2009, 0], [2010, 0], [2011, 1], [2012, 0], [2013, 0]]]
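The year bounds above are hard-coded. If they should come from the data instead, as the question's min(year) and max(year) suggest, a minimal sketch could look like the following. The helper name `year_bounds` is made up for illustration, and it assumes the list contains at least one [year, value] pair:

```python
def year_bounds(data):
    # Collect every year from every sub-list (assumes data is not empty)
    years = [point[0] for group in data for point in group]
    return min(years), max(years)

sample = [[[2009, 1], [2013, 4]], [[2008, 1]], [[2011, 1]]]
minyear, maxyear = year_bounds(sample)
```

The resulting `minyear` and `maxyear` can then be fed into the loop above in place of the fixed values.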
Answer 2 (score: 3)
Here you go, hope you like it!
min_year = 2007  # for testing purposes I used these years
max_year = 2014
final_list = []  # you're going to be adding the corrected values to this list
for outer in A:  # start by iterating through each outer list in A
    active_years = {}  # use this dictionary to keep track of which years are in each list and their values; sorry if you don't know about dictionaries
    for inner in outer:  # now iterate through each [year, value] pair in the outer list and create a dictionary entry for each (print to see what it's doing)
        active_years[inner[0]] = inner[1]  # see how I'm creating a new key-value pair with the key as the year given by the 0th index of inner
    new_outer = []  # this will be your new outer list
    for year in range(min_year, max_year + 1):  # now add to your active_years dictionary all the other years and give them value 0
        if year not in active_years:  # only add the years not in your dictionary already
            active_years[year] = 0
    for entry in sorted(active_years):  # iterate through the keys in sorted order; a dictionary does not keep years in order by itself
        new_outer += [[entry, active_years[entry]]]  # create your new outer list, watch the brackets carefully
    final_list += [new_outer]  # add to the final_list
print(final_list)  # presto
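The same dictionary idea can be written more compactly by walking the year range itself, which yields the years in order without any sorting step. This is only a sketch: `fill_years` is a hypothetical name, and it assumes every year in the data lies between `min_year` and `max_year` (anything outside that range is silently dropped):

```python
def fill_years(data, min_year, max_year):
    filled = []
    for outer in data:
        # Map each year present in this sub-list to its value
        values = dict((year, value) for year, value in outer)
        # Walk the full range in order; years that are missing get 0
        filled.append([[year, values.get(year, 0)]
                       for year in range(min_year, max_year + 1)])
    return filled

sample = [[[2008, 5], [2011, 5], [2013, 17]], [[2013, 9]]]
result = fill_years(sample, 2008, 2013)
```

Because a new list is built for each sub-list, the input is left untouched, and single-item sub-lists such as [[2013, 9]] are handled by the same code path.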