Python list search using if and .split() takes too long

Time: 2017-07-06 21:03:02

Tags: python

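Apologies if this has already been answered.

I am trying to find the values in a list that come after '2004q2' (2004 is the year, q2 is the quarter). I am looking for list members that are either in a year after 2004, or in 2004 but in a later quarter. To do this, I tried the following code: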

get_recession_start() is a separate function that returns '2004q2'
(changed value cause online class question)

def get_recession_start():
    return '2004q4'

two_q_growth =['2000q1', '2000q2', '2000q3', '2000q4', '2001q1', '2001q2', '2001q3',
 '2001q4', '2002q1', '2002q2', '2002q3', '2002q4', '2003q1', '2003q2', '2003q3', '2003q4',
 '2004q1', '2004q2', '2004q3', '2004q4', '2005q1', '2005q2', '2005q3', '2005q4', '2006q1', 
 '2006q2', '2006q3', '2006q4', '2007q1', '2007q2', '2007q3', '2007q4', '2008q3', '2009q4', 
 '2010q1', '2010q2', '2010q3', '2010q4', '2011q1', '2011q2', '2011q3', '2011q4', '2012q1',
 '2012q2', '2012q3', '2012q4', '2013q1', '2013q2', '2013q3', '2013q4', '2014q1', '2014q2',
 '2014q3', '2014q4', '2015q1', '2015q2', '2015q3', '2015q4', '2016q1', '2016q2']


for year in two_q_growth:
    start_year = get_recession_start().split('q')[0]
    start_q = get_recession_start().split('q')[1]
    if ((year.split('q')[0] > start_year)
        | ((year.split('q')[0] == start_year) &
        (year.split('q')[1] > start_q))):
        recession_end.append(year)

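This code has been running for more than a day, and I don't know why. (I am still new to Python and trying to figure out how to optimize.)

Thanks!

Technically I only need the first value, so I am working on writing something with a break, but I would also like to know how to do this faster.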

3 answers:

Answer 0: (score: 0)

You can clean this up by assigning year.split('q') to a variable instead of calling split every time. Try this:

for year in two_q_growth:
    start_year, start_q = get_recession_start().split('q')
    year_split = year.split('q')
    if ((year_split[0] > start_year) or
        ((year_split[0] == start_year) and (year_split[1] > start_q))):
        recession_end.append(year)
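
A further small cleanup, sketched here under the assumption that get_recession_start() returns the same value on every call: the start value can be split once, before the loop, rather than on every iteration. The helper name `quarters_after` and the shortened `sample` list are hypothetical and only for illustration. Unlike the snippet above, this version compares the parts as integers rather than as strings.

```python
def get_recession_start():
    # Stand-in for the asker's function (value taken from the question text)
    return '2004q2'


def quarters_after(labels, start):
    """Return the labels strictly after `start`, comparing (year, quarter) as integers."""
    # Split the start value once, outside the loop
    start_year, start_q = map(int, start.split('q'))
    result = []
    for label in labels:
        year, quarter = map(int, label.split('q'))
        if year > start_year or (year == start_year and quarter > start_q):
            result.append(label)
    return result


sample = ['2003q4', '2004q1', '2004q2', '2004q3', '2004q4', '2005q1']
recession_end = quarters_after(sample, get_recession_start())
print(recession_end)
# ['2004q3', '2004q4', '2005q1']
```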

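Answer 1: (score: 0)

While it should work, your code is very inefficient and far too complicated. You don't even need to split the entries, because in this case plain string comparison works fine: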

two_q_growth = ['2000q1', '2000q2', '2000q3', '2000q4', '2001q1', '2001q2', '2001q3',
                '2001q4', '2002q1', '2002q2', '2002q3', '2002q4', '2003q1', '2003q2',
                '2003q3', '2003q4', '2004q1', '2004q2', '2004q3', '2004q4', '2005q1',
                '2005q2', '2005q3', '2005q4', '2006q1', '2006q2', '2006q3', '2006q4',
                '2007q1', '2007q2', '2007q3', '2007q4', '2008q3', '2009q4', '2010q1',
                '2010q2', '2010q3', '2010q4', '2011q1', '2011q2', '2011q3', '2011q4',
                '2012q1', '2012q2', '2012q3', '2012q4', '2013q1', '2013q2', '2013q3',
                '2013q4', '2014q1', '2014q2', '2014q3', '2014q4', '2015q1', '2015q2',
                '2015q3', '2015q4', '2016q1', '2016q2']

recession_start = '2004q2'  # get all entries after this one
recession_end = [year for year in two_q_growth if year > recession_start]

print(recession_end)

Which results in:

['2004q3', '2004q4', '2005q1', '2005q2', '2005q3', '2005q4', '2006q1', '2006q2', '2006q3',
'2006q4', '2007q1', '2007q2', '2007q3', '2007q4', '2008q3', '2009q4', '2010q1', '2010q2',
'2010q3', '2010q4', '2011q1', '2011q2', '2011q3', '2011q4', '2012q1', '2012q2', '2012q3',
'2012q4', '2013q1', '2013q2', '2013q3', '2013q4', '2014q1', '2014q2', '2014q3', '2014q4',
'2015q1', '2015q2', '2015q3', '2015q4', '2016q1', '2016q2']
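
The plain string comparison works because every label has the same shape: a four-digit year, the letter q, and a one-digit quarter, so lexicographic order and chronological order coincide. A quick check of that claim, as a sketch (the helper name `as_tuple` and the generated `labels` list are hypothetical):

```python
def as_tuple(label):
    # '2004q2' -> (2004, 2)
    year, quarter = label.split('q')
    return int(year), int(quarter)


# Every quarter from 2000q1 through 2016q4
labels = ['%dq%d' % (y, q) for y in range(2000, 2017) for q in range(1, 5)]

# String order and numeric (year, quarter) order agree for every pair
agree = all(
    (a > b) == (as_tuple(a) > as_tuple(b))
    for a in labels
    for b in labels
)
print(agree)
# True
```

The shortcut would stop holding if the parts were not fixed-width, for example a mix of two-digit and four-digit years.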

If you only need the first value after recession_start, and assuming your list is sorted (if it isn't, run it through sorted() first):

recession_start = '2004q2'  # get the entry after this one
recession_end = None  # just in case we don't find it
for year in two_q_growth:  # loop through the list
    if year > recession_start:  # grab the first value higher than recession_start
        recession_end = year  # store it to recession end
        break  # break away, no need to loop further as we only want the first element

print(recession_end)
# 2004q3
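
Two more compact ways to get only the first match, sketched under the same assumption that the list is sorted. The shortened `quarters` list is made up for illustration; `next()` with a default value and `bisect.bisect_right` are both part of the standard library.

```python
from bisect import bisect_right

quarters = ['2004q1', '2004q2', '2004q3', '2004q4', '2005q1']
recession_start = '2004q2'

# Generator version: stops at the first match, gives None if there is none
first_after = next((q for q in quarters if q > recession_start), None)

# Binary search version: index of the first entry greater than recession_start
idx = bisect_right(quarters, recession_start)
first_after_bisect = quarters[idx] if idx < len(quarters) else None

print(first_after, first_after_bisect)
# 2004q3 2004q3
```

The generator version still scans from the start of the list, while the bisect version needs only a logarithmic number of comparisons, which matters only for much longer lists than this one.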

Answer 2: (score: 0)

import numpy as np

# convert your data to a 2D numpy array of integers (year, quarter);
# list(...) is needed on Python 3, where map() returns an iterator:
two_q_growth_arr = np.array([list(map(int, x.split('q'))) for x in two_q_growth])

# Pull start_year and start_q computation out of the loop:
start_year, start_q = list(map(int, get_recession_start().split('q')))

# find indices of all data that satisfy your criteria:
ind = np.where((two_q_growth_arr[:, 0] > start_year) | ((two_q_growth_arr[:, 0] == start_year) & (two_q_growth_arr[:, 1] > start_q)))

# Extract the years as a list of integer numbers:
recession_end = two_q_growth_arr[ind][:,0].tolist()

# or, alternatively, convert to a list of strings:
recession_end = list(map(str, two_q_growth_arr[ind][:,0].tolist()))

Another option is to convert the year and quarter into a single fractional year, which simplifies the condition:

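import numpy as np
# list(...) is needed on Python 3, where map() returns an iterator
two_q_growth_arr = np.array([list(map(int, x.split('q'))) for x in two_q_growth])
# one fractional year per entry, e.g. 2004q2 -> 2004.25
frac_year = two_q_growth_arr[:, 0] + 0.25 * (two_q_growth_arr[:, 1] - 1.0)
# convert the start the same way (start_year and start_q come from the snippet above)
start = start_year + 0.25 * (start_q - 1.0)
ind = np.where(frac_year > start)
recession_end = list(map(str, two_q_growth_arr[ind][:, 0].tolist()))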
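
Note that the snippets in this answer return only the year column, so the quarter is dropped from the result. A variation on the single-key idea that keeps the original labels, sketched in plain Python with an integer key instead of a fractional year (the helper name `quarter_index` and the shortened `quarters` list are hypothetical):

```python
def quarter_index(label):
    # '2004q2' -> 2004 * 4 + 1; consecutive quarters differ by exactly 1
    year, quarter = map(int, label.split('q'))
    return year * 4 + (quarter - 1)


quarters = ['2004q1', '2004q2', '2004q3', '2004q4', '2005q1']
start = quarter_index('2004q2')

# Keep the full labels whose key is greater than the start key
later = [q for q in quarters if quarter_index(q) > start]
print(later)
# ['2004q3', '2004q4', '2005q1']
```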