I've simplified the code as much as I can. It's still long, but it should illustrate the problem.
I'm sampling weather data from a DataFrame:
import numpy as np
import pandas as pd
#dataframe
dates = pd.date_range('19510101',periods=16000)
data = pd.DataFrame(data=np.random.randint(0,100,(16000,1)), columns =list('A'))
data['date'] = dates
data = data[['date','A']]
#create year and season column
def get_season(row):
    if row['date'].month >= 3 and row['date'].month <= 5:
        return '2'
    elif row['date'].month >= 6 and row['date'].month <= 8:
        return '3'
    elif row['date'].month >= 9 and row['date'].month <= 11:
        return '4'
    else:
        return '1'
data['Season'] = data.apply(get_season, axis=1)
data['Year'] = data['date'].dt.year
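The row-wise apply above can also be expressed without calling a Python function per row. A minimal sketch, assuming the same season codes ('1' for December to February through '4' for September to November); the small `demo` frame is made up purely for illustration:

```python
import pandas as pd

# Hypothetical demo frame: the first day of each month in 1951.
demo = pd.DataFrame({'date': pd.date_range('1951-01-01', periods=12, freq='MS')})

# month % 12 maps December to 0, so Dec/Jan/Feb all give 0 after // 3,
# Mar-May give 1, Jun-Aug give 2, Sep-Nov give 3; adding 1 yields codes 1..4.
month = demo['date'].dt.month
demo['Season'] = ((month % 12) // 3 + 1).astype(str)
demo['Year'] = demo['date'].dt.year
```

The codes are kept as strings so they compare the same way as the values returned by get_season.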
I want to select random years using a predetermined set of year/season tuples:
#generate an index of year and season tuples
index = [(1951, '1'),
         (1951, '2'),
         (1952, '4'),
         (1954, '3'),
         (1955, '1'),
         (1955, '2'),
         (1956, '3'),
         (1960, '4'),
         (1961, '3'),
         (1962, '2'),
         (1962, '3'),
         (1979, '2'),
         (1979, '3'),
         (1980, '4'),
         (1983, '2'),
         (1984, '2'),
         (1984, '4'),
         (1985, '3'),
         (1986, '1'),
         (1986, '2'),
         (1986, '3'),
         (1987, '4'),
         (1991, '1'),
         (1992, '4')]
I sample from this as follows:
Build four lists, each containing the years for one season (one for spring, one for summer, and so on):
coldsample = [[],[],[],[]] #empty list of lists
for (yr,se) in index:
    coldsample[int(se)-1] += [yr] #function which gives the years which have extreme seasons [[1],[2],[3],[4]]
coldsample
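The same grouping can be written with a dictionary keyed by season code, which avoids the index arithmetic. This is only a sketch: `pairs` is an illustrative subset in the same shape as `index`, and `years_by_season` is a name introduced here.

```python
from collections import defaultdict

# Illustrative subset of (year, season) pairs, same shape as `index`.
pairs = [(1951, '1'), (1951, '2'), (1952, '4'), (1954, '3'), (1955, '1')]

# Collect the years under their season code.
years_by_season = defaultdict(list)
for yr, se in pairs:
    years_by_season[se].append(yr)

# Rebuild the list-of-lists layout, ordered by season code 1..4.
coldsample = [years_by_season[str(s)] for s in range(1, 5)]
```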
Pick one random year from each of these lists:
cold_ctr = 0 #variable to count from (1 is winter, 2 spring, 3 summer, 4 autumn)
coldseq = [] #blank list
for yrlist in coldsample:
    ran_yr = np.random.choice(yrlist, 1) #choose a randomly sampled year from previous cell
    cold_ctr += 1 # increment cold_ctr variable by 1
    coldseq += [(ran_yr[0], cold_ctr)] #populate coldseq with a random year and its season number (in order)
Then build a new DataFrame made up of several randomly selected years:
df = []
for i in range(5): #change the number here to change the number of output years
    for item in coldseq: #item is a tuple with year and season, coldseq is cold year and season pairs
        df.append(data.query("Year == %d and Season == '%d'" % item))
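The problem is that the same coldseq (the same year/season combinations) is used on every iteration, and no new coldseq is generated. I need to reset coldseq to empty and generate a new one on each iteration of the final for loop, but I can't see how to do that. I've tried embedding the code in the loop in several ways, but it doesn't seem to work.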
Answer 0 (score: 0)
You can create a second DataFrame from the index and then sample from it.
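df_index = pd.DataFrame(index, columns=['Year', 'Season'])
coldseq = df_index.sample(5) #5 random year/season rows, not one per season
df = []
for yr, se in coldseq.itertuples(index=False):
    df.append(data.query("Year == {0} and Season == '{1}'".format(yr, se))) #Year is numeric, Season is a string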
Answer 1 (score: 0)
Figured it out: move the sampling inside the loop and reset the counter to 0 on each iteration:
cold_ctr = 0 #variable to count from (1 is winter, 2 spring, 3 summer, 4 autumn)
coldseq = [] #blank list
df = []
#number of cold years
for i in range(5): #change number here for number of cold years
    for yrlist in coldsample:
        ran_yr = np.random.choice(yrlist, 1) #choose a randomly sampled year from previous cell
        cold_ctr += 1 # increment cold_ctr variable by 1
        coldseq += [(ran_yr[0], cold_ctr)]
    for item in coldseq: #item is a tuple with year and season, coldseq is all extreme cold year and season pairs
        df.append(data.query("Year == %d and Season == '%d'" % item))
    coldseq = [] #reset coldseq to an empty list so it samples from a new random year
    cold_ctr = 0 #reset counter to 0 so seasons stay as 1,2,3,4
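For what it's worth, the manual counter and the two resets could probably be dropped by using enumerate and wrapping the loop in a function. A rough sketch, not tested against the real data: `sample_cold_years` is a hypothetical helper name, the `demo` frame is made up, and it assumes Season holds the string codes '1' to '4' as in the question.

```python
import numpy as np
import pandas as pd

def sample_cold_years(data, coldsample, n_years, rng=None):
    """Hypothetical helper: for each output year, draw one random year per
    season from `coldsample` and collect the matching rows of `data`."""
    rng = np.random.default_rng() if rng is None else rng
    pieces = []
    for _ in range(n_years):
        # enumerate(..., start=1) replaces cold_ctr and its reset
        for season, yrlist in enumerate(coldsample, start=1):
            year = int(rng.choice(yrlist))  # fresh draw on every pass
            mask = (data['Year'] == year) & (data['Season'] == str(season))
            pieces.append(data[mask])
    # Return one DataFrame instead of a list of fragments.
    return pd.concat(pieces, ignore_index=True)

# Tiny made-up data set just to exercise the helper.
demo = pd.DataFrame({'Year': [1951, 1951, 1955, 1955],
                     'Season': ['1', '2', '1', '2'],
                     'A': [10, 20, 30, 40]})
result = sample_cold_years(demo, [[1951, 1955], [1951, 1955]], n_years=3,
                           rng=np.random.default_rng(0))
```

Passing a seeded generator makes the draws repeatable; leaving rng as None gives different years on each call.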