I need to drop every column of a dataframe whose date is before the year 2000.
My general approach is:
columnstokeep = list(DF)  # gives me the column names
for i in range(len(columnstokeep)):  # get rid of dates before year 2000
    if int(columnstokeep[i][:4]) < 2000:
        columnstokeep.remove(columnstokeep[i])
DF = DF[columnstokeep]  # the new dataframe
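I keep getting a "list index out of range" error.
Is this because the len in range(len(columnstokeep)) changes every time I remove an element from the list? Or does range(len(columnstokeep)) keep the same value for the duration of the loop?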
Here is the DataFrame: [image not preserved]
Thanks
Answer 0: (score: 3)
You can do this easily by calling pd.to_datetime on the columns and then selecting the columns whose date is 2000 or later.
import pandas as pd

# Create example data
frame = pd.DataFrame({
    '1998-1-1': ['foo'],
    '1999-1-1': ['bar'],
    '2000-1-1': ['spam'],
    '2001-1-1': ['eggs']
})

# Select the columns dated 2000 or later
print(frame.loc[:, pd.to_datetime(frame.columns) >= '2000'])
Output:
  2000-1-1 2001-1-1
0     spam     eggs
Answer 1: (score: 2)
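You are right about the root of the problem, except that the range is not recalculated: range(len(columnstokeep)) is evaluated once, before the loop starts. Because you remove values from the list while looping, i eventually runs past the end of what is left of columnstokeep. I added some prints to show the problem more clearly: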
years = range(1990, 2010)
columnstokeep = []
# Build column names that look like the ones in the question
for i in years:
    columnstokeep.append(str(i) + '-01')

# This loop reproduces the error; comment it out to run the reverse loop
for i in range(len(columnstokeep)-1):  # get rid of dates before year 2000
    print(i, columnstokeep[i])  # prints only every second year while in the 199X range
    if int(columnstokeep[i][:4]) < 2000:
        columnstokeep.remove(columnstokeep[i])
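To make the failure easier to inspect, here is a minimal sketch of the same loop that records what happens instead of crashing. The names indices, visited and error_index are only illustrative, and it loops over the full range(len(columnstokeep)) as in the question rather than the shortened range used above.

```python
# Same column names as in the example above
columnstokeep = [f"{year}-01" for year in range(1990, 2010)]

indices = range(len(columnstokeep))  # evaluated once: always 0..19
visited = []         # names the loop actually looked at
error_index = None   # first index that no longer exists in the list

for i in indices:
    try:
        name = columnstokeep[i]
    except IndexError:
        error_index = i
        break
    visited.append(name)
    if int(name[:4]) < 2000:
        # Removing shifts every later element one position to the left,
        # so the next index skips over the element that moved into slot i.
        columnstokeep.remove(name)

print(len(indices), len(columnstokeep), error_index)
print(columnstokeep[:5])
```

The range keeps its original 20 values while the list shrinks, which is why every second 199X name is skipped and why the index eventually falls off the end.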
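Instead, you can iterate starting from the end, so that removing an element never shifts the positions you have not visited yet: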
for i in range(len(columnstokeep)-1, -1, -1):  # get rid of dates before year 2000
    print(i, columnstokeep[i])  # going backwards, every year is printed once
    if int(columnstokeep[i][:4]) < 2000:
        columnstokeep.remove(columnstokeep[i])
# DF = DF[columnstokeep]  # the new dataframe
print(columnstokeep)
Output:
['2000-01', '2001-01', '2002-01', '2003-01', '2004-01', '2005-01', '2006-01', '2007-01', '2008-01', '2009-01']
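As an alternative to iterating backwards, a common way to sidestep the problem is to build a new list rather than removing items from the one being looped over. The following is only a sketch under the same assumption as above, namely that every column name starts with a four-digit year:

```python
# Same column names as in the example above
columnstokeep = [f"{year}-01" for year in range(1990, 2010)]

# Keep only the names whose year prefix is 2000 or later.
# A new list is built, so nothing is removed while iterating.
columnstokeep = [name for name in columnstokeep if int(name[:4]) >= 2000]

print(columnstokeep)
```

The resulting list can then be used as in the question, with DF = DF[columnstokeep].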