通过在熊猫中使用for循环合并多个数据框

时间:2020-09-15 20:35:55

标签: python pandas dataframe

我有多个(大约11个)数据帧,如下所示:

            Energy
Date              
2020-09-14  42    
2020-09-11  0     
2020-09-10  0     
2020-09-09  11    
2020-09-08  0     
2020-09-04  23    
2020-09-03  11    
2020-09-02  11    
2020-09-01  19    
2020-08-31  23    
2020-08-28  69    
2020-08-27  30    
2020-08-26  15    
2020-08-25  53    
2020-08-24  57    
2020-08-21  0     
2020-08-20  0     
2020-08-19  0     
2020-08-18  0     
2020-08-17  0   
            Materials
Date                 
2020-09-14  100      
2020-09-11  89       
2020-09-10  28       
2020-09-09  42       
2020-09-08  0        
2020-09-04  50       
2020-09-03  46       
2020-09-02  100      
2020-09-01  92       
2020-08-31  17       
2020-08-28  85       
2020-08-27  78       
2020-08-26  82       
2020-08-25  78       
2020-08-24  82       
2020-08-21  17       
2020-08-20  0        
2020-08-19  0        
2020-08-18  0        
2020-08-17  0  

如何将它们合并为一个大数据框,如下所示:

            Energy        Consumer Staples     Consumer Discretionary     ...
Date              
2020-09-14  42            20                   ..
2020-09-11  0             ..                   ..
2020-09-10  0             ..                   ..
2020-09-09  11            ..                   ..
2020-09-08  0     
2020-09-04  23    
2020-09-03  11    
2020-09-02  11    
2020-09-01  19    
2020-08-31  23    
2020-08-28  69    
2020-08-27  30    
2020-08-26  15    
2020-08-25  53    
2020-08-24  57    
2020-08-21  0     
2020-08-20  0     
2020-08-19  0     
2020-08-18  0     
2020-08-17  0   

我正在考虑使用for循环将其重复添加或连接到新的数据框中,但是以这种方式缺少date列。所以我想知道如何创建一个完整的数据框,在最左边的列中使用日期,其余的数据和列名称保持不变。所有11个数据帧都按日期索引,希望结果有一个Date列和11个带有列名的数据列。

我的代码是:

from collections import OrderedDict
import pandas as pd
import datetime as dt
import pandas_datareader as web
#====================================================
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', -1)


cmaps=OrderedDict()
print(type(cmaps.items()))

#############
prev=70
endDate=dt.datetime.today().date()
sDate=endDate-pd.to_timedelta(prev,unit='d')
#############

#def get_price(tickers): #input is a list or Series
    #result=pd.DataFrame()
    #for i in tickers:
        #df=pd.DataFrame()
        
        #df['Adj Close']=web.DataReader(i,'yahoo',sDate,endDate)['Adj Close']
        #df['MA']=df['Adj Close'].rolling(5).mean()
        #df.sort_values(ascending=False,inplace=True,by="Date")
        #df['Higher?']=df['Adj Close']>df['MA']
        #df['Higher?']=df['Higher?'].astype(int)
        
    
        #result['{}'.format(i)]=df['Higher?']
        
    #return result


#--------------------------------------------------------------code from stackoverflow
def get_price(tickers,roll_num=20): #input is a list or Series
    result=pd.DataFrame()
    pic=pd.DataFrame()
    for i in tickers:
        try:
            df=pd.DataFrame()                
            df['Adj Close']=web.DataReader(i,'yahoo',sDate,endDate)['Adj Close']
            df['MA']=df['Adj Close'].rolling(roll_num).mean()
            df.sort_values(ascending=False,inplace=True,by="Date")  # sometimes error
            df['Higher?']=df['Adj Close']>df['MA']
            df['Higher?']=df['Higher?'].astype(int)
            result[str(i)]=df['Higher?']
        
            
            
        except Exception as ex:  # no date column
            print('Ticker', i, 'ERROR', ex)
            print(df)
    pic[tickers.name]=(result.sum(axis=1)/len(result.columns)*100).astype(int) 
    pic.name=tickers.name   
    pic.drop(pic.tail(roll_num-1).index,inplace=True)
    return pic
#--------------------------------------------------------------



test=pd.Series(['A','TSLA','KO','T','aapl','nke'])
test=test.str.replace('.','-')
test.name='I am test'
a=get_price(test)
print(a)


#=============================================================================

base_url = "http://www.sectorspdr.com/sectorspdr/IDCO.Client.Spdrs.Holdings/Export/ExportExcel?symbol="

data = {                    
    'Ticker' :      [ 'XLC','XLY','XLP','XLE','XLF','XLV','XLI','XLB','XLRE','XLK','XLU' ]          
,   'Name' :    [ 'Communication Services','Consumer Discretionary','Consumer Staples','Energy','Financials','Health Care','Industrials','Materials','Real Estate','Technology','Utilities' ]           
}                   

spdr_df = pd.DataFrame(data)     

print(spdr_df)


final_product=pd.DataFrame()

for i, row in spdr_df.iterrows():
    url =  base_url + row['Ticker']
    df_url = pd.read_excel(url)
    header = df_url.iloc[0]
    holdings_df = df_url[1:]
    holdings_df.set_axis(header, axis='columns', inplace=True)
    holdings_df=holdings_df['Symbol'].str.replace('.','-')
    holdings_df.name=row.Name
    
    
    b=get_price(holdings_df)
    print(b)

    
    
    

1 个答案:

答案 0 :(得分:1)

我将执行以下操作,假设您希望每行有一个唯一的日期,而所有其他数据都作为列:

dataframes = [df1, df2] # create list with all dataframes you are interested in
pd.concat([df.set_index('Date') for df in dataframes], ignore_index=False, axis=1)

关键是确保Date是所有索引的索引,因此pd.concat()知道要“联接”索引上的数据帧。另外,您是在axis = 1(列轴)上串联的,因此您也需要指定它。

相关问题