Python Pandas: joining DataFrames of different lengths

Time: 2014-09-23 12:06:29

Tags: python pandas

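I am trying to use data from two DataFrames, say frame_a and frame_b, both indexed by timestamps. frame_a contains only one column and is shorter than frame_b (every index value of frame_a is also present in frame_b's index). My code is structured as follows.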

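from pandas import DataFrame

dataframe = DataFrame(index=frame_a.index)
dataframe = dataframe.join(frame_a["a"])  # join returns a new frame, so keep the result
dataframe = dataframe.join(frame_b["b"])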

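Now 'dataframe' is indexed by frame_a's index and should take only those values from frame_b["b"] whose index matches dataframe's index. I then extract the data with iterrows() and use it to build a new DataFrame, 'returnframe':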

timestamps = []
results = []
for row_idx, row in enumerate(dataframe.iterrows()):
    try:                    ## note that row_idx is an integer, while index is a datetime
        index, data = row
        rowData = data.tolist()

        a = rowData[0]      # value from column "a"
        b = rowData[1]      # value from column "b"
        if a < b:
            b = b + 1
        results.append(b)
        timestamps.append(dataframe.index[row_idx])
    except Exception:
        pass                # skip rows that cannot be processed
returnframe = DataFrame(index=timestamps) # this frame is even shorter than frame_a
returnframe["results"] = results
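For reference, the loop above could probably be replaced by vectorized operations. The following is only a minimal sketch: the sample frames and their values are made up, and it assumes that rows containing NaN should simply be dropped rather than caught by an exception handler.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for frame_a and frame_b.
idx_b = pd.date_range("2014-09-01", periods=5, freq="D")
frame_b = pd.DataFrame({"b": [1.0, 2.0, np.nan, 4.0, 5.0]}, index=idx_b)
frame_a = pd.DataFrame({"a": [3.0, 1.0, 2.0]}, index=idx_b[[0, 2, 3]])

# Left join on the index: the result keeps frame_a's index.
dataframe = frame_a[["a"]].join(frame_b["b"])

# Drop rows containing NaN explicitly (assumption: they should be skipped).
clean = dataframe.dropna(subset=["a", "b"])

# Add 1 to b wherever a < b, otherwise keep b unchanged.
results = np.where(clean["a"] < clean["b"], clean["b"] + 1, clean["b"])

returnframe = pd.DataFrame({"results": results}, index=clean.index)
```

Here np.where evaluates the comparison for the whole column at once, so neither iterrows() nor apply is needed for this particular rule.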

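I need the try statement to filter out some NaNs... First question: I cannot figure out how to use 'apply' or something similar in place of the for loop. Does anyone know a more efficient way? Second question: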

returnframe.join(frame_b["something"])

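does not behave as I expected. I need to add frame_b's data to my returnframe wherever the indices (timestamps) of both frames are equal. However, 'join' seems to add some rows somewhere in the middle (possibly because of duplicate indices?). For my other functions it is important that the length of returnframe is not changed by data added later.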
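The suspicion about duplicate indices can be checked on a small example. The sketch below uses made-up data in which frame_b contains one timestamp twice. Whether the real frame_b has duplicates is not known from the question, and keeping the first of each duplicate is just one possible choice.

```python
import pandas as pd

# Hypothetical data: frame_b contains the timestamp 2014-09-02 twice.
ts = pd.to_datetime(["2014-09-01", "2014-09-02", "2014-09-02", "2014-09-03"])
frame_b = pd.DataFrame({"something": [10, 20, 21, 30]}, index=ts)
returnframe = pd.DataFrame(
    {"results": [1.0, 2.0]},
    index=pd.to_datetime(["2014-09-01", "2014-09-02"]),
)

# A plain join yields one output row per matching row in frame_b,
# so the duplicated timestamp makes the result longer than returnframe.
joined = returnframe.join(frame_b["something"])

# Keeping only the first row per timestamp before joining preserves the length.
# Which duplicate to keep ("first" here) is an assumption.
unique_b = frame_b[~frame_b.index.duplicated(keep="first")]
joined_unique = returnframe.join(unique_b["something"])
```

Checking frame_b.index.is_unique on the real data would show whether this is what happens there.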

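For completeness, here is my whole function (df_input and df_signals are the function parameters). It takes signals (integers) and adjusts holdings, funds and valuation depending on whether the signal issued is a 'buy' or a 'sell':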

import numpy as np
import pandas as pd

def calculatePerformance(self, df_input, df_signals, closei=3, changei=11, printResults=True):
    funds_array = []
    holdings_array = []
    valuation_array = []
    percChange_array = []
    date_array = []
    type_array = []
    signal_array = []
    type = df_input["type"][0]

    funds = df_input["close"][0]*8
    holdings = 0
    valuation = funds
    percChange = 0

    merged = pd.DataFrame(index=df_signals.index)
    merged = merged.join(df_signals["type"])
    merged = merged.join(df_input["close"])
    merged = merged.join(df_signals["signal"])

    prices = merged["close"].values      # column values as a NumPy array
    signals = merged["signal"].values

    for row_idx, row in enumerate(merged.iterrows()):
        try:                    ## note that row_idx is integer and index is here a datetime
            index,data = row
            rowData = data.tolist()

            price = rowData[1]
            signal = int(rowData[2])   # raises ValueError for NaN; np.int was removed from NumPy
            if isinstance(signal, int) and signal != 0:
                if signal > 0 and (signal*price) > funds :
                    signal=0
                elif signal < 0 and (holdings == 0 or (holdings+signal) < 0):
                    signal = holdings

                funds -= signal*price
                holdings += signal
                valuation = funds + holdings*price
                percChange = (valuation - valuation_array[0])/valuation_array[0]*100.

            type_array.append(type)
            date_array.append(index)
            signal_array.append(df_signals["signal"][index])

            funds_array.append(funds)
            holdings_array.append(holdings)
            valuation_array.append(valuation)
            percChange_array.append(percChange)



        except:
            pass


    # build and return the performance df of this stock
    df_performance = pd.DataFrame(index=date_array)        
    df_performance["type"] = type_array

    df_performance["signal"] = signal_array
    ## I would like to use the following line, but it won't work. I want to use it to eventually eliminate the need for the for loop above.
    #df_performance["signal"] = df_signals["signal"]

    df_performance["funds"] = funds_array
    df_performance["holdings"] = holdings_array
    df_performance["valuation"] = valuation_array
    df_performance["percChange"] = percChange_array

    return df_performance
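Because funds and holdings depend on the previous row, the bookkeeping in this function is hard to vectorize fully. The following is a simplified sketch of the same loop with an explicit NaN check instead of a bare except. The names simulate and start_funds are hypothetical, and the sketch makes three assumptions that differ from the function above: percChange is measured against the starting funds, a sell is capped at the current holdings, and the adjusted signal is recorded rather than the raw one. It also assumes prices and signals are already aligned row by row.

```python
import numpy as np
import pandas as pd


def simulate(prices, signals, start_funds):
    """Apply buy/sell signals row by row (simplified, hypothetical sketch)."""
    funds, holdings = start_funds, 0
    rows = []
    for ts, price, signal in zip(prices.index, prices, signals):
        if pd.isnull(price) or pd.isnull(signal):
            continue  # explicit NaN filter instead of a bare except
        signal = int(signal)
        if signal > 0 and signal * price > funds:
            signal = 0  # cannot afford the purchase
        elif signal < 0 and holdings + signal < 0:
            signal = -holdings  # assumption: sell at most what is held
        funds -= signal * price
        holdings += signal
        valuation = funds + holdings * price
        perc_change = (valuation - start_funds) / start_funds * 100.0
        rows.append((ts, signal, funds, holdings, valuation, perc_change))
    columns = ["date", "signal", "funds", "holdings", "valuation", "percChange"]
    return pd.DataFrame(rows, columns=columns).set_index("date")


# Made-up example data; the third price is missing and that row is skipped.
idx = pd.date_range("2014-09-01", periods=4, freq="D")
prices = pd.Series([10.0, 11.0, np.nan, 12.0], index=idx)
signals = pd.Series([2, 0, 1, -5], index=idx)
perf = simulate(prices, signals, start_funds=80.0)
```

Collecting one tuple per row and building the DataFrame once at the end avoids maintaining seven parallel lists.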

0 answers:

No answers yet