Combine rows and add values in a DataFrame

Time: 2017-09-06 12:03:28

Tags: python pandas dataframe

I have a DataFrame (named `table`) with 6 columns labeled [price1, price2, price3, time, type, volume].

In the type column I have 'Q' and 'T' values, arranged like this:

    Q
    T
    Q
    T
    T
    Q

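Now I want to combine each run of consecutive T rows into a single row and add up their volume values. Consecutive T rows have the same price and time values.

i.e. I want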

    Price...  Time     Type  Volume
    10000     2012.05  Q     10
    10000     2012.05  T     20
    10000     2012.05  Q     10
    10000     2012.06  T     20
    10000     2012.06  T     30
    10000     2012.07  Q     10

to become:

    10000     2012.05  Q     10
    10000     2012.05  T     20
    10000     2012.05  Q     10
    10000     2012.06  T     20 + 30 = 50
    10000     2012.07  Q     10

Here is my code, but it doesn't return the desired result. Can someone help me find my mistake?

    import pandas as pd

    def combine(df):
        combined = []  # Init empty list
        length = len(df.iloc[:, 0])  # Get the number of rows in DataFrame
        i = 0
        while i < length:
            num_elements = num_elements_equal(df, i, 0, 'T')  # Get the number of consecutive 'T's
            if num_elements <= 1:  # If there are 1 or fewer T's, append only that row to combined, with the same type
                combined.append([df.iloc[i, 0], df.iloc[i, 1], df.iloc[i, 2],
                                 df.iloc[i, 3], df.iloc[i, 4], df.iloc[i, 5]])
            else:  # Otherwise, append the sum of all the elements to combined, with 'T' type
                combined.append(['T', sum_elements(df, i, i + num_elements, 5)])
            i += max(num_elements, 1)  # Increment i by the number of rows combined, with a min increment of 1
        return pd.DataFrame(combined, columns=df.columns)  # Return as DataFrame

    def num_elements_equal(df, start, column, value):  # Counts the number of consecutive elements equal to value
        i = start
        num = 0
        while i < len(df.iloc[:, column]):
            if df.iloc[i, column] == value:
                num += 1
                i += 1
            else:
                return num
        return num

    def sum_elements(df, start, end, column):  # Sums the elements from start to end
        return sum(df.iloc[start:end, column])

    tableT = combine(table)
    tableT
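Two things in the code above look like the likely culprits. First, `num_elements_equal(df, i, 0, 'T')` scans column 0 (`price1`) rather than the type column (position 4), so the count of 'T's is always 0 and every row is copied through unchanged. Second, even with the right column, `['T', sum_elements(...)]` holds only 2 values for a 6-column frame, so 'T' and the sum would not line up with the `type` and `volume` columns. The following is a minimal sketch of the same loop with both points addressed, assuming the column names from the question; the `type_col`/`volume_col` parameters and the sample values are hypothetical, added for illustration.

```python
import pandas as pd

def combine(df, type_col="type", volume_col="volume"):
    """Merge each run of consecutive 'T' rows into one row, summing volume.

    type_col / volume_col are hypothetical parameters naming the columns.
    """
    type_idx = df.columns.get_loc(type_col)      # position of the type column
    volume_idx = df.columns.get_loc(volume_col)  # position of the volume column
    combined = []
    n = len(df)
    i = 0
    while i < n:
        # Advance j past every consecutive 'T' row starting at row i.
        j = i
        while j < n and df.iloc[j, type_idx] == "T":
            j += 1
        if j - i <= 1:
            # A 'Q' row or a lone 'T' row: copy the whole row unchanged.
            combined.append(df.iloc[i].tolist())
            i += 1
        else:
            # A run of 'T' rows: keep the first row's values (price/time are
            # the same within the run) and replace volume with the run's sum.
            row = df.iloc[i].tolist()
            row[volume_idx] = df.iloc[i:j, volume_idx].sum()
            combined.append(row)
            i = j
    return pd.DataFrame(combined, columns=df.columns)

# Made-up sample data shaped like the example in the question.
table = pd.DataFrame({
    "price1": [10000] * 6,
    "price2": [10000] * 6,
    "price3": [10000] * 6,
    "time": [2012.05, 2012.05, 2012.05, 2012.06, 2012.06, 2012.07],
    "type": ["Q", "T", "Q", "T", "T", "Q"],
    "volume": [10, 20, 10, 20, 30, 10],
})
tableT = combine(table)
print(tableT)
```

Building each output row from the full input row keeps all six columns aligned; only the volume entry is overwritten for a merged run.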

The raw data (`table`) looks like this:

1 answer:

Answer 0 (score: 1)

IIUC:

Input DataFrame, `df`:

   Price     Time Type  Volume
0  10000  2012.05    Q      10
1  10000  2012.05    T      20
2  10000  2012.05    Q      10
3  10000  2012.06    T      20
4  10000  2012.06    T      30
5  10000  2012.07    Q      10

Combine the T records and sum the volume:

    df.groupby(by=[df.Type.ne('T').cumsum(), 'Price', 'Time', 'Type'], as_index=False)['Volume'].sum()

Output:

   Price     Time Type  Volume
0  10000  2012.05    Q      10
1  10000  2012.05    T      20
2  10000  2012.05    Q      10
3  10000  2012.06    T      50
4  10000  2012.07    Q      10
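The grouping key `df.Type.ne('T').cumsum()` is what makes this work: `ne('T')` is True on every non-'T' row, so the running sum goes up by one at each 'Q' and stays flat across the 'T' rows that follow it. A 'Q' row and its trailing 'T' run therefore share a number, and adding `Type` to the key splits them apart while keeping the 'T' run together. Consecutive 'Q' rows are never merged, because each one starts a new number. Below is a sketch of the same idea adapted to the question's lowercase column names. Two choices here are assumptions rather than part of the answer: the key is renamed to a hypothetical `run` so it cannot clash with the `type` column, and price/time are taken with `first` instead of being grouped on, which relies on the question's statement that they are identical within a 'T' run.

```python
import pandas as pd

# Made-up sample data using the question's column names.
table = pd.DataFrame({
    "price1": [10000] * 6,
    "price2": [10000] * 6,
    "price3": [10000] * 6,
    "time": [2012.05, 2012.05, 2012.05, 2012.06, 2012.06, 2012.07],
    "type": ["Q", "T", "Q", "T", "T", "Q"],
    "volume": [10, 20, 10, 20, 30, 10],
})

# Run id (hypothetical helper name "run"): +1 at every non-'T' row,
# unchanged across the 'T' rows after it.
run = table["type"].ne("T").cumsum().rename("run")
print(run.tolist())

# Group on (run id, type): keep the first price/time of each group and
# sum the volumes. sort=False keeps groups in order of first appearance.
agg_spec = {
    "price1": "first",
    "price2": "first",
    "price3": "first",
    "time": "first",
    "volume": "sum",
}
tableT = (
    table.groupby([run, "type"], sort=False)
    .agg(agg_spec)
    .reset_index(level="type")  # move 'type' back into a column
    .reset_index(drop=True)     # drop the helper run id
)
tableT = tableT[table.columns]  # restore the original column order
print(tableT)
```

For the sample data the run id is `[1, 1, 2, 2, 2, 3]`, so the two 'T' rows at 2012.06 fall into the same group and their volumes add up to 50.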