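I know there are questions with similar titles, but none of them really answers mine. I have a data frame as shown below. The "index" column is actually a timestamp. Column A is how many tonnes of material were dumped into the crusher. Column B is the crushing rate at each timestamp. What I want to know is when each load of material (column A) will have been crushed, given the crushing rate (column B).
There are three possible scenarios.
I tried computing the cumulative sums of columns A and B and using merge_asof to perform a fuzzy join. However, that does not work, because excess crushing capacity is not stored: only the crushing rate after the material has been loaded counts.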
import pandas as pd

A = {'index': range(1, 11), 'A': [300, 0, 0, 400, 0, 0, 0, 0, 150, 0]}
B = {'index': range(1, 11), 'B': [102, 103, 94, 120, 145, 114, 126, 117, 107, 100]}
A = pd.DataFrame(data=A)
B = pd.DataFrame(data=B)
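The original merge_asof attempt is not shown, so the following is only a sketch of what it probably looked like. The column names `IndexA`, `IndexB`, `A_cum` and `B_cum` are assumptions made for illustration. It joins each cumulative load to the first cumulative crushing total that reaches it, which shows where the approach goes wrong on this data.

```python
import pandas as pd

A = pd.DataFrame({'index': range(1, 11), 'A': [300, 0, 0, 400, 0, 0, 0, 0, 150, 0]})
B = pd.DataFrame({'index': range(1, 11), 'B': [102, 103, 94, 120, 145, 114, 126, 117, 107, 100]})

# Keep only the rows where material was loaded and accumulate the tonnage.
loads = (A[A['A'] > 0]
         .rename(columns={'index': 'IndexA'})
         .assign(A_cum=lambda d: d['A'].cumsum()))
# Accumulate the crushing rate over all timestamps.
rates = (B.rename(columns={'index': 'IndexB'})
          .assign(B_cum=lambda d: d['B'].cumsum()))

# For each load, find the first timestamp whose cumulative rate reaches the cumulative load.
naive = pd.merge_asof(loads, rates, left_on='A_cum', right_on='B_cum', direction='forward')
print(naive[['IndexA', 'A', 'IndexB']])
```

On this data the join reports completion at timestamps 4, 7 and 8, whereas the expected completion times are 4, 8 and 10. The surplus capacity left over after a load is finished is carried forward by the cumulative sum, so the third load even appears to finish before it is dumped at timestamp 9.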
Here is the expected result:
IndexA A IndexB B_accumulate
1 300 4 419
4 400 8 502
9 150 10 207
B_accumulate is the running sum of the crushing rate (B); it is reset to 0 once the material has been crushed (when B_accumulate >= A).
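For reference, the rule above can be written as a plain loop. This is a minimal sketch, not an optimized solution, and it rests on an assumption read off the expected table: a load starts being crushed at its own timestamp, or at the row after the previous load finished if the crusher is still busy. The function name `crush_schedule` is hypothetical.

```python
import pandas as pd

A = pd.DataFrame({'index': range(1, 11), 'A': [300, 0, 0, 400, 0, 0, 0, 0, 150, 0]})
B = pd.DataFrame({'index': range(1, 11), 'B': [102, 103, 94, 120, 145, 114, 126, 117, 107, 100]})


def crush_schedule(A, B):
    """Return one row per load: when it was dumped and when it was fully crushed."""
    merged = A.merge(B, on='index').sort_values('index').reset_index(drop=True)
    rows = []
    next_free = 0  # first row position at which the crusher is idle (assumption)
    for pos in merged.index[merged['A'] > 0]:
        load = merged.at[pos, 'A']
        acc = 0  # B_accumulate, restarted for every load
        for p in range(max(pos, next_free), len(merged)):
            acc += merged.at[p, 'B']
            if acc >= load:
                rows.append((merged.at[pos, 'index'], load, merged.at[p, 'index'], acc))
                next_free = p + 1  # surplus capacity in row p is discarded
                break
        else:
            break  # the data ended before this load was crushed
    return pd.DataFrame(rows, columns=['IndexA', 'A', 'IndexB', 'B_accumulate'])


result = crush_schedule(A, B)
print(result)
```

On the sample data this reproduces the three rows of the expected table. Because it works with row positions rather than index values, it should also apply when the index column holds real timestamps, as long as A and B share the same timestamps.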
Answer 0 (score: 1)
This is a very verbose solution that I hope generalizes to your full data. I'm sure you can simplify it.
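# Assumes A and B are the DataFrames defined in the question.
C = A.join(B.set_index('index'), on='index')
# Carry each load size forward over the zero rows
# (replace(..., method='ffill') is deprecated in recent pandas).
C['A_filled'] = C['A'].mask(C['A'] == 0).ffill()
C['cumul_load'] = C['A'].cumsum()
# Number the loads: rows between two loads belong to the earlier one.
C['load_number'] = C.groupby('cumul_load').ngroup() + 1
# Running sum of the crushing rate within each load.
C['B_accum'] = C.groupby('load_number')['B'].cumsum()
# The question's condition is B_accumulate >= A.
C['A_fully_crushed'] = C['B_accum'] >= C['A_filled']
# First row of each load at which the condition holds.
C['first_index_fully_crushed'] = C.groupby('load_number')['A_fully_crushed'].cumsum() == 1
indexA_ = C['index'][C['A'] > 0].tolist()
A_ = C['A'][C['A'] > 0].tolist()
indexB_ = C['index'][C['first_index_fully_crushed']].tolist()
B_accumulate_ = C['B_accum'][C['first_index_fully_crushed']].tolist()
result = pd.DataFrame({'indexA': indexA_, 'A': A_, 'indexB': indexB_, 'B_accumulate': B_accumulate_})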
This produces
indexA A indexB B_accumulate
0 1 300 4 419
1 6 400 9 464
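Answer 1 (score: 1)
Create a DataFrame combining A and B:
import numpy as np
import pandas as pd

A = {'index': range(1, 11), 'A': [300, 0, 400, 0, 0, 0, 0, 0, 100, 0]}
B = {'index': range(1, 11), 'B': [102, 103, 94, 120, 145, 114, 126, 117, 107, 87]}
df_A = pd.DataFrame(data=A)
df_B = pd.DataFrame(data=B)
df_com = pd.concat([df_A, df_B], axis=1).drop('index', axis=1)
Create the indices:
indexA = list(df_com.A[df_com.A.ne(0)].index + 1)
indexB = np.array(indexA) - 2
indexB = np.append(indexB[1:], (len(df_com) - 1))
Replace the zeros in column A with a forward fill:
# replace(0, method='pad') is deprecated in recent pandas; mask + ffill does the same here.
# Assumes the first row holds a load, so no NaN is left before the cast.
df_com['A'] = df_com['A'].mask(df_com['A'].eq(0)).ffill().astype(int)
Group by A and add the index columns:
df_new = df_com.groupby("A", sort=False).apply(lambda x: x.B.shift(1).sum()).reset_index()
df_new['indexA'] = indexA
df_new['indexB'] = indexB
df_new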
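Answer 2 (score: 1)
A possible approach. The problem splits into two parts: computing the amount of material actually in the crusher (which cannot be negative), and analyzing the loads (grouping the rows into chunks during which there is some material left to crush at the current time step).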
import numpy as np
import pandas as pd


def get_load(df):
    """Get loaded material minus crushed material, floored at zero."""
    # Copy so the array is writable regardless of the pandas version.
    current_load = (df['A'] - df['B']).to_numpy(copy=True)
    if current_load[0] < 0:
        current_load[0] = 0
    for idx in range(1, len(current_load)):
        correct_value = current_load[idx - 1] + current_load[idx]
        # The remaining load cannot be negative: surplus capacity is lost.
        if correct_value < 0:
            current_load[idx] = 0
        else:
            current_load[idx] = correct_value
    return current_load


def get_work_load_chunk_stat(df):
    """Summarize one chunk of rows during which material is being crushed."""
    if df['load'].sum() == 0:
        return  # idle chunk: nothing to report
    ans = pd.DataFrame(
        {'indexA': [df.iloc[0, :]['indexA']],
         'total_load': [df['A'].sum()],
         'loads_qty': [df[df['A'] > 0]['A'].count()],
         'indexB': [df.iloc[-1, :]['indexB']],
         'total_work': [df['B'].sum()]})
    return ans


if __name__ == '__main__':
    A = {'indexA': range(22),
         'A': [0, 300, 0, 0, 0, 0, 400, 0, 100, 0, 0, 0, 300, 0, 0, 0, 0, 400, 0, 100, 0, 0]}
    B = {'indexB': range(22),
         'B': [99, 102, 103, 94, 120, 145, 114, 126, 117, 107, 87, 99, 102, 103, 94, 120, 145, 114, 126, 117, 107, 87]}
    data = pd.concat([pd.DataFrame(data=A), pd.DataFrame(data=B)], axis=1)
    data['load'] = get_load(data)
    # A new chunk starts on the row after the load drops to zero.
    data['load_check'] = np.where(data['load'] == 0, 1, 0)
    data['load_check'] = data['load_check'].shift(fill_value=0).cumsum()
    # print(data)
    result = (
        data
        .groupby('load_check')
        .apply(get_work_load_chunk_stat)
        .reset_index(drop=True))
    print(result)
Output:
indexA total_load loads_qty indexB total_work
0 1 300 1 4 419
1 6 500 2 10 551
2 12 300 1 15 419
3 17 500 2 21 551
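The first part of this approach, the remaining load that is never allowed to drop below zero, is a clamped running sum. As a minimal sketch, and not part of the answer itself, the same recurrence can be written with `itertools.accumulate` from the standard library. The list names `loaded`, `rates`, `net` and `balance` are hypothetical, and the data is the first eleven rows of the example above.

```python
from itertools import accumulate

loaded = [0, 300, 0, 0, 0, 0, 400, 0, 100, 0, 0]               # column A
rates = [99, 102, 103, 94, 120, 145, 114, 126, 117, 107, 87]  # column B

# Net change per step: material added minus crushing capacity.
net = [a - b for a, b in zip(loaded, rates)]

# balance[t] = max(balance[t-1] + net[t], 0), starting from an empty crusher.
# The leading element is the initial value 0, so it is dropped.
balance = list(accumulate(net, lambda prev, x: max(prev + x, 0), initial=0))[1:]
print(balance)
```

A zero in `balance` marks a step at which the crusher ran empty, which is where the answer's `load_check` column starts a new chunk.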
Answer 3 (score: 1)
I simplified the structure by using Series instead of DataFrames, with indices starting from zero. The solution relies on cumsum() and searchsorted().
import pandas as pd

Load = pd.Series([300, 0, 0, 400, 50, 0, 0, 0, 150, 0])  # aka 'A'
Rate = pd.Series([102, 103, 94, 120, 145, 114, 126, 117, 107, 100])  # aka 'B'

# Storage for the result:
H = []  # [ (indexLoad, Load, indexRate, excess) ... ]

# Find the 1st non 0 load:
load1_idx = len(Load)
for lix in range(len(Load)):
    a = Load[lix]
    if a != 0:
        csumser = Rate.cumsum()
        rix = csumser.searchsorted(a)
        excess = csumser[rix] - a
        H.append((lix, a, rix, excess))
        load1_idx = lix
        break

# Processing
for lix in range(load1_idx + 1, len(Load)):
    a = Load[lix]
    if a == 0:
        continue
    last_rix = H[-1][-2]
    # Restart the running sum at the row where the previous load finished.
    csumser[last_rix:] = Rate[last_rix:]
    if lix == last_rix:
        csumser[lix] = H[-1][-1]  # excess
    csumser[last_rix:] = csumser[last_rix:].cumsum()
    rix = csumser[last_rix:].searchsorted(a)
    rix += last_rix
    excess = csumser[rix] - a
    H.append((lix, a, rix, excess))

df = pd.DataFrame(H, columns=["indexLoad", "Load", "indexRate", "rate_excess"])
print(df)
indexLoad Load indexRate rate_excess
0 0 300 3 119
1 3 400 6 104
2 4 50 6 76
3 8 150 7 93
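The core step, locating the row at which the running total of the rate first reaches the load, can be seen in isolation in the sketch below. It only illustrates how cumsum() and searchsorted() combine and is not a replacement for the loop above. The names `rate`, `cumulative`, `pos` and `excess` are chosen for illustration.

```python
import pandas as pd

rate = pd.Series([102, 103, 94, 120, 145])
cumulative = rate.cumsum()  # 102, 205, 299, 419, 564

# searchsorted needs sorted input; a cumulative sum of non-negative rates qualifies.
# With the default side='left' it returns the first position whose value is >= 300.
pos = int(cumulative.searchsorted(300))
excess = int(cumulative.iloc[pos] - 300)
print(pos, excess)
```

This matches the first row of the output above: the 300-tonne load is finished at position 3 with 119 units of capacity to spare.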