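I have the data frame below. I want to extract the maximum and minimum time for each run of consecutive column values. How can I do this?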
import pandas as pd
import numpy as np
raw_data = {
    'Time': [281.54385, 298.64380, 321.29645, 321.39640, 419.58545, 430.68540,
             533.96025, 580.37990, 590.85605, 634.06015, 724.16010, 750.26000,
             777.87955, 830.97945, 850.07940],
    'CF_A': [1, 1, 1, 0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 0, 0],
    'CF_B': [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0],
    'CF_C': [0, 0, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1, 0, 0, 0],
}
data = pd.DataFrame(raw_data)
dataframe - Input (see picture)
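The values in each column occur in consecutive runs, and I want to build a new data frame that summarizes the times corresponding to the start and end of each run.
The desired result is below.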
Answer 0 (score: 0)
I suggest putting the case values in the index, which avoids having several columns that all hold the same values:
# select the columns whose names contain 'CF'
cols = data.filter(like='CF').columns
# output list of Series
L = []
for col in cols:
    # label each run of consecutive equal values with its own group number
    s = data[col].ne(data[col].shift()).cumsum().rename('g')
    # group Time by run number and by the value held in that run
    g = data.groupby([s, data[col]])['Time']
    # duration of each run: last time minus first time
    d = g.last() - g.first()
    # sum the durations per case value (second level of the MultiIndex);
    # Series.sum(level=1) was removed in pandas 2.0, so group by the level instead
    L.append(d.groupby(level=1).sum())
# join all Series together; the case values become the index
df = pd.concat(L, axis=1, keys=cols).fillna(0)
print(df)
        CF_A       CF_B       CF_C
0  181.48885  119.19985   89.29980
1   96.64840  202.86100  159.40395
2  116.19985    0.00000    0.09995
3    0.00000    0.00000  160.79445
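To make the helper groups easier to follow, here is a minimal sketch of the `ne(shift()).cumsum()` idiom applied to the `CF_A` values alone. The names `col`, `changed` and `run_id` are only for illustration and are not part of the code above.

```python
import pandas as pd

col = pd.Series([1, 1, 1, 0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 0, 0], name='CF_A')

# True wherever a value differs from the previous row
# (the first row compares against NaN, so it is always True)
changed = col.ne(col.shift())

# the running count of changes gives every consecutive run its own id
run_id = changed.cumsum()

print(run_id.tolist())
# [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5]
```

Because the two runs of `1` (and the two runs of `0`) receive different ids, grouping by this id keeps them apart until their durations are summed per value.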
But if you really need the expected output:
# select the columns whose names contain 'CF'
df1 = data.filter(like='CF')
# flatten all case values and get the sorted unique values
idx = np.sort(np.unique(df1.values.ravel()))
print(idx)
# output list of DataFrames
L = []
for col in df1.columns:
    # label each run of consecutive equal values with its own group number
    s = data[col].ne(data[col].shift()).cumsum().rename('g')
    # group Time by run number and by the value held in that run
    g = data.groupby([s, data[col]])['Time']
    # duration of each run: last time minus first time
    d = g.last() - g.first()
    # sum per case value (Series.sum(level=1) was removed in pandas 2.0),
    # then add the missing cases as zero rows with reindex
    df = (d.groupby(level=1).sum()
           .rename_axis('Case')
           .reindex(idx, fill_value=0)
           .reset_index())
    # append df with the column name as a prefix
    L.append(df.add_prefix(col + '_'))
# join all DataFrames together
df = pd.concat(L, axis=1)
print(df)
   CF_A_Case  CF_A_Time  CF_B_Case  CF_B_Time  CF_C_Case  CF_C_Time
0          0  181.48885          0  119.19985          0   89.29980
1          1   96.64840          1  202.86100          1  159.40395
2          2  116.19985          2    0.00000          2    0.09995
3          3    0.00000          3    0.00000          3  160.79445
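The code above sums the run durations per case value. If what is wanted is the start and end time of each individual run, as the question's wording suggests, a sketch along the following lines might work. The function name `run_times` and the column names `Case`, `Start` and `End` are assumptions made for illustration. The desired output was only given as a picture, so the layout here is a guess.

```python
import pandas as pd

raw_data = {
    'Time': [281.54385, 298.64380, 321.29645, 321.39640, 419.58545, 430.68540,
             533.96025, 580.37990, 590.85605, 634.06015, 724.16010, 750.26000,
             777.87955, 830.97945, 850.07940],
    'CF_A': [1, 1, 1, 0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 0, 0],
    'CF_B': [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0],
    'CF_C': [0, 0, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1, 0, 0, 0],
}
data = pd.DataFrame(raw_data)


def run_times(data, col):
    """Return one row per consecutive run in `col` (hypothetical helper)."""
    # same run-labelling idiom as in the answer above
    run_id = data[col].ne(data[col].shift()).cumsum().rename('run')
    return (data.groupby(run_id)
                .agg(Case=(col, 'first'),     # value held during the run
                     Start=('Time', 'min'),   # earliest time in the run
                     End=('Time', 'max'))     # latest time in the run
                .reset_index(drop=True))


print(run_times(data, 'CF_A'))
```

For `CF_A` this yields five rows, one per run, with cases 1, 0, 1, 2, 0 in order of appearance. Calling `run_times` for each `CF` column and joining the results with `pd.concat` would give a combined summary.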