I'm new to Python, and I have a question about how to summarize the previous rows of a DataFrame. The dataset is:
df=pd.DataFrame({'ID':[1,1,1,1,2,2,2,2],'reason':['B','A','B','A','A','A','B','A'],'result':['W','W','Z','X','X','W','Z','W']})
   ID reason result
0   1      B      W
1   1      A      W
2   1      B      Z
3   1      A      X
4   2      A      X
5   2      A      W
6   2      B      Z
7   2      A      W
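For each row, I want to summarize the history of "reason" over the previous rows with the same ID. I also want to summarize the history of "result", using only the previous rows (same ID) whose reason is A. The result should look like this: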
   ID reason result Previous_reason Previous_result_reasonA
0   1      B      W
1   1      A      W               B
2   1      B      Z             B|A                       W
3   1      A      X           B|A|B                       W
4   2      A      X
5   2      A      W               A                       X
6   2      B      Z             A|A                     X|W
7   2      A      W           A|A|B                     X|W
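To pin down the rule behind the table above, here is a minimal plain-Python sketch (no pandas) of what the two new columns should contain. The function name summarize_history is hypothetical. For each row it joins with "|" the reason values of earlier rows with the same ID, and the result values of earlier rows with the same ID whose reason is A.

```python
def summarize_history(ids, reasons, results):
    """Return (previous_reason, previous_result_reasonA) lists, one entry per row.

    summarize_history is a hypothetical helper used only to illustrate the rule.
    """
    reasons_seen = {}    # ID -> reasons of the earlier rows with that ID
    results_a_seen = {}  # ID -> results of the earlier rows with that ID whose reason is 'A'
    prev_reason, prev_result_a = [], []
    for id_, reason, result in zip(ids, reasons, results):
        seen = reasons_seen.setdefault(id_, [])
        seen_a = results_a_seen.setdefault(id_, [])
        # At this point the lists only hold rows that came before this one.
        prev_reason.append('|'.join(seen))
        prev_result_a.append('|'.join(seen_a))
        # Record the current row so that later rows of the same ID see it.
        seen.append(reason)
        if reason == 'A':
            seen_a.append(result)
    return prev_reason, prev_result_a


prev_reason, prev_result_a = summarize_history(
    [1, 1, 1, 1, 2, 2, 2, 2],
    ['B', 'A', 'B', 'A', 'A', 'A', 'B', 'A'],
    ['W', 'W', 'Z', 'X', 'X', 'W', 'Z', 'W'])
print(prev_reason)    # ['', 'B', 'B|A', 'B|A|B', '', 'A', 'A|A', 'A|A|B']
print(prev_result_a)  # ['', '', 'W', 'W', '', 'X', 'X|W', 'X|W']
```

Because the per-ID history is kept in a dictionary, this sketch does not depend on the rows being sorted by ID.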
Thanks in advance.
Answer 0 (score: 0)
Assuming the DataFrame is sorted by ID, you can solve it in a single pass over the rows, O(n):
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 1, 2, 2, 2, 2],
                   'reason': ['B', 'A', 'B', 'A', 'A', 'A', 'B', 'A'],
                   'result': ['W', 'W', 'Z', 'X', 'X', 'W', 'Z', 'W']})

df['Previous_reason'] = [''] * len(df)
df['Previous_result_reasonA'] = [''] * len(df)

# Result of the previous row if that row's reason was 'A', otherwise ''.
# Seeded from row 0 so that a leading 'A' row is not missed.
result_reasonA = df['result'][0] if df['reason'][0] == 'A' else ''
for r in range(1, len(df)):
    if df['ID'][r] == df['ID'][r-1]:
        # Same ID: extend the previous row's history with the previous row's values
        df.loc[r, 'Previous_reason'] = \
            df['Previous_reason'][r-1] + '|' + df['reason'][r-1]
        df.loc[r, 'Previous_result_reasonA'] = \
            df['Previous_result_reasonA'][r-1]
        if result_reasonA:
            df.loc[r, 'Previous_result_reasonA'] += \
                '|' + result_reasonA
    else:
        # New ID: start with an empty history
        df.loc[r, 'Previous_reason'] = ''
        df.loc[r, 'Previous_result_reasonA'] = ''
    if df['reason'][r] == 'A':
        result_reasonA = df['result'][r]
    else:
        result_reasonA = ''

# Strip the leading `|` separator that every non-empty history starts with
df['Previous_reason'] = \
    df['Previous_reason'].apply(lambda x: x[1:])
df['Previous_result_reasonA'] = \
    df['Previous_result_reasonA'].apply(lambda x: x[1:])
print(df)
Output:
   ID reason result Previous_reason Previous_result_reasonA
0   1      B      W
1   1      A      W               B
2   1      B      Z             B|A                       W
3   1      A      X           B|A|B                       W
4   2      A      X
5   2      A      W               A                       X
6   2      B      Z             A|A                     X|W
7   2      A      W           A|A|B                     X|W
The open question is whether this covers every special case. I can't tell, because I don't know what the data means.
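Two special cases that an index-based loop has to handle by hand are rows that are not sorted by ID and a first row whose reason is A. One way to sidestep both is to let groupby do the per-ID bookkeeping. The sketch below is an alternative, not the method from the answer above. running_join and add_history_columns are hypothetical helper names, and the sketch relies on pandas' SeriesGroupBy.transform accepting a callable that returns a Series aligned to each group's index.

```python
import pandas as pd


def running_join(s):
    """For each row of group s, join the non-missing values of the earlier rows with '|'.

    Hypothetical helper; called once per ID group by transform().
    """
    vals = s.tolist()
    out = []
    for k in range(len(vals)):
        earlier = [v for v in vals[:k] if pd.notna(v)]
        out.append('|'.join(earlier))
    return pd.Series(out, index=s.index)


def add_history_columns(df):
    """Return a copy of df with the two history columns (hypothetical helper)."""
    out = df.copy()
    out['Previous_reason'] = out.groupby('ID')['reason'].transform(running_join)
    # Keep result only where reason is 'A'; other rows become NaN and are skipped.
    result_a = out['result'].where(out['reason'] == 'A')
    out['Previous_result_reasonA'] = result_a.groupby(out['ID']).transform(running_join)
    return out


df = pd.DataFrame({'ID': [1, 1, 1, 1, 2, 2, 2, 2],
                   'reason': ['B', 'A', 'B', 'A', 'A', 'A', 'B', 'A'],
                   'result': ['W', 'W', 'Z', 'X', 'X', 'W', 'Z', 'W']})
summary = add_history_columns(df)
print(summary['Previous_reason'].tolist())
# ['', 'B', 'B|A', 'B|A|B', '', 'A', 'A|A', 'A|A|B']
print(summary['Previous_result_reasonA'].tolist())
# ['', '', 'W', 'W', '', 'X', 'X|W', 'X|W']
```

groupby keeps rows in their original order within each group, so each history follows the order in which that ID's rows appear in the frame, whether or not the IDs are interleaved.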