This code:
import numpy as np
import pandas as pd
df = pd.DataFrame([['stop' , '1'], ['a1' , '2'], ['a1' , '3'], ['stop' , '4'], ['a2' , '5'], ['wildcard' , '6']] , columns=['a' , 'b'])
print(df)
prints:
          a  b
0      stop  1
1        a1  2
2        a1  3
3      stop  4
4        a2  5
5  wildcard  6
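I am trying to create a new dataframe in which a new row is created each time stop is encountered. The row's 'b' value is a list of tuples built from the rows that follow the stop: the 'a' value is the first element of each tuple and the 'b' value is the second. So for the df above, the transformed dataframe df_post has this structure: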
df_post = pd.DataFrame([['stop' , [('a1' , '2') , ('a1' , '3')]] , ['stop' , [('a2' , '5')]]] , columns=['a' , 'b'])
print(df_post)
a b
0 stop [(a1, 2), (a1, 3)]
1 stop [(a2, 5)]
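wildcard is also a stop condition: when it is encountered, a new row is inserted into df_post in the same way as before.
This is what I have so far: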
df['stop_loc'] = ( (df['a'] == 'stop') | (df['a'] == 'wildcard') ).cumsum()
df_new = df[(df['a'] != 'stop') & (df['stop_loc'] != df['stop_loc'].max())].groupby('stop_loc').apply(lambda x: list(zip(x.a, x.b)))
df_new
which renders:
stop_loc
1 [(a1, 2), (a1, 3)]
2 [(a2, 5)]
dtype: object
The 'stop' value is not inserted as a row. How can I modify this so that the resulting dataframe is
a b
0 stop [(a1, 2), (a1, 3)]
1 stop [(a2, 5)]
instead of:
stop_loc
1 [(a1, 2), (a1, 3)]
2 [(a2, 5)]
dtype: object
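Answer 0 (score: 1)
You are filtering out the stop rows with df['a'] != 'stop', so they never reach the groupby. Here is an alternative that keeps them: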
# df['stop_loc'] = ( (df['a'] == 'stop') | (df['a'] == 'wildcard') ).cumsum()
df['stop_loc'] = df['a'].isin(['stop', 'wildcard']).cumsum()
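def zip_entries(x):
    # The first row of each group is the stop marker; the remaining rows become (a, b) tuples.
    return x.a.iloc[0], list(zip(x.a.iloc[1:], x.b.iloc[1:]))

# Drop the last group (the trailing wildcard), then expand each (marker, tuples) pair into two columns.
df_new = (df[df['stop_loc'] != df['stop_loc'].max()]
          .groupby('stop_loc')
          .apply(zip_entries)
          .apply(pd.Series))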
print(df_new)
# 0 1
# stop_loc
# 1 stop [(a1, 2), (a1, 3)]
# 2 stop [(a2, 5)]
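The result above still has columns named 0 and 1 and is indexed by stop_loc, not the a / b layout asked for in the question. As a minimal sketch of one way to get that layout, the groups can be collected into a list of dicts and passed to pd.DataFrame. The names is_stop, group_id, rows, marker and body are my own and not part of the original code. Note one assumption that differs from the snippet above: instead of dropping the group with the largest stop_loc, this skips any group that has no rows after its marker, which is what removes the trailing wildcard here.

```python
import pandas as pd

df = pd.DataFrame(
    [['stop', '1'], ['a1', '2'], ['a1', '3'],
     ['stop', '4'], ['a2', '5'], ['wildcard', '6']],
    columns=['a', 'b'],
)

# Each 'stop' or 'wildcard' row starts a new group.
is_stop = df['a'].isin(['stop', 'wildcard'])
group_id = is_stop.cumsum()

rows = []
for _, block in df.groupby(group_id):
    marker = block['a'].iloc[0]   # the row that opened this group
    body = block.iloc[1:]         # the rows that follow the marker
    if body.empty:
        # Assumption: a marker with nothing after it produces no output row.
        continue
    rows.append({'a': marker, 'b': list(zip(body['a'], body['b']))})

df_post = pd.DataFrame(rows, columns=['a', 'b'])
print(df_post)
```

This assumes the data begins with a stop or wildcard row. If it does not, the first group's marker would be an ordinary row and would need separate handling.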