I have a pandas DataFrame like this:
import pandas as pd

df1 = pd.DataFrame({"strings": pd.Series(["a very ", "very boring text", " I cannot read it", "Hi everyone", " please go home ", "or I will go ", "now"]),
                    "changetype": pd.Series([0, 0, -1, 0, 1, 1, 1])})
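Whenever changetype in a row equals changetype in the next row (row + 1), I want to concatenate the strings. The final df should therefore look like this: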
df2 = pd.DataFrame({"strings": pd.Series(["a very very boring text", " I cannot read it", "Hi everyone", " please go home or I will go now"]),
                    "changetype": pd.Series([0, -1, 0, 1])})
I started like this, but I don't know how to continue:
for row, rowplus in zip(df1.changetype, df1.changetype[1:]):
    if row == rowplus:
        pass  # concat rows here ...
Answer 0 (score: 1)
Use groupby with a helper Series, aggregating with join and first:

s = df1['changetype'].ne(df1['changetype'].shift()).cumsum()
df3 = df1.groupby(s).agg({'strings': ' '.join, 'changetype': 'first'}).reset_index(drop=True)
print(df3)
                           strings  changetype
0          a very very boring text           0
1                 I cannot read it          -1
2                      Hi everyone           0
3  please go home or I will go now           1

Explanation:

Compare the column with its shifted copy using ne (!=), then apply cumsum to the result. This gives a helper Series that assigns one label to each run of consecutive equal values.
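To make the explanation concrete, here is a minimal sketch that builds the helper Series one step at a time. The intermediate names `prev`, `starts`, `labels` and `exact`, and the label name `"run"`, are illustrative and not part of the answer. The last step is a variant rather than the answer's code: it joins with an empty string instead of a space, on the assumption that the fragments already carry their own spacing, as they do in this sample data.

```python
import pandas as pd

df1 = pd.DataFrame({
    "strings": ["a very ", "very boring text", " I cannot read it", "Hi everyone",
                " please go home ", "or I will go ", "now"],
    "changetype": [0, 0, -1, 0, 1, 1, 1],
})

# Step 1: the previous row's value (NaN for the first row).
prev = df1["changetype"].shift()

# Step 2: True wherever a row differs from the row before it,
# i.e. wherever a new run of equal values starts.
starts = df1["changetype"].ne(prev)

# Step 3: a running count of run starts gives one label per run.
# The rename only avoids sharing a name with the column.
s = starts.cumsum().rename("run")
labels = s.tolist()  # [1, 1, 2, 3, 4, 4, 4]

# Variant (assumption: fragments carry their own spaces): join with ""
# so no extra separator is inserted between the pieces.
exact = (
    df1.groupby(s)
       .agg({"strings": "".join, "changetype": "first"})
       .reset_index(drop=True)
)
print(exact)
```

With `' '.join`, as in the answer, a fragment that already ends in a space produces a double space in the result. Joining with `""` reproduces the strings of df2 from the question for this data, but it would glue words together if the fragments had no surrounding whitespace.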