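Pandas: dropping rows after concatenating strings across 2 rows

Time: 2020-02-27 14:30:05

Tags: python pandas for-loop

I want to check a column and, if a row's date is the same as the next row's, merge the remarks column. There may be more than two rows with the same date.

My current code is stuck at this stage: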

import pandas as pd

df = {'date': ['02-Jan','02-Jan','03-Jan','03-Jan','03-Jan','04-Jan','05-Jan'],
       'remarks':['a','b','c','d','e','f','g']}
df = pd.DataFrame(df)
for eachRow in range(len(df)):
    print("row" , eachRow)
    try:
        if(df['date'][eachRow] == df['date'][eachRow + 1]):
            df['remarks'][eachRow] = df['remarks'][eachRow] + df['remarks'][eachRow + 1]
            print('drop', eachRow+1)
            df = df.drop(eachRow + 1) 
            print(df)
    except:
        print(df)

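My current output is shown below. I noticed that when more than two consecutive rows share the same date, dropping row 3 means I can no longer compare row 2 with row 4: the eachRow pointer has already moved on to row 3, and row 3 has nothing left to compare against. If I choose not to drop the next row, the duplicate rows are left with the wrong remarks. What should I do?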

row 0
drop 1
     date remarks
0  02-Jan      ab
2  03-Jan       c
3  03-Jan       d
4  03-Jan       e
5  04-Jan       f
6  05-Jan       g
row 1
     date remarks
0  02-Jan      ab
2  03-Jan       c
3  03-Jan       d
4  03-Jan       e
5  04-Jan       f
6  05-Jan       g
row 2
drop 3
     date remarks
0  02-Jan      ab
2  03-Jan      cd
4  03-Jan       e
5  04-Jan       f
6  05-Jan       g
row 3
     date remarks
0  02-Jan      ab
2  03-Jan      cd
4  03-Jan       e
5  04-Jan       f
6  05-Jan       g
row 4
row 5
row 6
     date remarks
0  02-Jan      ab
2  03-Jan      cd
4  03-Jan       e
5  04-Jan       f
6  05-Jan       g
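
To illustrate what seems to be happening at the `row 1` and `row 3` steps in the output above: `drop` removes a row but keeps the remaining index labels, so looking up the dropped label raises a `KeyError`, which the bare `except:` then swallows. The following is a minimal sketch on a shortened copy of the data; the variable names `labels_after_drop` and `lookup_failed` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['02-Jan', '02-Jan', '03-Jan', '03-Jan'],
    'remarks': ['a', 'b', 'c', 'd'],
})

# drop() removes the row labelled 1 but does not renumber the index.
df = df.drop(1)
labels_after_drop = list(df.index)

# Label-based access to the dropped row now fails with KeyError,
# which a bare `except:` would hide.
try:
    df['date'][1]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(labels_after_drop, lookup_failed)
```

So once row 1 or row 3 has been dropped, the loop iteration for that label never reaches the comparison at all.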

1 Answer:

Answer 0 (score: 1)

A simple change will fix it:

Instead of dropping the next row (eachRow+1), drop the current row (eachRow):

df = df.drop(eachRow)

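At the same time, note that because the current row is being dropped, the concatenation has to be written into the next row. So change that line to the following (written with .loc so the assignment goes to df itself; chained indexing such as df['remarks'][eachRow+1] = ... may not update the original DataFrame in newer pandas versions):

df.loc[eachRow + 1, 'remarks'] = df['remarks'][eachRow] + df['remarks'][eachRow + 1]

The full loop then becomes:

import pandas as pd

df = {'date': ['02-Jan','02-Jan','03-Jan','03-Jan','03-Jan','04-Jan','05-Jan'],
       'remarks':['a','b','c','d','e','f','g']}
df = pd.DataFrame(df)
for eachRow in range(len(df)):
    print("row", eachRow)
    try:
        if df['date'][eachRow] == df['date'][eachRow + 1]:
            # Carry the accumulated remarks forward into the next row
            df.loc[eachRow + 1, 'remarks'] = df['remarks'][eachRow] + df['remarks'][eachRow + 1]
            print('drop', eachRow)
            # Drop the current row; its text now lives in the next row
            df = df.drop(eachRow)
            print(df)
    except KeyError:
        # Raised on the last row, where label eachRow + 1 does not exist
        print(df)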
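
As an alternative to the loop, and not part of the answer above, the same merge can probably be expressed without row-by-row dropping by grouping consecutive rows that share a date. This is a sketch under the assumption that only adjacent equal dates should be merged; `run_id` and `merged` are names chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['02-Jan', '02-Jan', '03-Jan', '03-Jan', '03-Jan', '04-Jan', '05-Jan'],
    'remarks': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
})

# A new run starts wherever the date differs from the previous row;
# the cumulative sum turns those starts into a run id per row.
run_id = (df['date'] != df['date'].shift()).cumsum().rename('run')

# Within each run, keep the first date and join all remarks.
merged = (
    df.groupby(run_id)
      .agg(date=('date', 'first'), remarks=('remarks', ''.join))
      .reset_index(drop=True)
)
print(merged)
```

If a date can never reappear later in the frame, grouping directly on the `date` column with `sort=False` should give the same result; the run id only matters when non-adjacent repeats must stay separate.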