I have a very large DataFrame.
I want to group it by the "id" column first,
and then create a new column, "reply_time", based on the other existing columns.
```python
import pandas as pd

# Sample data (variable names chosen so they don't shadow the built-in id())
ids = ['793601486525702000','793601486525702000','793601710614802000','793601355214561000','793601355214561000','793601355214561000','793601355214561000','788130215436230000','788130215436230000','788130215436230000','788130215436230000','788130215436230000']
times = ['11/1/2016 16:53','11/1/2016 16:53','11/1/2016 16:52','11/1/2016 16:55','11/1/2016 16:53','11/1/2016 16:53','11/1/2016 16:51','11/1/2016 3:09','11/1/2016 3:04','11/1/2016 2:36','11/1/2016 2:08','11/1/2016 0:28']
replies = ['3','3','0','3','3','2','1','3','2','3','3','1']
df = pd.DataFrame({"id": ids, "time": times, "reply": replies})
```
This gives:

```
id                  time             reply
793601486525702000  11/1/2016 16:53  3
793601486525702000  11/1/2016 16:53  3
793601710614802000  11/1/2016 16:52  0
793601355214561000  11/1/2016 16:55  3
793601355214561000  11/1/2016 16:53  3
793601355214561000  11/1/2016 16:53  2
793601355214561000  11/1/2016 16:51  1
788130215436230000  11/1/2016 3:09   3
788130215436230000  11/1/2016 3:04   2
788130215436230000  11/1/2016 2:36   3
788130215436230000  11/1/2016 2:08   3
788130215436230000  11/1/2016 0:28   1
```
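The new column "reply_time" takes one of two kinds of values: on the row where reply is '1', it holds the time of the row in the same id group whose reply is '2'; on every other row it is NA.
For this example, my output DataFrame would be: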
```
id                  time             reply  reply_time
793601486525702000  11/1/2016 16:53  3      NA
793601486525702000  11/1/2016 16:53  3      NA
793601710614802000  11/1/2016 16:52  0      NA
793601355214561000  11/1/2016 16:55  3      NA
793601355214561000  11/1/2016 16:53  3      NA
793601355214561000  11/1/2016 16:53  2      NA
793601355214561000  11/1/2016 16:51  1      11/1/2016 16:53
788130215436230000  11/1/2016 3:09   3      NA
788130215436230000  11/1/2016 3:04   2      NA
788130215436230000  11/1/2016 2:36   3      NA
788130215436230000  11/1/2016 2:08   3      NA
788130215436230000  11/1/2016 0:28   1      11/1/2016 3:04
```
I'm not sure what the best way to do this is. Can anyone help?
Thanks!
Answer (score: 0):
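Try slicing out the rows where reply is '2', using replace to change their reply value to '1', renaming time to reply_time, and then left-merging that slice back onto df.
The merge matches on the shared columns id and reply, so each reply == '1' row picks up the time of the reply == '2' row with the same id, and every other row gets NaN: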
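```python
# The reply == '2' rows carry the time we want to copy
twos = (
    df.query("reply == '2'")
      .replace({'reply': {'2': '1'}})          # relabel so they line up with the reply == '1' rows
      .rename(columns={'time': 'reply_time'})
)
# A left merge on (id, reply) keeps every row of df; unmatched rows get NaN
yourdf = df.merge(twos, on=['id', 'reply'], how='left')
```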
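If you'd rather keep the groupby framing from the question, here is a minimal sketch of a swapped-in alternative (not the answer's method), assuming time and reply are stored as strings exactly as in the sample data. It blanks out every time except on the reply == '2' rows, uses groupby(...).transform('first') to broadcast that time across each id group, and then keeps it only on the reply == '1' rows. The name time_of_two is just an illustrative variable.

```python
import pandas as pd

# Sample data from the question (ids, times and replies kept as strings)
df = pd.DataFrame({
    "id": ["793601486525702000"] * 2 + ["793601710614802000"]
          + ["793601355214561000"] * 4 + ["788130215436230000"] * 5,
    "time": ["11/1/2016 16:53", "11/1/2016 16:53", "11/1/2016 16:52",
             "11/1/2016 16:55", "11/1/2016 16:53", "11/1/2016 16:53",
             "11/1/2016 16:51", "11/1/2016 3:09", "11/1/2016 3:04",
             "11/1/2016 2:36", "11/1/2016 2:08", "11/1/2016 0:28"],
    "reply": ["3", "3", "0", "3", "3", "2", "1", "3", "2", "3", "3", "1"],
})

# Keep the time only on reply == '2' rows (NaN elsewhere), then copy the
# first non-null value in each id group onto every row of that group.
time_of_two = (
    df["time"].where(df["reply"] == "2")
              .groupby(df["id"])
              .transform("first")
)

# Only the reply == '1' row of each group keeps that time; all others become NaN.
df["reply_time"] = time_of_two.where(df["reply"] == "1")
print(df)
```

One design difference: transform('first') picks a single reply == '2' time per id, so an id with several reply == '2' rows will not duplicate its reply == '1' row, whereas the merge emits one output row per match.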