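I would like to know how to split a column based on a condition. My goal is to study users' activity, but to do that I need to apply a condition. I have this dataframe: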
import pandas as pd

df = pd.DataFrame({'User': ["juan","juan","juan","juan","petter","petter","petter","petter","petter","petter","petter","petter","ana","ana","ana","ana","raul","raul","raul","raul"],
'time': ["2/1/2019","3/1/2019","4/1/2019","6/1/2019","2/1/2019","5/1/2019","6/1/2019","10/1/2019","11/1/2019","12/1/2019","13/1/2019","14/1/2019","8/1/2019","10/1/2019","15/1/2019","20/1/2019","15/1/2019","17/1/2019","18/1/2019","19/1/2019"],
'activity': ["fly", "hotel","car","jump","fly", "hotel","jump","car","fly", "car","hotel","car","car", "hotel","car","hotel","fly", "hotel","car","car"],
'%timeper_user': ["4 days","4 days","4 days","4 days","8 days","8 days","8 days","8 days","3 days","3 days","3 days","3 days","12 days","12 days","12 days","12 days","4 days","4 days","4 days","4 days"]})
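As you can see, each user has a column (time) and another column (%timeper_user). Then there is a column (activity), which holds the activity each user performed on a given date. The idea is to do a "conditional split" that puts each activity in its own column: act1, act2, act3, act4. However, when a user performs an activity outside the window (time + %timeper_user), that activity should go into a different set of columns, for example act21, act22, act23, act24. I would like something like this: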
df2 = pd.DataFrame({'User': ["juan","petter","ana","raul"],
"act1":["fly","fly","car","fly"],
"act2":["hotel","hotel","hotel","hotel"],
"act3":["car","jump","car","car"],
"act4":["jump","car","hotel","car"],
"actn":["","","",""],
"act21":["","fly","",""],
"act22":["","car","",""],
"act23":["","hotel","",""],
"act24":["","car","",""]})
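(df2) is the output I want. Note that user petter goes past his window: (2/1/2019 + 8 days) = 10/1/2019, with dates written as day/month/year. Therefore, from 11/1/2019 onward, his activities are placed in act21, act22, act23, act24. I have many users, so I don't know how to write a function that does this for all of them (user by user). I would be very grateful for any help. Thank you.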
Answer 0 (score: 0)
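The idea is this. If a user's events fall within the range (time + %timeper_user), all of those activities belong inside it and go into activity range 1 (act1, act2, act3, act4). If the date is later, the activity goes into activity range 2 (act21, act22, act23, act24). Put simply: if Petter travels from the USA to Madrid, he will probably go to a hotel, rent a car, and try a plane ride. But once he is back in the USA, near home, Petter may buy a second flight, which falls into activity range 2 (act21, act22, act23, act24). If you run the code, df is the dataframe I have and df2 is the data I want to produce.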
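The rule described here can be sketched in pandas roughly as follows. This is only one possible approach, and it rests on several assumptions that are not stated in the post: dates are day/month/year, the window is anchored at each user's earliest row (that row's time plus that row's %timeper_user), an activity exactly on the cutoff date still counts as range 1, and there are only two ranges. The function name split_activities and the helper columns (window, block, pos, col) are made up for illustration. The sample below uses only juan and petter to keep it short.

```python
import pandas as pd

def split_activities(df):
    """Pivot activities into act1..actN (inside the window) and
    act21..act2N (after the window), one row per user."""
    out = df.copy()
    # Assumption: dates are day/month/year, e.g. 13/1/2019.
    out["time"] = pd.to_datetime(out["time"], format="%d/%m/%Y")
    out["window"] = pd.to_timedelta(out["%timeper_user"])
    # Stable sort keeps the original order for rows sharing a date.
    out = out.sort_values("time", kind="stable")

    # Assumption: the window starts at each user's earliest row.
    first = out.drop_duplicates("User").set_index("User")
    cutoff = out["User"].map(first["time"] + first["window"])

    # Block 1 = on or before the cutoff, block 2 = after it.
    out["block"] = (out["time"] > cutoff).astype(int) + 1
    out["pos"] = out.groupby(["User", "block"]).cumcount() + 1
    out["col"] = [
        f"act{p}" if b == 1 else f"act{b}{p}"
        for b, p in zip(out["block"], out["pos"])
    ]

    # Order columns by (block, pos) instead of alphabetically.
    col_order = (
        out[["block", "pos", "col"]]
        .drop_duplicates()
        .sort_values(["block", "pos"])["col"]
        .tolist()
    )
    wide = out.pivot(index="User", columns="col", values="activity")
    # Restore the users' order of appearance and blank out the gaps.
    wide = wide.reindex(index=df["User"].unique(), columns=col_order).fillna("")
    wide.index.name = "User"
    wide.columns.name = None
    return wide.reset_index()

df = pd.DataFrame({
    "User": ["juan"] * 4 + ["petter"] * 8,
    "time": ["2/1/2019", "3/1/2019", "4/1/2019", "6/1/2019",
             "2/1/2019", "5/1/2019", "6/1/2019", "10/1/2019",
             "11/1/2019", "12/1/2019", "13/1/2019", "14/1/2019"],
    "activity": ["fly", "hotel", "car", "jump",
                 "fly", "hotel", "jump", "car",
                 "fly", "car", "hotel", "car"],
    "%timeper_user": ["4 days"] * 4 + ["8 days"] * 4 + ["3 days"] * 4,
})

result = split_activities(df)
print(result)
```

One caveat with this naming scheme: a user with 21 or more activities inside the window would produce an act21 column that collides with the first column of range 2, so a separator such as act2_1 may be safer for real data.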