“Is there a pandas function for adding a new column based on certain values of another column of the dataframe?”

Time: 2018-12-20 02:58:01

Tags: python pandas

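I am trying to create a new column in a dataframe based on the time values in another column, i.e. if the time is between 06:00:00 and 12:00:00 then it is Morning, if it is between 12:00:00 and 15:00:00 then it is Afternoon, and so on.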

I have tried using a for loop with if/else statements, but my dataframe has 1549293 rows, so the loop never finishes.

import datetime

# Interval boundaries: 06:00, 12:00, 15:00, 20:00, 23:00
times = [datetime.time(6, 0, 0), datetime.time(12, 0, 0), datetime.time(15, 0, 0),
         datetime.time(20, 0, 0), datetime.time(23, 0, 0)]

df['time'] = df['start_time'].dt.time
df['day_interval'] = df['time']

# Row-by-row loop (the slow part). df.loc[i, col] is used for assignment
# because chained assignment like df[col][i] = ... may not write back to df.
for i in range(0, df.shape[0]):

    if df['time'][i] >= times[0] and df['time'][i] < times[1]:
        df.loc[i, 'day_interval'] = "Morning"
    elif df['time'][i] >= times[1] and df['time'][i] < times[2]:
        df.loc[i, 'day_interval'] = "Afternoon"
    elif df['time'][i] >= times[2] and df['time'][i] < times[3]:
        df.loc[i, 'day_interval'] = "Evening"
    elif df['time'][i] >= times[3] and df['time'][i] < times[4]:
        df.loc[i, 'day_interval'] = "Night"
    elif df['time'][i] >= times[4]:
        df.loc[i, 'day_interval'] = "Late Night"
    if df['time'][i] < times[0]:
        df.loc[i, 'day_interval'] = "Early Hours"

Is there any way to reduce the processing time?

4 answers:

Answer 0: (score: 3)

Use pd.cut. Note that I added two times to your times list, 00:00:00 and 23:59:59.

Data setup

pd.cut(s1,bins=pd.to_datetime(pd.Series(times),format='%H:%M:%S').tolist(),labels=['Early','M','A','E','N','L'])
0    Early
1        M
Name: time, dtype: category
Categories (6, object): [Early < M < A < E < N < L]
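
For reference, here is a minimal self-contained sketch of the pd.cut idea. It is a variant, not the exact code above: it bins the integer hour from `.dt.hour` instead of datetime objects, and uses `right=False` so each interval includes its lower bound and excludes its upper bound, as in the question. The sample rows, `bin_edges`, and the full label names are assumptions made up for illustration; only the `start_time` and `day_interval` column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; the real df comes from the question.
df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2018-12-20 00:00:00",
        "2018-12-20 05:59:59",
        "2018-12-20 06:00:00",
        "2018-12-20 12:00:00",
        "2018-12-20 15:30:00",
        "2018-12-20 20:00:00",
        "2018-12-20 23:00:00",
    ])
})

# Hour boundaries from the question, padded with 0 and 24 so every hour
# of the day falls into exactly one bin.
bin_edges = [0, 6, 12, 15, 20, 23, 24]
labels = ["Early Hours", "Morning", "Afternoon", "Evening", "Night", "Late Night"]

# right=False makes the bins [0, 6), [6, 12), ..., [23, 24).
df["day_interval"] = pd.cut(
    df["start_time"].dt.hour,
    bins=bin_edges,
    right=False,
    labels=labels,
)

print(df)
```

Binning on the hour only works here because every boundary falls on a whole hour. If a boundary were, say, 12:30, the bins would have to be built from a finer unit such as minutes or seconds since midnight.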

Answer 1: (score: 1)

Row-wise loops should almost never be used in pandas. Pandas supports vectorized operations:

df.loc[(df['time'] >= times[0]) & (df['time'] < times[1]),
       'day_interval'] = "Morning"
df.loc[(df['time'] >= times[1]) & (df['time'] < times[2]),
       'day_interval'] = "Afternoon"

and so on. But using pd.cut is more elegant; see W-B's solution.
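
To spell out the "and so on", here is one possible sketch that applies the same boolean-mask assignment to every interval. The sample rows and the `bounds` and `labels` lists are assumptions for illustration; the masks themselves follow the pattern shown above.

```python
import datetime

import pandas as pd

# Hypothetical sample data; the real df comes from the question.
df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2018-12-20 00:00:00",
        "2018-12-20 05:59:59",
        "2018-12-20 06:00:00",
        "2018-12-20 12:00:00",
        "2018-12-20 15:30:00",
        "2018-12-20 20:00:00",
        "2018-12-20 23:00:00",
    ])
})
df["time"] = df["start_time"].dt.time

# Lower bound of each interval, starting at midnight (an added boundary).
bounds = [
    datetime.time(0, 0, 0),
    datetime.time(6, 0, 0),
    datetime.time(12, 0, 0),
    datetime.time(15, 0, 0),
    datetime.time(20, 0, 0),
    datetime.time(23, 0, 0),
]
labels = ["Early Hours", "Morning", "Afternoon", "Evening", "Night", "Late Night"]

# Start with the last label, which covers everything at or after 23:00 ...
df["day_interval"] = labels[-1]

# ... then overwrite each earlier interval with one vectorized mask per label.
for lower, upper, label in zip(bounds[:-1], bounds[1:], labels[:-1]):
    mask = (df["time"] >= lower) & (df["time"] < upper)
    df.loc[mask, "day_interval"] = label

print(df[["time", "day_interval"]])
```

The Python loop here runs once per interval (five times), not once per row, so the per-row work stays inside pandas.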

Answer 2: (score: 1)

I'll throw this out there as an option: df.between_time together with loc.
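
The code for this answer did not survive, so the following is only a guess at what such an approach could look like, not the answerer's original. It assumes `start_time` is moved into the index (between_time needs a DatetimeIndex), and the sample rows and the `intervals` list are made up. between_time includes both endpoints by default, so the intervals are assigned in chronological order and each later interval overwrites the shared boundary value.

```python
import datetime

import pandas as pd

# Hypothetical sample data; the real df comes from the question.
df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2018-12-20 00:00:00",
        "2018-12-20 05:59:59",
        "2018-12-20 06:00:00",
        "2018-12-20 12:00:00",
        "2018-12-20 15:30:00",
        "2018-12-20 20:00:00",
        "2018-12-20 23:00:00",
        "2018-12-20 23:59:59",
    ])
})

# between_time works on a DatetimeIndex, so index the frame by start_time.
indexed = df.set_index("start_time")
indexed["day_interval"] = ""

# (start, end, label) in chronological order. Both ends are inclusive by
# default, so a boundary such as 12:00:00 is first labelled "Morning" and
# then overwritten by "Afternoon", giving half-open intervals overall.
intervals = [
    (datetime.time(0, 0, 0), datetime.time(6, 0, 0), "Early Hours"),
    (datetime.time(6, 0, 0), datetime.time(12, 0, 0), "Morning"),
    (datetime.time(12, 0, 0), datetime.time(15, 0, 0), "Afternoon"),
    (datetime.time(15, 0, 0), datetime.time(20, 0, 0), "Evening"),
    (datetime.time(20, 0, 0), datetime.time(23, 0, 0), "Night"),
    (datetime.time(23, 0, 0), datetime.time(23, 59, 59, 999999), "Late Night"),
]

for start, end, label in intervals:
    matching_index = indexed.between_time(start, end).index
    indexed.loc[matching_index, "day_interval"] = label

print(indexed)
```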

Answer 3: (score: 0)

In pandas/numpy land, most of the time, if you are reaching for a for loop, there is probably a better way.

Not sure if it is faster, but I think this is at least a bit cleaner [and hopefully also correct?]

def time_of_day(hour):
    if hour < 6:
        return 'Early Hours'
    elif 6 <= hour < 12:
        return 'Morning'
    elif 12 <= hour < 15:
        return 'Afternoon'
    elif 15 <= hour < 20:
        return 'Evening'
    elif 20 <= hour < 23:
        return 'Night'
    else:
        return 'Late Night'


def main():
    # ... code that generates df ...
    df['day_interval'] = df['start_time'].dt.hour.map(time_of_day)


if __name__ == '__main__':
    main()
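
Since it is unclear whether this is faster, one way to find out is to time it on synthetic data. The sketch below compares `.map(time_of_day)` with a pd.cut call on the hour. The random hours, the 100,000-row size, and the helper names `with_map` and `with_cut` are all assumptions for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def time_of_day(hour):
    # Same mapping as in the answer above.
    if hour < 6:
        return 'Early Hours'
    elif 6 <= hour < 12:
        return 'Morning'
    elif 12 <= hour < 15:
        return 'Afternoon'
    elif 15 <= hour < 20:
        return 'Evening'
    elif 20 <= hour < 23:
        return 'Night'
    else:
        return 'Late Night'


# Synthetic hours in 0..23, standing in for df['start_time'].dt.hour.
rng = np.random.default_rng(0)
hours = pd.Series(rng.integers(0, 24, size=100_000))

labels = ['Early Hours', 'Morning', 'Afternoon', 'Evening', 'Night', 'Late Night']


def with_map():
    # Calls the Python function once per row.
    return hours.map(time_of_day)


def with_cut():
    # Bins all rows in one vectorized call; right=False gives [low, high) bins.
    return pd.cut(hours, bins=[0, 6, 12, 15, 20, 23, 24], right=False, labels=labels)


map_seconds = timeit.timeit(with_map, number=3)
cut_seconds = timeit.timeit(with_cut, number=3)
print(f"map: {map_seconds:.3f}s for 3 runs")
print(f"cut: {cut_seconds:.3f}s for 3 runs")
```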