我有一个200行的DataFrame df。前几行如下所示:
+--------------+----------+----------------+
|Customer name | Arrival | Actual Arrival |
+--------------+----------+----------------+
| Customer 21 | 20 | |
| Customer 22 | 30 | |
| Customer 23 | 20 | |
| Customer 24 | 10 | |
| Customer 25 | 20 | |
+--------------+----------+----------------+
“到达”列显示从开始时间开始的分钟到达时间。因此,我想通过向“到达”列值添加固定的时间戳记(“开始时间”)来创建“实际到达”列。 例如:如果开始时间为07:00;
+--------------+----------+----------------+
|Customer name | Arrival | Actual Arrival |
+--------------+----------+----------------+
| Customer 21 | 20 | 07:20 |
| Customer 22 | 30 | 07:30 |
| Customer 23 | 20 | 07:20 |
| Customer 24 | 10 | 07:10 |
| Customer 25 | 60 | 08:00 |
+--------------+----------+----------------+
如何在python中做到这一点?
答案 0 :(得分:1)
您可以尝试以下方法吗?
import datetime
actual_start_time = datetime.time(7, 0)
start_time = datetime.datetime.combine(datetime.datetime.today().date(), actual_start_time)
df['Actual Arrival'] = df['Arrival'].apply(lambda x: start_time + datetime.timedelta(minutes=x))
示例:
>>> df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]]).T
>>> df.columns = ['name', 'Arrival']
>>> df
name Arrival
0 1 5
1 2 6
2 3 7
3 4 8
>>> start_time = datetime.datetime.strptime('2019-05-17 7:0:0', '%Y-%m-%d %H:%M:%S')
>>> df['Actual Arrival'] = df['Arrival'].apply(lambda x: start_time + datetime.timedelta(minutes=x))
>>> df
name Arrival Actual Arrival
0 1 5 2019-05-17 07:05:00
1 2 6 2019-05-17 07:06:00
2 3 7 2019-05-17 07:07:00
3 4 8 2019-05-17 07:08:00
如果只想要时间,可以执行以下操作:
>>> df['Actual Arrival'] = df['Arrival'].apply(lambda x: (start_time + datetime.timedelta(minutes=x)).time())
>>> df
name Arrival Actual Arrival
0 1 5 07:05:00
1 2 6 07:06:00
2 3 7 07:07:00
3 4 8 07:08:00
答案 1 :(得分:0)
如果您只想使用time()
作为基准时间,请尝试以下操作:
import datetime
arrival = 20
base = datetime.time(7, 0, 0) # 07:00:00 or whatever your base time is.
actual_arrival = datetime.time(base.hour, base.minute + arrival, base.second)
这将输出:
07:20:00
编辑:请注意,在上述方法中,time()
的参数必须有效,并且如果您的分钟数大于59,则该分钟数将无效。
解决方法是,您可以定义一个辅助函数来实现此目的:
def add_times(op1, op2): # two tuples in this format: (h, m, s)
hour = op1[0] + op2[0]
minute = op1[1] + op2[1]
second = op1[2] + op2[2]
if second >= 60:
minute += 1
second -= 60
if minute >= 60:
hour += 1
minute -= 60
if hour >= 24:
hour -= 24
return (hour, minute, second)
并将代码更改为:
base_time = (7, 0, 0) # (hour, minute, seconds)
base = datetime.time(*base_time) # If you want to have it as a time object in your code. It is redundant now.
actual_arrival = datetime.time(*add_times(base_time, (0, 20, 0)))
但是我认为它不再是解决问题的datetime
方法。
答案 2 :(得分:0)
import datetime
data = {'customers': {0: 'Customer 21',
1: 'Customer 22',
2: 'Customer 23',
3: 'Customer 24',
4: 'Customer 25'},
'arrival': {0: 20, 1: 30, 2: 20, 3: 10, 4: 60}}
df = pd.DataFrame(data)
df['actual arrival'] = df.arrival.apply(lambda x:(pd.to_datetime(datetime.datetime.today().date())+pd.offsets.Timedelta(hours=7,minutes=x)).strftime('%H:%M'))