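I want to write code that classifies times into time periods. I have two columns, from and to, and a list, periods. Based on the values in the two columns, I need to insert a new column named periods into the DataFrame that represents the time period.
Here is the code: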
import pandas as pd
df = pd.DataFrame({"from":['08:10', '14:00', '15:00', '17:01', '13:41'],
"to":['10:11', '15:32', '15:35' , '18:23', '16:16']})
print(df)
periods = ["00:01-06:00", "06:01-12:00", "12:01-18:00", "18:01-00:00"]
#if times are between two periods, for example '17:01' and '18:23', it counts as first period ("12:01-18:00")
The result should look like this:
    from     to       period
0  08:10  10:11  06:01-12:00
1  14:00  15:32  12:01-18:00
2  15:00  15:35  12:01-18:00
3  17:01  18:03  18:01-00:00
4  18:41  19:16  18:01-00:00
The values in both columns are datetimes.
Answer 0 (score: 1)
Here is one approach (I assume "18:00" belongs to "12:01-18:00"):
results = [0 for x in range(len(df))]
for row in df.iterrows():
    item = row[1]
    start = item['from']
    end = item['to']
    for ind, period in enumerate(periods):
        per_1, per_2 = period.split("-")
        if start.split(":")[0] >= per_1.split(":")[0]: #hours
            if start.split(":")[0] == per_1.split(":")[0]:
                if start.split(":")[1] >= per_1.split(":")[1]: #minutes
                    if start.split(":")[1] == per_1.split(":")[1]:
                        results[row[0]] = period
                        break
                    #Wrap around if you reach the end of the list
                    index = ind+1 if ind<len(periods)-1 else 0
                    results[row[0]] = periods[index]
                    break
                index = ind-1 if ind>0 else len(periods)-1
                results[row[0]] = periods[index]
                break
            if start.split(":")[0] <= per_2.split(":")[0]:
                if start.split(":")[0] == per_2.split(":")[0]:
                    if start.split(":")[1] == per_2.split(":")[1]:
                        results[row[0]] = period
                        break
                    #If anything else, then its greater, so in next period
                    index = ind+1 if ind<len(periods)-1 else 0
                    results[row[0]] = periods[index]
                    break
                results[row[0]] = period
                break
print(results)
['06:01-12:00', '12:01-18:00', '12:01-18:00', '12:01-18:00', '18:01-00:00']
df['periods'] = results
df
    from     to      periods
0  08:10  10:11  06:01-12:00
1  14:00  15:32  12:01-18:00
2  15:00  15:35  12:01-18:00
3  17:01  18:23  12:01-18:00
4  18:41  16:16  18:01-00:00
This should cover all cases. However, you should test it against every case you can think of to verify the results.
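The nested hour and minute comparisons above can be hard to verify by eye. As a rough alternative sketch (not the answerer's code), each "HH:MM" string can be converted to minutes since midnight and compared as an integer. The helper names `to_minutes` and `period_for_start` are made up for illustration, and the sketch assumes that only the from value decides the period and that "00:00" marks the end of the day:

```python
periods = ["00:01-06:00", "06:01-12:00", "12:01-18:00", "18:01-00:00"]

MINUTES_PER_DAY = 24 * 60


def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def period_for_start(start, periods):
    """Return the period label whose range contains `start`, or None."""
    # Assumption: "00:00" counts as the last minute of the day
    minute = to_minutes(start) or MINUTES_PER_DAY
    for label in periods:
        first, last = label.split("-")
        lo = to_minutes(first)
        # Assumption: a period ending at "00:00" runs to the end of the day
        hi = to_minutes(last) or MINUTES_PER_DAY
        if lo <= minute <= hi:
            return label
    return None


starts = ["08:10", "17:01", "18:00", "18:41", "00:30"]
print([period_for_start(s, periods) for s in starts])
```

With the question's DataFrame, something like `df['periods'] = [period_for_start(s, periods) for s in df['from']]` would fill the new column.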
Answer 1 (score: 0)
See below:
import pandas as pd
from datetime import datetime

df = pd.DataFrame({"from": ['08:10', '14:00', '15:00', '17:01', '13:41'],
                   "to": ['10:11', '15:32', '15:35', '18:23', '16:16']})
print(df)
periods = ["00:01-06:00", "06:01-12:00", "12:01-18:00", "18:01-00:00"]
# Parse each "HH:MM-HH:MM" label into a (start, end) pair of datetime.time values
_periods = [(datetime.strptime(p.split('-')[0], '%H:%M').time(), datetime.strptime(p.split('-')[1], '%H:%M').time()) for
            p in periods]


def match_row_to_period(row):
    from_time = datetime.strptime(row['from'], '%H:%M').time()
    to_time = datetime.strptime(row['to'], '%H:%M').time()
    # First pass: the interval lies entirely inside one period
    for idx, p in enumerate(_periods):
        if from_time >= p[0] and to_time <= p[1]:
            return periods[idx]
    # Second pass: the interval spans two adjacent periods, so pick the earlier one
    for idx, p in enumerate(_periods):
        if idx > 0:
            prev_p = _periods[idx - 1]
            if from_time <= prev_p[1] and to_time >= p[0]:
                return periods[idx - 1]


df['period'] = df.apply(lambda row: match_row_to_period(row), axis=1)
print('-----------------------------------')
print('periods: ')
for _p in _periods:
    print(str(_p[0]) + ' -- ' + str(_p[1]))
print('-----------------------------------')
print(df)
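One gap worth noting in the code above: "00:00" parses to `time(0, 0)`, the earliest time of day, so `to_time <= p[1]` never holds for the last period, and an evening row such as 18:41 to 19:16 would get no match. Below is a minimal sketch of one possible workaround, assuming an end of "00:00" is meant as the end of the day. `parse_period` and `period_of` are hypothetical helper names, not part of the answer, and the sketch covers only the "entirely inside one period" rule, not the boundary-spanning second pass:

```python
from datetime import datetime, time

periods = ["00:01-06:00", "06:01-12:00", "12:01-18:00", "18:01-00:00"]


def parse_period(label):
    """Split 'HH:MM-HH:MM' into a (start, end) pair of datetime.time values."""
    start, end = (datetime.strptime(part, "%H:%M").time()
                  for part in label.split("-"))
    if end == time(0, 0):
        # Assumption: an end of 00:00 means the end of the day, not its start
        end = time.max
    return start, end


def period_of(from_str, to_str):
    """Return the period that contains the whole interval, or None."""
    from_time = datetime.strptime(from_str, "%H:%M").time()
    to_time = datetime.strptime(to_str, "%H:%M").time()
    for label in periods:
        start, end = parse_period(label)
        if start <= from_time and to_time <= end:
            return label
    return None


print(period_of("18:41", "19:16"))
print(period_of("17:01", "18:23"))
```

The first call falls inside the last period; the second crosses the 18:00 boundary, so this sketch reports no match for it.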
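Answer 2 (score: 0)
I'm not sure whether there is a better solution, but here is one approach that uses the pandas apply and assign methods. This is generally more Pythonic than iterating over the DataFrame, because pandas is optimized for whole-DataFrame indexing and assignment operations rather than row-by-row updates (see this great blog post).
Note that the data type I use here is datetime.time instances rather than the strings in your example. When working with times, it is better to use a proper time library than string representations.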
import pandas as pd
from datetime import time

df = pd.DataFrame({
    "from": [
        time(8, 10),
        time(14, 00),
        time(15, 00),
        time(17, 1),
        time(13, 41)
    ],
    "to": [
        time(10, 11),
        time(15, 32),
        time(15, 35),
        time(18, 23),
        time(16, 16)
    ]
})
periods = [{
    'from': time(00, 1),
    'to': time(6, 00),
    'period': '00:01-06:00'
}, {
    'from': time(6, 1),
    'to': time(12, 00),
    'period': '06:01-12:00'
}, {
    'from': time(12, 1),
    'to': time(18, 00),
    'period': '12:01-18:00'
}, {
    'from': time(18, 1),
    'to': time(0, 00),
    'period': '18:01-00:00'
}]


def find_period(row, periods):
    """Map the df row to the period which it fits between"""
    for period in periods:
        if row['to'] <= period['to']:
            if row['from'] >= period['from']:
                return period['period']
    # Falls through and returns None when no period contains the row


# Use df assign to assign the new column to the df
df.assign(
    **{
        'period':
        df.apply(lambda row: find_period(row, periods), axis='columns')
    }
)
Out:
       from        to       period
0  08:10:00  10:11:00  06:01-12:00
1  14:00:00  15:32:00  12:01-18:00
2  15:00:00  15:35:00  12:01-18:00
3  17:01:00  18:23:00         None
4  13:41:00  16:16:00  12:01-18:00
The row at ix 3 correctly shows None, because it does not fit exactly within either of the periods you defined (instead, it bridges 12:01-18:00 and 18:01-00:00).
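If a bridging row should still receive a label, one option is to classify by the start time alone and let pandas do the binning in a single vectorized step. The following is only a sketch under assumptions that are not in the answer: the columns hold "HH:MM" strings as in the question, the from value decides the period, and the hand-written bin edges (in minutes since midnight) match the period boundaries:

```python
import pandas as pd

df = pd.DataFrame({"from": ['08:10', '14:00', '15:00', '17:01', '13:41'],
                   "to": ['10:11', '15:32', '15:35', '18:23', '16:16']})
periods = ["00:01-06:00", "06:01-12:00", "12:01-18:00", "18:01-00:00"]

# Minutes since midnight for each start time
parts = df["from"].str.split(":", expand=True).astype(int)
minutes = parts[0] * 60 + parts[1]

# Assumption: edges follow the period ends, giving the right-closed bins
# (0, 360], (360, 720], (720, 1080], (1080, 1440]
edges = [0, 360, 720, 1080, 1440]
df["period"] = pd.cut(minutes, bins=edges, labels=periods)

print(df)
```

Because `pd.cut` uses right-closed bins by default, 18:00 lands in "12:01-18:00" and 18:01 in "18:01-00:00". A start of exactly 00:00 falls outside the first bin and would come out as NaN, so that case needs separate handling if it can occur.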