So, for data like this:
01:58:30| USER INPUT : "Hello "
01:58:30| SYSTEM RESPONSE: "Hello. How are you"
01:58:56| USER INPUT : "Good thank you. How about you?"
01:58:57| SYSTEM RESPONSE: "I am doing great!"
01:59:13| USER INPUT : "Thats it"
01:59:15| SYSTEM RESPONSE: "Deal"
13:29:28| USER INPUT : "Deal"
I want to compute the time elapsed between each line and the next one, for example:
01:58:30| USER INPUT : "Hello "
<0 seconds>
01:58:30| SYSTEM RESPONSE: "Hello. How are you"
<26 seconds>
01:58:56| USER INPUT : "Good thank you. How about you?"
<1 seconds>
01:58:57| SYSTEM RESPONSE: "I am doing great!"
<16 seconds>
01:59:13| USER INPUT : "Thats it"
<2 seconds>
01:59:15| SYSTEM RESPONSE: "Deal"
So far, I know how to calculate the time difference:
from datetime import datetime
s1 = '01:59:13'
s2 = '01:59:15'  # for example
time_format = '%H:%M:%S'  # not named `format`, which would shadow the built-in
elapsed = datetime.strptime(s2, time_format) - datetime.strptime(s1, time_format)
print(elapsed)
I would appreciate any suggestions on how to read the lines. Feel free to ask me for clarification!
Answer 0: (score: 2)
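You can use the re module to extract the time data. I wrote a simple generator that takes a string as input and yields every line together with the time interval to the next line: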
string_input = """
01:58:30| USER INPUT : "Hello "
01:58:30| SYSTEM RESPONSE: "Hello. How are you"
01:58:56| USER INPUT : "Good thank you. How about you?"
01:58:57| SYSTEM RESPONSE: "I am doing great!"
01:59:13| USER INPUT : "Thats it"
01:59:15| SYSTEM RESPONSE: "Deal"
13:29:28| USER INPUT : "Deal"
"""
import re
from datetime import datetime
def get_time(data):
    # Each match is a tuple: (whole line, timestamp at the start of that line).
    groups = re.findall(r'(([\d:]+)\|.*)', data)
    time_format = '%H:%M:%S'
    # Pair every line with the line that follows it.
    for (line1, time1), (_, time2) in zip(groups, groups[1:]):
        t1 = datetime.strptime(time1, time_format)
        t2 = datetime.strptime(time2, time_format)
        total_time = int((t2 - t1).total_seconds())
        singular_or_plural = 'second' if total_time == 1 else 'seconds'
        yield f'{line1}\n<{total_time} {singular_or_plural}>'
    # The last line has no successor, so it is yielded without an interval.
    if groups:
        yield groups[-1][0]

for line in get_time(string_input):
    print(line)
The output is:
01:58:30| USER INPUT : "Hello "
<0 seconds>
01:58:30| SYSTEM RESPONSE: "Hello. How are you"
<26 seconds>
01:58:56| USER INPUT : "Good thank you. How about you?"
<1 second>
01:58:57| SYSTEM RESPONSE: "I am doing great!"
<16 seconds>
01:59:13| USER INPUT : "Thats it"
<2 seconds>
01:59:15| SYSTEM RESPONSE: "Deal"
<41413 seconds>
13:29:28| USER INPUT : "Deal"
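Since the question asks for a way to read the lines, here is a minimal sketch of a variant that works on any iterable of lines (for example an open file object) instead of one big string. The function name `annotate`, the constant `TIME_FORMAT`, and the choice to skip lines without a `|` are assumptions made for illustration; they are not part of the answer above. For simplicity it always prints "seconds", as in the question's example.

```python
import io
from datetime import datetime

TIME_FORMAT = '%H:%M:%S'

def annotate(lines):
    """Yield each log line, preceded by the gap since the previous line."""
    previous_time = None
    for raw in lines:
        line = raw.rstrip('\n')
        if '|' not in line:
            continue  # assumption: blank or malformed lines are skipped
        # The timestamp is everything before the first '|'.
        stamp = datetime.strptime(line.split('|', 1)[0].strip(), TIME_FORMAT)
        if previous_time is not None:
            gap = int((stamp - previous_time).total_seconds())
            yield f'<{gap} seconds>'
        yield line
        previous_time = stamp

# io.StringIO stands in for a real file; open('some_log.txt') would work the same way.
sample = io.StringIO(
    '01:58:30| USER INPUT : "Hello "\n'
    '01:58:30| SYSTEM RESPONSE: "Hello. How are you"\n'
    '01:58:56| USER INPUT : "Good thank you. How about you?"\n'
)
result = list(annotate(sample))
for item in result:
    print(item)
```

Because the generator only remembers the previous timestamp, it never holds the whole file in memory. A line that contains `|` but does not start with a valid `HH:MM:SS` timestamp would raise a `ValueError` from `strptime`.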
Answer 1: (score: 1)
Assuming that every "USER INPUT" line is always immediately followed by a "SYSTEM RESPONSE" line, here is a pandas-based solution.
First, read the data from the file:
import pandas as pd
# A regular-expression separator requires the Python parsing engine.
df = pd.read_csv("your_file_name", sep=r'\s?[|:]\s+',
                 header=None, parse_dates=[0], engine='python')
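Shift the time column up by one row and subtract the original column from it to get the difference between each row and the next. The last row has no successor, so its value is NaT ("not a time"):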
df['diff'] = df[0].shift(-1) - df[0]
Remove the "date" part:
df[0] = df[0].dt.time
# 0 1 2 diff
#01:58:30 USER INPUT "Hello " 00:00:00
#01:58:30 SYSTEM RESPONSE "Hello. How are you" 00:00:26
#01:58:56 USER INPUT "Good thank you. How about you?" 00:00:01
#01:58:57 SYSTEM RESPONSE "I am doing great!" 00:00:16
#01:59:13 USER INPUT "Thats it" 00:00:02
#01:59:15 SYSTEM RESPONSE "Deal" 11:30:13
#13:29:28 USER INPUT "Deal" NaT
As a bonus, you also get the time between interactions.
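To separate the two kinds of interval, the sketch below rebuilds the frame from an in-memory copy of the sample data and splits the differences by speaker. The column names `time`, `speaker`, `text`, and `seconds` are assumptions chosen for readability (the answer above uses the default integer column labels), and the sketch relies on the same alternation assumption stated at the top of this answer.

```python
import io
import pandas as pd

raw = '''01:58:30| USER INPUT : "Hello "
01:58:30| SYSTEM RESPONSE: "Hello. How are you"
01:58:56| USER INPUT : "Good thank you. How about you?"
01:58:57| SYSTEM RESPONSE: "I am doing great!"
01:59:13| USER INPUT : "Thats it"
01:59:15| SYSTEM RESPONSE: "Deal"
13:29:28| USER INPUT : "Deal"
'''

# Same regex separator as above; it needs the Python parsing engine.
df = pd.read_csv(io.StringIO(raw), sep=r'\s?[|:]\s+', header=None,
                 names=['time', 'speaker', 'text'], engine='python')
df['time'] = pd.to_datetime(df['time'], format='%H:%M:%S')

# Gap between each row and the next one, as a plain number of seconds.
df['seconds'] = (df['time'].shift(-1) - df['time']).dt.total_seconds()

is_user = df['speaker'] == 'USER INPUT'
# Gap after a user line: how long the system took to respond.
# dropna() removes the final user line, which has no following row.
response_times = df.loc[is_user, 'seconds'].dropna().tolist()
# Gap after a system line: how long the user took to type the next input.
think_times = df.loc[~is_user, 'seconds'].tolist()

print(response_times)
print(think_times)
```

Converting the differences with `.dt.total_seconds()` gives numbers that are easy to aggregate, for instance with `sum()` or `max()` on the resulting lists.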