Question

So for data like this:

01:58:30| USER INPUT : "Hello " 
01:58:30| SYSTEM RESPONSE: "Hello. How are you" 
01:58:56| USER INPUT : "Good thank you. How about you?" 
01:58:57| SYSTEM RESPONSE: "I am doing great!" 
01:59:13| USER INPUT : "Thats it" 
01:59:15| SYSTEM RESPONSE: "Deal"
13:29:28| USER INPUT : "Deal"

I want to compute the time elapsed between each line and the next, for example:

01:58:30| USER INPUT : "Hello " 
<0 seconds>
01:58:30| SYSTEM RESPONSE: "Hello. How are you" 
<26 seconds>
01:58:56| USER INPUT : "Good thank you. How about you?" 
<1 seconds>
01:58:57| SYSTEM RESPONSE: "I am doing great!" 
<16 seconds>
01:59:13| USER INPUT : "Thats it" 
<2 seconds>
01:59:15| SYSTEM RESPONSE: "Deal"

So far, I know how to calculate the time difference:

from datetime import datetime
s1 = '01:59:13'
s2 = '01:59:15'  # for example
time_format = '%H:%M:%S'
# Subtracting two datetime objects gives a timedelta.
delta = datetime.strptime(s2, time_format) - datetime.strptime(s1, time_format)
print(delta)

I could use any suggestions for a way to read the lines. Feel free to ask me for clarification!

Answer 1

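You can use the re module to extract the time data. I wrote a simple generator that takes a string as input and yields every line together with the time gap between it and the next line: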

string_input = """
01:58:30| USER INPUT : "Hello "
01:58:30| SYSTEM RESPONSE: "Hello. How are you"
01:58:56| USER INPUT : "Good thank you. How about you?"
01:58:57| SYSTEM RESPONSE: "I am doing great!"
01:59:13| USER INPUT : "Thats it"
01:59:15| SYSTEM RESPONSE: "Deal"
13:29:28| USER INPUT : "Deal"
"""

import re
from datetime import datetime

def get_time(data):
    # Each match is a tuple: (whole line, timestamp before the '|').
    groups = re.findall(r'(([\d:]+)\|.*)', data)
    time_format = '%H:%M:%S'

    # Pair every line with the line that follows it.
    for (line1, time1), (_, time2) in zip(groups, groups[1:]):
        time1 = datetime.strptime(time1, time_format)
        time2 = datetime.strptime(time2, time_format)
        total_time = int((time2 - time1).total_seconds())
        singular_or_plural = 'second' if total_time == 1 else 'seconds'
        yield f'{line1}\n<{total_time} {singular_or_plural}>'
    # The last line has no successor, so emit it without a gap.
    if groups:
        yield groups[-1][0]

for line in get_time(string_input):
    print(line)

The output is:

01:58:30| USER INPUT : "Hello "
<0 seconds>
01:58:30| SYSTEM RESPONSE: "Hello. How are you"
<26 seconds>
01:58:56| USER INPUT : "Good thank you. How about you?"
<1 second>
01:58:57| SYSTEM RESPONSE: "I am doing great!"
<16 seconds>
01:59:13| USER INPUT : "Thats it"
<2 seconds>
01:59:15| SYSTEM RESPONSE: "Deal"
<41413 seconds>
13:29:28| USER INPUT : "Deal"
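
Since the question also asks for a way to read the lines, here is a minimal sketch of the same idea that consumes any iterable of lines (for example an open file) instead of one large string, and splits on the `|` rather than using a regex. The name `annotate_gaps` is hypothetical; the sketch assumes every non-blank line starts with an `HH:MM:SS|` timestamp and that all timestamps fall on the same day.

```python
from datetime import datetime

TIME_FORMAT = '%H:%M:%S'

def annotate_gaps(lines):
    """Yield each log line, with '<N seconds>' between consecutive lines."""
    previous = None  # timestamp of the previous log line, if any
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue  # skip blank lines
        # Everything before the first '|' is assumed to be the timestamp.
        stamp = datetime.strptime(line.split('|', 1)[0], TIME_FORMAT)
        if previous is not None:
            gap = int((stamp - previous).total_seconds())
            unit = 'second' if gap == 1 else 'seconds'
            yield f'<{gap} {unit}>'
        yield line
        previous = stamp

sample = [
    '01:59:13| USER INPUT : "Thats it"\n',
    '01:59:15| SYSTEM RESPONSE: "Deal"\n',
]
result = list(annotate_gaps(sample))
print('\n'.join(result))
```

Because a file object iterates line by line, the same function should work when passed the handle from `open(...)` in place of `sample`, without loading the whole log into memory.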

Answer 2

Assuming that every "USER INPUT" line is always immediately followed by a "SYSTEM RESPONSE" line, here is a pandas-based solution:

First, read the data from the file:

import pandas as pd
# A regex separator needs the Python parsing engine.
df = pd.read_csv("your_file_name", sep=r'\s?[|:]\s+',
                 header=None, parse_dates=[0], engine='python')

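Shift the time column up by one row and subtract the original column from it (this gives the row-to-row differences; NaT means "not a time"):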

df['diff'] = df[0].shift(-1) - df[0]

Drop the "date" part:

df[0] = df[0].dt.time
#       0                1                                 2     diff
#01:58:30       USER INPUT                          "Hello " 00:00:00
#01:58:30  SYSTEM RESPONSE              "Hello. How are you" 00:00:26
#01:58:56       USER INPUT  "Good thank you. How about you?" 00:00:01
#01:58:57  SYSTEM RESPONSE               "I am doing great!" 00:00:16
#01:59:13       USER INPUT                        "Thats it" 00:00:02
#01:59:15  SYSTEM RESPONSE                            "Deal" 11:30:13
#13:29:28       USER INPUT                            "Deal"      NaT

As a bonus, you also get the time between interactions.
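
To make the distinction between response time and time between interactions concrete, here is a rough sketch in plain Python (no pandas) that sorts each gap by the speaker of the later line. The helper name `split_gaps` is hypothetical, and the sketch assumes the same line layout as the sample data, with all timestamps on the same day.

```python
from datetime import datetime

TIME_FORMAT = '%H:%M:%S'

def split_gaps(lines):
    """Return (response_gaps, user_gaps) in seconds.

    A gap is filed under the speaker of the later of the two lines.
    """
    response_gaps, user_gaps = [], []
    previous = None  # timestamp of the previous line, if any
    for line in lines:
        stamp_text, rest = line.split('|', 1)
        stamp = datetime.strptime(stamp_text.strip(), TIME_FORMAT)
        if previous is not None:
            gap = int((stamp - previous).total_seconds())
            if rest.strip().startswith('SYSTEM RESPONSE'):
                response_gaps.append(gap)  # how long the system took to answer
            else:
                user_gaps.append(gap)      # how long the user took to write again
        previous = stamp
    return response_gaps, user_gaps

log = [
    '01:58:30| USER INPUT : "Hello "',
    '01:58:30| SYSTEM RESPONSE: "Hello. How are you"',
    '01:58:56| USER INPUT : "Good thank you. How about you?"',
    '01:58:57| SYSTEM RESPONSE: "I am doing great!"',
]
response_gaps, user_gaps = split_gaps(log)
print(response_gaps)  # [0, 1]
print(user_gaps)      # [26]
```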

How to subtract the timestamps of successive lines to get the time difference
