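Adding time/duration values from a CSV file

Time: 2015-06-15 18:30:53

Tags: python python-2.7 csv datetime

I am trying to add up time/duration values from a CSV file I have, but so far I have failed. Here is the sample CSV I am trying to add up.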

[image: sample CSV, not preserved]

Is it possible to get this output?

Output: [image: desired output, not preserved]

I have been trying to add up the datetimes, but I keep failing:

finput = open("./Test.csv", "r")
while 1:
  line = finput.readline()
  if not line:
    break
  else:
    user = line.split(delim)[0]
    direction = line.split(delim)[1]
    duration = line.split(delim)[2]

    durationz = 0:00:00
    for k in duration:
      durationz += k

Also: is there a specific way to declare a time value?

3 answers:

Answer 0 (score: 2)

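Use datetime.timedelta() objects to model the durations, passing in the three components as hours, minutes and seconds.

Use the csv module to parse your file; there is no need to reinvent the character-separated-values parsing wheel here.

Use a dictionary to track the In and Out values per user; a collections.defaultdict() object makes it easier to add new users: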

from collections import defaultdict
from datetime import timedelta
import csv

durations = defaultdict(lambda: {'In': timedelta(), 'Out': timedelta()})

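with open("./Test.csv", "rb") as inf:  # binary mode, as the Python 2 csv module expects
    # delim is the delimiter variable from the question's code
    reader = csv.reader(inf, delimiter=delim)
    for name, direction, duration in reader:
        # split 'h:mm:ss' into integer components
        hours, minutes, seconds = map(int, duration.split(':'))
        duration = timedelta(hours=hours, minutes=minutes, seconds=seconds)
        # accumulate per user and per direction
        durations[name][direction] += duration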

for name, directions in durations.items():
    print '{:10} In    {}'.format(name, directions['In'])
    print '           Out   {}'.format(directions['Out'])
    print '           Total {}'.format(
        directions['In'] + directions['Out'])

timedelta() objects are rendered in h:mm:ss format again when converted back to strings (for example when printed or formatted with str.format()).

Demo:
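As a small illustration of that round trip, the sketch below parses an h:mm:ss string into a timedelta and converts the sum back to text. The helper name parse_duration is hypothetical and not part of the answer. Note that str() switches to a "1 day, h:mm:ss" form once a total reaches 24 hours.

```python
from datetime import timedelta


def parse_duration(text):
    """Parse an 'h:mm:ss' string into a timedelta (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in text.split(':'))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


total = parse_duration('0:00:37') + parse_duration('0:12:19')
as_text = str(total)                 # '0:12:56'
as_seconds = total.total_seconds()   # 776.0

# Totals of 24 hours or more gain a day prefix when converted to a string.
long_text = str(parse_duration('25:00:00'))   # '1 day, 1:00:00'
```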

>>> import csv
>>> from collections import defaultdict
>>> from datetime import timedelta
>>> sample = '''\
... Johnny,In,0:02:36
... Kate,Out,0:02:15
... Paul,In,0:03:57
... Chris,In,0:01:26
... Jonathan,In,0:00:37
... Kyle,In,0:06:46
... Armand,Out,0:00:22
... Ryan,In,0:00:51
... Jonathan,Out,0:12:19
... '''.splitlines()
>>> durations = defaultdict(lambda: {'In': timedelta(), 'Out': timedelta()})
>>> reader = csv.reader(sample)
>>> for name, direction, duration in reader:
...     hours, minutes, seconds = map(int, duration.split(':'))
...     duration = timedelta(hours=hours, minutes=minutes, seconds=seconds)
...     durations[name][direction] += duration
... 
>>> for name, directions in durations.items():
...     print '{:10} In    {}'.format(name, directions['In'])
...     print '           Out   {}'.format(directions['Out'])
...     print '           Total {}'.format(
...         directions['In'] + directions['Out'])
... 
Johnny     In    0:02:36
           Out   0:00:00
           Total 0:02:36
Kyle       In    0:06:46
           Out   0:00:00
           Total 0:06:46
Ryan       In    0:00:51
           Out   0:00:00
           Total 0:00:51
Chris      In    0:01:26
           Out   0:00:00
           Total 0:01:26
Paul       In    0:03:57
           Out   0:00:00
           Total 0:03:57
Jonathan   In    0:00:37
           Out   0:12:19
           Total 0:12:56
Kate       In    0:00:00
           Out   0:02:15
           Total 0:02:15
Armand     In    0:00:00
           Out   0:00:22
           Total 0:00:22

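Answer 1 (score: 1)

First, you may find Python's built-in csv module helpful. Instead of splitting lines and assigning the data by hand, you can simply do the following: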

import csv
with open("test.csv", mode="r") as f:
    reader = csv.reader(f)
    for row in reader:
        user, direction, duration = row  # this is equivalent to your own variable assignment code, 
                                         # using a cool feature of python called tuple unpacking

A dictionary is a good way to group the data by user. That could look like this:

...
user_dict = {}
for row in reader:
    user, direction, duration = row
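    # dict.get() takes its fallback positionally; it does not accept a default= keyword
    user_dict[user] = user_dict.get(user, {"in": "0:00:00", "out": "0:00:00"})
    # lowercase the direction so "In"/"Out" from the file match the keys above
    user_dict[user][direction.lower()] = duration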

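Once this has run through the whole input CSV, you should have a dictionary with an entry for every user, each containing that user's "in" and "out" values. If a user has no in or out value in the CSV, it is set to "0:00:00" through the default value passed to dictionary.get().

We could parse the times manually, but handling the time addition ourselves would be a huge pain. Fortunately, Python has a built-in module for working with time, called datetime.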
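One detail worth checking in the grouping step: dict.get() accepts its fallback only as a positional argument, while dict.setdefault() both stores and returns the fallback in a single call. A minimal sketch, using a few rows from the demo sample in the first answer as stand-ins for what csv.reader would yield:

```python
# Rows standing in for what csv.reader would yield.
rows = [
    ["Jonathan", "In", "0:00:37"],
    ["Jonathan", "Out", "0:12:19"],
    ["Kate", "Out", "0:02:15"],
]

user_dict = {}
for user, direction, duration in rows:
    # setdefault() inserts the fallback dict the first time a user is seen
    # and returns the stored dict either way.
    entry = user_dict.setdefault(user, {"in": "0:00:00", "out": "0:00:00"})
    # Lowercase the direction so "In"/"Out" match the "in"/"out" keys.
    entry[direction.lower()] = duration

# dict.get() takes its fallback positionally; get(key, default=...) raises TypeError.
missing = user_dict.get("Paul", {"in": "0:00:00", "out": "0:00:00"})
```

Unlike setdefault(), get() does not store the fallback, so "Paul" is still absent from user_dict afterwards.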

import csv
import datetime

user_dict = {}
with open("test.csv", mode="r") as f:
    reader = csv.reader(f)
    for row in reader:
        user, direction, duration = row
        hour, minute, second = duration.split(":")

        # since the split left us with strings, and datetime needs integers, we'll need to cast everything to an int.
        hour = int(hour)
        minute = int(minute)
        second = int(second)

        # (we could have done the above more concisely using a list comprehension, which would look like this:
        # hour, minute, second = [int(time) for time in duration.split(":")] )

        # to add time values we'll use the timedelta class in datetime, which takes days, then seconds, as its first two arguments.
        # We'll just use seconds, so we'll need to convert the hours and minutes first.
        seconds = second + minute*60 + hour*60*60

        duration = datetime.timedelta(0, seconds)

        # dict.get() takes its fallback positionally, not as default=
        user_dict[user] = user_dict.get(user, {"in": datetime.timedelta(0,0), "out": datetime.timedelta(0,0)})
        # lowercase the direction to match the keys; += sums repeated rows for the same user
        user_dict[user][direction.lower()] += duration

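Looking at your example, we simply add the in time to the out time (although if we wanted the net time in, we would subtract the out time from the in time instead). We can add the totals with the following: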

output = []
for user, time_dict in user_dict.items():
    total = time_dict["in"] + time_dict["out"]
    output.append([user, time_dict["in"], time_dict["out"], total])

with open("output.csv", mode="w") as f:
    writer = csv.writer(f)
    writer.writerows(output)

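This should get you close to what you want, although the output will be one row per user: the data will be laid out horizontally rather than vertically.

All the code together: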
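If a vertical layout is preferred, one option is to emit one row per direction instead of one row per user. The sketch below builds those rows in memory. The three-column layout (name, label, time) is an assumption, since the desired-output image is not available here, and the sample totals are made up to match the shape the loop above produces.

```python
import datetime

# Hypothetical totals in the shape the loop above produces.
user_dict = {
    "Jonathan": {"in": datetime.timedelta(seconds=37),
                 "out": datetime.timedelta(minutes=12, seconds=19)},
}

output = []
for user, time_dict in sorted(user_dict.items()):
    total = time_dict["in"] + time_dict["out"]
    # str() renders a timedelta as h:mm:ss.
    output.append([user, "In", str(time_dict["in"])])
    output.append(["", "Out", str(time_dict["out"])])
    output.append(["", "Total", str(total)])

# output can then be handed to csv.writer(...).writerows(output) as before.
```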

import csv
import datetime

user_dict = {}
with open("test.csv", mode="r") as f:
    reader = csv.reader(f)
    for row in reader:
        user, direction, duration = row
        hour, minute, second = [int(time) for time in duration.split(":")]
        seconds = second + minute*60 + hour*60*60
        duration = datetime.timedelta(0, seconds)

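        # dict.get() takes its fallback positionally, not as default=
        user_dict[user] = user_dict.get(user, {"in": datetime.timedelta(0,0), "out": datetime.timedelta(0,0)})
        # lowercase the direction to match the keys; += sums repeated rows for the same user
        user_dict[user][direction.lower()] += duration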

output = []
for user, time_dict in user_dict.items():
    total = time_dict["in"] + time_dict["out"]
    output.append([user, time_dict["in"], time_dict["out"], total])

with open("output.csv", mode="w") as f:
    writer = csv.writer(f)
    header = ["name", "time in", "time out", "total time"]
    writer.writerow(header)
    writer.writerows(output)

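Answer 2 (score: 0)

There are a few things you can fix.

First, you can read each line of the file by doing for line in file.

You cannot declare the variable durationz as 0:00:00. That simply does not work in Python.

One thing you can do is start duration at 0 and parse each time by converting it to a number of seconds. Some pseudocode: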
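For the "how do I declare a time value" part of the question, one option from the standard datetime module is timedelta(), which with no arguments is a zero-length duration. A minimal sketch, with sample values assumed purely for illustration:

```python
from datetime import timedelta

# timedelta() with no arguments is a zero-length duration.
durationz = timedelta()

# Hypothetical durations to accumulate.
pieces = [timedelta(minutes=2, seconds=36), timedelta(minutes=3, seconds=57)]
for piece in pieces:
    durationz += piece

# sum() also works when given a timedelta as its start value.
same_total = sum(pieces, timedelta())
```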

split duration string by ":"
add 60 * 60 * hours to duration
add 60 * minutes to duration
add seconds to duration
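The pseudocode above could be written in Python roughly as follows. The function names to_seconds and to_hms are hypothetical, and the formatting helper is an addition for turning the total back into h:mm:ss.

```python
def to_seconds(text):
    """Convert an 'h:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return 60 * 60 * hours + 60 * minutes + seconds


def to_hms(total_seconds):
    """Format a number of seconds back into 'h:mm:ss'."""
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return "{}:{:02d}:{:02d}".format(hours, minutes, seconds)


duration = 0
for text in ["0:00:37", "0:12:19"]:
    duration += to_seconds(text)

formatted = to_hms(duration)
```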