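I am trying to add up time/duration values from a CSV file, but so far I have failed. Here is the sample CSV I am trying to sum.
Is it possible to get this output?
Output:
I have been trying to add the durations together, but I keep failing: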
finput = open("./Test.csv", "r")
while 1:
    line = finput.readline()
    if not line:
        break
    else:
        user = line.split(delim)[0]
        direction = line.split(delim)[1]
        duration = line.split(delim)[2]
        durationz = 0:00:00
        for k in duration:
            durationz += k
Also: is there a specific way to declare a time value?
Answer 0: (score: 2)
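Use datetime.timedelta() objects to model the durations, passing in the three components as seconds, minutes and hours.
Use the csv module to parse your file; there is no need to reinvent the character-separated-values parsing wheel here.
Use a dictionary to track the In and Out values per user; a collections.defaultdict() object makes it easier to add new users: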
from collections import defaultdict
from datetime import timedelta
import csv

# Every new user starts with zero In and Out durations.
durations = defaultdict(lambda: {'In': timedelta(), 'Out': timedelta()})

# delim is the field delimiter used in the question's code.
with open("./Test.csv", newline="") as inf:
    reader = csv.reader(inf, delimiter=delim)
    for name, direction, duration in reader:
        hours, minutes, seconds = map(int, duration.split(':'))
        duration = timedelta(hours=hours, minutes=minutes, seconds=seconds)
        durations[name][direction] += duration

for name, directions in durations.items():
    print('{:10} In    {}'.format(name, directions['In']))
    print('           Out   {}'.format(directions['Out']))
    print('           Total {}'.format(
        directions['In'] + directions['Out']))
timedelta() objects are converted back to the h:mm:ss format when turned into strings, for example when printed or formatted with str.format().
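As a quick illustration of that string conversion, here is a minimal sketch; the values are made up for the example. Note that durations of a day or more are rendered with a day count in front, which may matter if your totals grow large.

```python
from datetime import timedelta

# Build a duration from hours, minutes and seconds.
short = timedelta(hours=0, minutes=2, seconds=36)
print(str(short))            # 0:02:36
print("{}".format(short))    # 0:02:36

# Adding two timedeltas yields another timedelta.
total = short + timedelta(minutes=12, seconds=19)
print(total)                 # 0:14:55

# Durations of a day or more are rendered with a day count.
long_total = timedelta(hours=25)
print(long_total)            # 1 day, 1:00:00
```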
Demo:
>>> import csv
>>> from collections import defaultdict
>>> from datetime import timedelta
>>> sample = '''\
... Johnny,In,0:02:36
... Kate,Out,0:02:15
... Paul,In,0:03:57
... Chris,In,0:01:26
... Jonathan,In,0:00:37
... Kyle,In,0:06:46
... Armand,Out,0:00:22
... Ryan,In,0:00:51
... Jonathan,Out,0:12:19
... '''.splitlines()
>>> durations = defaultdict(lambda: {'In': timedelta(), 'Out': timedelta()})
>>> reader = csv.reader(sample)
>>> for name, direction, duration in reader:
...     hours, minutes, seconds = map(int, duration.split(':'))
...     duration = timedelta(hours=hours, minutes=minutes, seconds=seconds)
...     durations[name][direction] += duration
...
>>> for name, directions in durations.items():
...     print('{:10} In    {}'.format(name, directions['In']))
...     print('           Out   {}'.format(directions['Out']))
...     print('           Total {}'.format(
...         directions['In'] + directions['Out']))
...
Johnny     In    0:02:36
           Out   0:00:00
           Total 0:02:36
Kate       In    0:00:00
           Out   0:02:15
           Total 0:02:15
Paul       In    0:03:57
           Out   0:00:00
           Total 0:03:57
Chris      In    0:01:26
           Out   0:00:00
           Total 0:01:26
Jonathan   In    0:00:37
           Out   0:12:19
           Total 0:12:56
Kyle       In    0:06:46
           Out   0:00:00
           Total 0:06:46
Armand     In    0:00:00
           Out   0:00:22
           Total 0:00:22
Ryan       In    0:00:51
           Out   0:00:00
           Total 0:00:51
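To see in isolation why defaultdict makes adding new users easier, here is a small sketch using one name from the sample data. It only illustrates the idea; the factory creates the zeroed In/Out entry the first time a name is looked up, so no explicit membership check is needed.

```python
from collections import defaultdict
from datetime import timedelta

# Each missing user starts with zero In and Out durations.
durations = defaultdict(lambda: {"In": timedelta(), "Out": timedelta()})

# No "if name not in durations" check is needed before adding.
durations["Jonathan"]["In"] += timedelta(seconds=37)
durations["Jonathan"]["Out"] += timedelta(minutes=12, seconds=19)

print(durations["Jonathan"]["In"])    # 0:00:37
print(durations["Jonathan"]["Out"])   # 0:12:19
print(len(durations))                 # 1
```

With a plain dict, the first `+=` would raise a KeyError unless the entry had been created beforehand.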
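Answer 1: (score: 1)
First, you may find Python's built-in csv module helpful. Instead of splitting lines and assigning the data by hand, you can simply do the following: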
import csv

with open("test.csv", mode="r", newline="") as f:
    reader = csv.reader(f)
    for row in reader:
        # This is equivalent to your own variable assignment code,
        # using a Python feature called tuple unpacking.
        user, direction, duration = row
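If you want to try the unpacking without a file on disk, the sketch below feeds csv.reader a list of strings, which it accepts because it takes any iterable of lines. The two rows are taken from the sample data in the other answer.

```python
import csv

# csv.reader accepts any iterable of lines, not only open files.
lines = ["Johnny,In,0:02:36", "Kate,Out,0:02:15"]

rows = []
for row in csv.reader(lines):
    # Tuple unpacking: the three columns are assigned in one statement.
    user, direction, duration = row
    rows.append((user, direction, duration))

print(rows)
# [('Johnny', 'In', '0:02:36'), ('Kate', 'Out', '0:02:15')]
```

Unpacking raises a ValueError if a row does not have exactly three columns, which is a cheap way to catch malformed lines.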
A dictionary is a good way to group the data by user. It could look like this:
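...
user_dict = {}
for row in reader:
    user, direction, duration = row
    # setdefault creates the per-user entry the first time a user is seen.
    entry = user_dict.setdefault(user, {"in": "0:00:00", "out": "0:00:00"})
    # Normalize the direction (e.g. "In") to the lowercase keys used here.
    entry[direction.lower()] = duration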
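Once this has run through the whole input CSV, you should have a dictionary with an entry for each user, each holding that user's "in" and "out" values. If a user has no in or out value in the CSV, that value is left at the default "0:00:00" supplied when the entry is created.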
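One detail worth knowing when supplying such a default: dict.get takes its fallback as a positional argument (it does not accept a default= keyword) and does not store the fallback in the dictionary, whereas dict.setdefault does store it. A minimal sketch of the difference, using a made-up entry:

```python
user_dict = {}

# dict.get returns the fallback but does not store it in the dictionary.
fallback = user_dict.get("Kate", {"in": "0:00:00", "out": "0:00:00"})
print("Kate" in user_dict)   # False

# dict.setdefault stores the fallback on first access and returns it.
entry = user_dict.setdefault("Kate", {"in": "0:00:00", "out": "0:00:00"})
entry["out"] = "0:02:15"
print(user_dict)
# {'Kate': {'in': '0:00:00', 'out': '0:02:15'}}
```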
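We could parse the times by hand, but doing the time arithmetic ourselves would be a huge pain. Fortunately, Python has a built-in module for working with time, called datetime.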
import csv
import datetime

user_dict = {}
with open("test.csv", mode="r", newline="") as f:
    reader = csv.reader(f)
    for row in reader:
        user, direction, duration = row
        hour, minute, second = duration.split(":")
        # Since the split left us with strings, and datetime needs integers,
        # we need to cast everything to an int.
        hour = int(hour)
        minute = int(minute)
        second = int(second)
        # (We could have done the above more concisely with a list comprehension:
        # hour, minute, second = [int(time) for time in duration.split(":")])
        # To add time values we use datetime.timedelta, which takes days and
        # then seconds as its first two arguments. We'll just use seconds,
        # so we need to convert the hours and minutes first.
        seconds = second + minute*60 + hour*60*60
        duration = datetime.timedelta(0, seconds)
        # setdefault creates the per-user entry the first time a user is seen.
        entry = user_dict.setdefault(
            user, {"in": datetime.timedelta(0, 0), "out": datetime.timedelta(0, 0)})
        # Normalize the direction to lowercase and accumulate repeated rows.
        entry[direction.lower()] += duration
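Looking at your example, we are just adding the time in to the time out (although if we wanted the total time spent, we would want to subtract the time in from the time out). We can do the adding part with the following: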
output = []
for user, time_dict in user_dict.items():
    total = time_dict["in"] + time_dict["out"]
    output.append([user, time_dict["in"], time_dict["out"], total])

with open("output.csv", mode="w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(output)
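This should get you close to what you want, although the output will have one row per user: the data will be laid out horizontally rather than vertically.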
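If a vertical layout is preferred, one possible approach is to write one row per value instead of one row per user. The sketch below assumes a three-row In/Out/Total layout, which is a guess at the desired format; the single-user user_dict is hypothetical sample data, and an in-memory buffer stands in for the output file.

```python
import csv
import datetime
import io

# Hypothetical totals standing in for the user_dict built above.
user_dict = {
    "Jonathan": {
        "in": datetime.timedelta(seconds=37),
        "out": datetime.timedelta(minutes=12, seconds=19),
    },
}

buffer = io.StringIO()  # stands in for an open output file
writer = csv.writer(buffer, lineterminator="\n")
for user, time_dict in user_dict.items():
    total = time_dict["in"] + time_dict["out"]
    # One row per value; the name appears only on the first row.
    writer.writerow([user, "In", time_dict["in"]])
    writer.writerow(["", "Out", time_dict["out"]])
    writer.writerow(["", "Total", total])

print(buffer.getvalue())
# Jonathan,In,0:00:37
# ,Out,0:12:19
# ,Total,0:12:56
```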
All the code together:
import csv
import datetime

user_dict = {}
with open("test.csv", mode="r", newline="") as f:
    reader = csv.reader(f)
    for row in reader:
        user, direction, duration = row
        hour, minute, second = [int(time) for time in duration.split(":")]
        seconds = second + minute*60 + hour*60*60
        duration = datetime.timedelta(0, seconds)
        # setdefault creates the per-user entry the first time a user is seen.
        entry = user_dict.setdefault(
            user, {"in": datetime.timedelta(0, 0), "out": datetime.timedelta(0, 0)})
        # Normalize the direction to lowercase and accumulate repeated rows.
        entry[direction.lower()] += duration

output = []
for user, time_dict in user_dict.items():
    total = time_dict["in"] + time_dict["out"]
    output.append([user, time_dict["in"], time_dict["out"], total])

with open("output.csv", mode="w", newline="") as f:
    writer = csv.writer(f)
    header = ["name", "time in", "time out", "total time"]
    writer.writerow(header)
    writer.writerows(output)
Answer 2: (score: 0)
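There are a few things you can fix.
First, you can read each line of a file by doing for line in file.
You cannot declare the variable durationz as 0:00:00. That simply does not work in Python.
One thing you could do is start the duration at 0 and parse each time by converting it to a number of seconds. Some pseudocode: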
split duration string by ":"
add 60 * 60 * hours to duration
add 60 * minutes to duration
add seconds to duration
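The pseudocode above could be sketched in Python roughly as follows. The helper names duration_to_seconds and seconds_to_duration are made up for this example, and the two input strings are illustrative values rather than data from your file.

```python
def duration_to_seconds(text):
    """Convert an "h:mm:ss" string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return 60 * 60 * hours + 60 * minutes + seconds


def seconds_to_duration(total):
    """Format a number of seconds back into "h:mm:ss"."""
    minutes, seconds = divmod(total, 60)
    hours, minutes = divmod(minutes, 60)
    return "{}:{:02d}:{:02d}".format(hours, minutes, seconds)


durationz = 0
for text in ["0:00:37", "0:12:19"]:
    durationz += duration_to_seconds(text)

print(durationz)                       # 776
print(seconds_to_duration(durationz))  # 0:12:56
```

Keeping the running total as a plain integer avoids the need for any special time type, at the cost of formatting the result yourself.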