Python remove angle brackets and parse into correct format from stdin?

Time: 2018-04-20 00:46:35

Tags: python json date datetime stdin

This is my first post here on Stack Overflow. I am working on a programming assignment for school and am stuck on the following problem.

Write a program that reads in events from STDIN and outputs the events back to STDOUT with the “overlap” flag flipped for the events that overlap with others. The first line of the input will be the number of events to follow, N. N will be 1 million or more. The subsequent N lines will contain events in the following format:


    { ‘start_time’: string format, ‘end_time’: string format, ‘overlap’: boolean represented as 1 or 0 }

Sample input:

    3
    { ‘start_time’: “2016-01-01 00:00:00”, ‘end_time’: “2016-05-01 00:00:00”, ‘overlap’: 0}
    { ‘start_time’: “2016-02-01 00:00:00”, ‘end_time’: “2016-06-01 00:00:00”, ‘overlap’: 0}
    { ‘start_time’: “2012-01-01 00:00:00”, ‘end_time’: “2012-05-01 00:00:00”, ‘overlap’: 0}

Sample output:

    { ‘start_time’: “2016-01-01 00:00:00”, ‘end_time’: “2016-05-01 00:00:00”, ‘overlap’: 1}
    { ‘start_time’: “2016-02-01 00:00:00”, ‘end_time’: “2016-06-01 00:00:00”, ‘overlap’: 1}
    { ‘start_time’: “2012-01-01 00:00:00”, ‘end_time’: “2012-05-01 00:00:00”, ‘overlap’: 0}

The input will be given in a file, "timestamp.txt", which I will read using the Python io library. I would then read the lines into a list and use either sorted() or .sort() to put them in timestamp order. Once I have a sorted list, I would compare the end_time of each event with the start_time of the next one to see whether they overlap.
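One caveat with the plan above: after sorting by start_time, comparing only neighbouring events can miss an overlap with a long event that started earlier. A minimal sketch of one way around that is to remember the event with the latest end_time seen so far. The function name `flag_overlaps` is made up for illustration, and the sketch assumes the events are already parsed into dicts, that the timestamps are fixed-width "YYYY-MM-DD HH:MM:SS" strings (which compare chronologically as plain strings), and that two events that merely touch (one ends exactly when the next starts) do not count as overlapping.

```python
def flag_overlaps(events):
    """Set 'overlap' to 1 on every event that overlaps another event."""
    # Sort indices rather than the events, so the input order is preserved
    # for output. Fixed-width timestamps are compared as plain strings.
    order = sorted(range(len(events)), key=lambda i: events[i]['start_time'])
    latest = None  # index of the event with the latest end_time seen so far
    for i in order:
        if latest is not None:
            # Assumption: touching events (start == end) are not an overlap.
            if events[i]['start_time'] < events[latest]['end_time']:
                events[i]['overlap'] = 1
                events[latest]['overlap'] = 1
        if latest is None or events[i]['end_time'] > events[latest]['end_time']:
            latest = i
    return events


events = [
    {'start_time': '2016-01-01 00:00:00', 'end_time': '2016-05-01 00:00:00', 'overlap': 0},
    {'start_time': '2016-02-01 00:00:00', 'end_time': '2016-06-01 00:00:00', 'overlap': 0},
    {'start_time': '2012-01-01 00:00:00', 'end_time': '2012-05-01 00:00:00', 'overlap': 0},
]
flag_overlaps(events)
flags = [event['overlap'] for event in events]
```

The sort dominates the cost, so this runs in O(N log N) time, which should be workable for a million events.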

What I am currently stuck on is how to actually retrieve the values from each line of text.

Since each line of the file comes in the format

    { ‘start_time’: “2016-01-01 00:00:00”, ‘end_time’: “2016-05-01 00:00:00”, ‘overlap’: 0}

it is not exactly JSON, so I cannot do something like line = json.loads(line) and then get the value with line['start_time'].

Does anybody have any suggestions for this problem? Thank you.

3 answers:

Answer 0 (score: 2)

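Since each line of the input is in the repr format of a dict, consider loading it with the ast module. ast is Python's abstract syntax tree module; its literal_eval function safely evaluates a string containing a Python literal, such as a dict, into the corresponding object.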

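    import ast
    import sys

    def do_processing(event):
        # Placeholder: replace with your own handling of the parsed dict.
        print(event['start_time'], event['end_time'])

    next(sys.stdin)  # skip the first line, which holds the event count N
    for line in sys.stdin:
        # literal_eval needs plain ASCII quotes, not curly ones
        do_processing(ast.literal_eval(line.strip()))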

A similar question has been answered here
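Note that Python does not accept the curly quotes shown in the question as string delimiters, so ast.literal_eval raises a SyntaxError on those lines unless the quotes are replaced first. A minimal sketch, assuming the input really contains the curly characters ‘ ’ “ ” (the `QUOTES` table and `parse_event` helper are names made up for illustration):

```python
import ast

# Hypothetical translation table: curly quotes mapped to their ASCII forms.
QUOTES = str.maketrans({'‘': "'", '’': "'", '“': '"', '”': '"'})

def parse_event(line):
    """Parse one event line into a dict after normalising its quotes."""
    return ast.literal_eval(line.translate(QUOTES).strip())

line = "{ ‘start_time’: “2016-01-01 00:00:00”, ‘end_time’: “2016-05-01 00:00:00”, ‘overlap’: 0}\n"
event = parse_event(line)
start = event['start_time']
```

If the real input already uses plain ASCII quotes, the translate step is harmless and can stay in place.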

Answer 1 (score: 0)

Have you tried stripping the curly braces at the beginning and end of each line, so that it looks like ‘start_time’: “2016-01-01 00:00:00”, ‘end_time’: “2016-05-01 00:00:00”, ‘overlap’: 0 after the operation? From there you can split the resulting string into a list of strings.
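A rough sketch of that strip-and-split idea follows. It assumes that no value contains a comma and that the first colon in each field separates the key from the value (the timestamps contain further colons, hence the split limit). The `parse_event` helper is a made-up name, not part of any library.

```python
def parse_event(line):
    """Parse one event line by hand, without json or ast."""
    body = line.strip().strip('{}')           # drop the surrounding braces
    event = {}
    for field in body.split(','):             # assumes no commas inside values
        key, value = field.split(':', 1)      # split on the first colon only
        # Remove surrounding spaces and either curly or straight quotes.
        event[key.strip(" ‘’'")] = value.strip(" “”\"")
    event['overlap'] = int(event['overlap'])  # the flag is 0 or 1
    return event

line = "{ ‘start_time’: “2016-01-01 00:00:00”, ‘end_time’: “2016-05-01 00:00:00”, ‘overlap’: 0}"
event = parse_event(line)
```

This is more fragile than using a real parser, but it has no dependence on the exact quote characters as long as they are listed in the strip calls.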

Answer 2 (score: 0)

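This looks like JSON with odd quote characters. It would be worth checking the data source to see whether JSON was intended and what turned the quotes into extended Unicode quotes. Maybe someone ran the data through a word processor, or there is a Windows code page decoding problem somewhere.

In any case, you can fix up the quotes, and (at least for your small example) the parse then works: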

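    import json
    import sys

    # Map each curly quote to a plain double quote so the line is valid JSON.
    fix_quote_transform = str.maketrans({q: '"' for q in '‘”’“'})
    for line in sys.stdin:
        obj = json.loads(line.translate(fix_quote_transform))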
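Two details the loop above leaves open are the leading count line (which json.loads parses as a bare integer rather than an event) and writing the events back in the original format. The sketch below is one possible way to handle both, shown on an in-memory stream instead of sys.stdin. `read_events` and `format_event` are hypothetical helpers, and the output layout is simply copied from the sample in the question.

```python
import io
import json

FIX_QUOTES = str.maketrans({q: '"' for q in '‘’“”'})

def read_events(stream):
    """Read the count line, then parse that many event lines."""
    count = int(stream.readline())
    return [json.loads(stream.readline().translate(FIX_QUOTES))
            for _ in range(count)]

def format_event(event):
    """Render an event in the curly-quote layout used by the sample."""
    return ("{ ‘start_time’: “%s”, ‘end_time’: “%s”, ‘overlap’: %d}"
            % (event['start_time'], event['end_time'], event['overlap']))

sample = io.StringIO(
    "2\n"
    "{ ‘start_time’: “2016-01-01 00:00:00”, ‘end_time’: “2016-05-01 00:00:00”, ‘overlap’: 0}\n"
    "{ ‘start_time’: “2012-01-01 00:00:00”, ‘end_time’: “2012-05-01 00:00:00”, ‘overlap’: 0}\n"
)
events = read_events(sample)
lines = [format_event(event) for event in events]
```

In the real program, `sys.stdin` would take the place of `sample`, and each formatted line would be printed once the overlap flags have been set.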