Parsing data in a more elegant way

Time: 2015-01-17 15:50:29

Tags: python parsing csv

I have a log file of activities completed by a user. The file is a CSV in the following format:

DATE,SCORE,STATUS,ACTIVITY_ID

I am parsing the CSV and then displaying it in three similar views (daily, weekly, monthly).

I have managed to display it correctly, but my code looks ugly, repetitive, and very unpythonic.

Here is what I have:

import datetime

def update_views(log_file):
    log_day = {}
    log_week = {}
    log_month = {}
    day = 0
    cur_day = None
    week = 1
    for line in log_file:
        data = line.strip().split(",")

        year, month, _day = data[0].split("-")

        if cur_day != _day:
            cur_day = _day
            day += 1
            if day % 7 == 0:
                week += 1

        month_long = datetime.date(int(year), int(month), int(_day)).strftime("%B")

        if month_long not in log_month:
            log_month[month_long] = {"Comp": {}, "Miss": {}, "Post": {}, "Add": {}, "Score": 0}
        if "Week %i" % week not in log_week:
            log_week["Week %i" % week] = {"Comp": {}, "Miss": {}, "Post": {}, "Add": {}, "Score": 0}
        if "Day %i" % day not in log_day:
            log_day["Day %i" % day] = {"Comp": {}, "Miss": {}, "Post": {}, "Add": {}, "Score": 0}

        current_score = data[1]
        status = data[2]
        item_name = data[3]

        try:
            log_day["Day %i" % day][status][item_name] += 1
        except KeyError:
            log_day["Day %i" % day][status][item_name] = 1

        try:
            log_week["Week %i" % week][status][item_name] += 1
        except KeyError:
            log_week["Week %i" % week][status][item_name] = 1

        try:
            log_month[month_long][status][item_name] += 1
        except KeyError:
            log_month[month_long][status][item_name] = 1

        log_day["Day %i" % day]["Score"] = int(current_score)
        log_week["Week %i" % week]["Score"] = int(current_score)
        log_month[month_long]["Score"] = int(current_score)

log_file =   """2015-01-1,0,Add,DW_05
                2015-01-2,-1,Post,CR_02
                2015-01-3,-1,Comp,DIY_01
                2015-01-3,-1,Post,CD_01
                2015-01-4,-1,Miss,D_03
                2015-01-4,0,Miss,D_03
                2015-01-4,-1,Miss,CD_01
                2015-01-4,0,Miss,LS_04
                2015-01-5,1,Comp,DW_05
                2015-01-6,1,Comp,ANI_06
                2015-01-6,1,Comp,LS_04
                2015-01-7,1,Comp,NMW_07
                2015-01-7,1,Post,DW_05
                2015-01-7,1,Miss,LP_08
                2015-01-8,2,Post,CR_02
                2015-01-8,2,Miss,SEV_09
                2015-01-10,3,Comp,M_10
                2015-01-10,3,Add,NID_11
                2015-01-11,2,Add,ANI_06
                2015-01-12,1,Add,VF_12
                2015-01-12,0,Miss,DIY_01
                2015-01-12,1,Add,NID_11
                2015-01-12,0,Miss,D_03
                2015-01-13,1,Miss,SEV_09
                2015-01-13,2,Add,DW_05
                2015-01-13,1,Comp,NMW_07
                2015-01-13,1,Add,CPC_12""".splitlines()

update_views(log_file)

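I need help breaking this down into cleaner code. I don't like using so many variables (day, week, cur_day) or the repeated try/except blocks.

3 Answers:

Answer 0: (score: 0)

If you have Pandas in your environment, the most compact and future-proof way to parse a CSV is read_csv. The result is a pandas DataFrame, which can be queried, transformed, pivoted, and finally written out in a variety of formats, including HTML.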

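The code can be as minimal as:

import pandas as pd
# sep is a regex matching one or more tabs; use sep="," for a comma-separated file
df = pd.read_csv('file.csv', sep=r"\t+", engine="python")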
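
As a rough illustration of the querying and pivoting mentioned above, the sketch below loads a few rows of the question's sample data and tabulates them by month. It assumes pandas is installed and that the data is comma-separated, as in the question. The column names (`date`, `score`, `status`, `activity`) are made up for this example, since the log has no header row.

```python
import datetime
import io

import pandas as pd

# A few rows from the question's sample log (comma-separated, no header row).
SAMPLE = """2015-01-1,0,Add,DW_05
2015-01-2,-1,Post,CR_02
2015-01-3,-1,Comp,DIY_01
2015-01-3,-1,Post,CD_01
2015-01-4,-1,Miss,D_03
2015-01-4,0,Miss,D_03
"""

# Column names are assumptions for this sketch; the file itself has none.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    header=None,
    names=["date", "score", "status", "activity"],
)

# Derive the month name with the standard library, as the question does.
df["month"] = df["date"].map(
    lambda s: datetime.datetime.strptime(s, "%Y-%m-%d").strftime("%B")
)

# Number of rows per (month, status, activity); this replaces the
# try/except counting in the question.
counts = df.groupby(["month", "status", "activity"]).size()

# One row per month, one column per status.
by_status = pd.crosstab(df["month"], df["status"])

print(counts)
print(by_status.to_html())
```

The daily and weekly views could be produced the same way by adding a day or week column and grouping on it instead of `month`.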

Answer 1: (score: 0)

Python has a csv module.

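import csv
# csv.reader expects an open file (or any iterable of lines), not a path
with open('path/to/file', newline='') as f:
    d = list(csv.reader(f, csv.excel_tab))
    # d is a list of rows, each row a list of string values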

Then

import datetime
def parse_date(strdt):
    return datetime.datetime.strptime(strdt, '%Y-%m-%d')

Finally, take a look at https://docs.python.org/2/library/collections.html#collections.Counter
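
Putting the three pieces together, a minimal sketch might look like the following. It assumes the comma-separated layout from the question rather than a tab-delimited file, and the function name `count_by_month` is hypothetical.

```python
import csv
import datetime
from collections import Counter, defaultdict


def parse_date(strdt):
    return datetime.datetime.strptime(strdt, "%Y-%m-%d")


def count_by_month(lines):
    """Count (status, activity) pairs per month name.

    `lines` is any iterable of CSV lines: DATE,SCORE,STATUS,ACTIVITY_ID.
    """
    counts = defaultdict(Counter)
    # Strip each line first so indented sample data parses cleanly.
    reader = csv.reader(line.strip() for line in lines)
    for row in reader:
        if not row:
            continue  # csv.reader yields an empty list for blank lines
        date, _score, status, activity = row
        month = parse_date(date).strftime("%B")
        # Counter returns 0 for missing keys, so no try/except is needed.
        counts[month][(status, activity)] += 1
    return counts


sample = [
    "2015-01-4,-1,Miss,D_03",
    "2015-01-4,0,Miss,D_03",
    "2015-01-5,1,Comp,DW_05",
]
result = count_by_month(sample)
print(result["January"].most_common(1))
```

Because a Counter treats a missing key as zero, the repeated try/except KeyError blocks from the question are no longer needed.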

Answer 2: (score: 0)

With help from Code Review, I created this class:

import datetime
from collections import defaultdict

class TreeData:
    """Set the data structure to be used for the QTreeViews."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def add_item(self, key, status, item_name, score):
        """
        Set up the structure, which consists of a dict with nested
        defaultdict(int) for completed/missed/postponed/added activities
        and Score.
        """
        if self.name != "Month":
            key = '%s %i' % (self.name, key)

        if key not in self.data:
            self.data[key] = {"Comp": defaultdict(int),
                              "Miss": defaultdict(int),
                              "Post": defaultdict(int),
                              "Add": defaultdict(int),
                              "Score": 0}

        self.data[key][status][item_name] += 1
        self.data[key]["Score"] += int(score)

    @classmethod
    def setup(cls, main_window):
        """Main method of the class, is used to read and parse the file and set the structure for the QTrees"""
        day_n = 0
        cur_day = None
        week_n = 1

        cls.day = TreeData("Day")
        cls.week = TreeData("Week")
        cls.month = TreeData("Month")

        try:
            with open("log_file.txt") as log_file:
                for line in log_file:
                    # Splits the data into a meaningful way
                    date, score_change, status, item_name = line.strip().split("\t")
                    year, month, day = map(int, date.split("-"))
                    month_name = datetime.date(year, month, day).strftime("%B")

                    # sets the day/week numbers
                    if cur_day != day:
                        cur_day = day
                        day_n += 1
                        if day_n % 7 == 0:
                            week_n += 1

                    # structure the QTrees
                    cls.day.add_item(day_n, status, item_name, score_change)
                    cls.week.add_item(week_n, status, item_name, score_change)
                    cls.month.add_item(month_name, status, item_name, score_change)
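
The same structure can also be built without the `day_n`, `cur_day`, and `week_n` bookkeeping. The following is only a sketch under stated assumptions, not the code from Code Review: it uses `itertools.groupby` to number the active days, derives the week number from the day number, and keeps the question's convention that "Score" holds the latest score rather than a running sum. The names `build_views` and `new_bucket` are hypothetical, and it reads the comma-separated lines from the question, not the tab-separated `log_file.txt`.

```python
import datetime
import itertools
from collections import defaultdict


def new_bucket():
    """One view entry: per-status activity counters plus the latest score."""
    return {"Comp": defaultdict(int), "Miss": defaultdict(int),
            "Post": defaultdict(int), "Add": defaultdict(int), "Score": 0}


def build_views(lines):
    """Return (log_day, log_week, log_month) for DATE,SCORE,STATUS,ACTIVITY_ID lines."""
    log_day = defaultdict(new_bucket)
    log_week = defaultdict(new_bucket)
    log_month = defaultdict(new_bucket)

    rows = (line.strip().split(",") for line in lines if line.strip())
    # groupby batches consecutive rows that share a date, so enumerate()
    # numbers the active days the way the manual day counter did.
    by_date = itertools.groupby(rows, key=lambda row: row[0])
    for day_n, (date, group) in enumerate(by_date, 1):
        year, month, day = map(int, date.split("-"))
        month_name = datetime.date(year, month, day).strftime("%B")
        # Same numbering as the question: the week advances on every 7th active day.
        week_n = 1 + day_n // 7
        buckets = (log_day["Day %i" % day_n],
                   log_week["Week %i" % week_n],
                   log_month[month_name])
        for _date, score, status, item_name in group:
            for bucket in buckets:
                bucket[status][item_name] += 1
                bucket["Score"] = int(score)
    return log_day, log_week, log_month


sample = """2015-01-1,0,Add,DW_05
            2015-01-2,-1,Post,CR_02
            2015-01-3,-1,Comp,DIY_01
            2015-01-3,-1,Post,CD_01
            2015-01-4,-1,Miss,D_03
            2015-01-4,0,Miss,D_03
            2015-01-5,1,Comp,DW_05
            2015-01-6,1,Comp,ANI_06
            2015-01-7,1,Comp,NMW_07
            2015-01-8,2,Post,CR_02""".splitlines()

log_day, log_week, log_month = build_views(sample)
print(sorted(log_week))
```

One difference from the loop in the question: rows are grouped on the full date string instead of the day of month alone, so two consecutive entries that fall on the same day number in different months count as separate days.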