Python 2.7 - Manipulating some data in a CSV file

Time: 2018-04-17 13:39:40

Tags: python python-2.7 csv

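First, I want to stress that I'm a beginner in Python. I use the code below to manipulate some data from a CSV file. I know it isn't the prettiest code and could probably be made more elegant, but it works up to a point, which is why I'm opening this question.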

import csv
from numpy import interp
from operator import sub
import math
import pandas as pd
from Tkinter import *
import Tkinter as tk
import tkFileDialog as filedialog

root = Tk()
root.withdraw()
filename= filedialog.askopenfilename( initialdir="C:/", title="select file", filetypes=(("CSV files", "*.CSV"), ("all files", "*.*")))



id_uri = []
ore = []
minute = []
zile = []
activi = []
listx = []
listsa = []
list_ore = []
listspi = []
listspf = []
list_min = []
zile_luna = 0
test = []
nume = []
with open (filename) as p, open ('activi.csv') as a:
        reader = csv.reader(p,delimiter=',')
        for row in reader:
                id_uri.append(row[0])
                ore.append(row[1])
                minute.append(row[2])
                zile.append(row[3])
        reader = csv.reader(a)
        for row in reader:
                activi.append(row[0])
                nume.append(row[1])
id_uri = map(int, id_uri)
ore = map(float, ore)
minute = map(float, minute)
minute = interp(minute,[0,60],[0,100])
ore = ore + minute/100
zile = map(int, zile)
activi = map(int, activi)
zile_luna = len(set(zile))+1
minim = 0
maxim = 0
def pontaj():
        global listx
        global listsa
        global listspi
        global listspf
        global list_ore
        global list_min
        global maxim
        global minim
        for x in range(3):
                for y in range(len(id_uri)):
                        if zile[y] == z:
                                if activi[x] == id_uri[y]:
                                        listx.append(ore[y])
                                        minim = min(listx)
                                        maxim = max(listx)
                listsa.append(maxim-minim)
                listx = []
        listspi = [int(i) for i in listsa]
        listspf = [i%1 for i in listsa]
        for i in range(len(listspf)):
                listspf[i] = round(listspf[i], 2)
                listspf[i] = listspf[i]*100
                listspf[i] = interp(listspf[i],[0,100],[0,60])
                listspf[i] = int(listspf[i])
        list_ore.append(listspi)
        list_min.append(listspf)
        listsa = []

for z in range(1,zile_luna):
        pontaj()
for sublst in list_ore:
        for item in range(len(sublst)):
                sublst[item] = str(sublst[item])
for sublst in list_min:
        for item in range(len(sublst)):
                sublst[item] = str(sublst[item])
for i in range(len(list_ore)):
        for j in range(len(list_ore[i])):
                # Join the whole hour and minute strings; zipping them pairs single characters and garbles two-digit values
                list_ore[i][j] = list_ore[i][j] + ':' + list_min[i][j]
df = pd.DataFrame(list_ore)
df = df.T
nume = pd.Series(nume)
df['e'] = nume.values
df.to_csv('pontaj.csv', index = False, header = False)
print df

The CSV file I read all the information from looks like this (employee code, hour, minute, day):

23,5,00,1
23,6,00,1
24,7,00,1
25,8,00,1
24,9,00,1
25,11,00,1
24,7,00,2
25,8,00,2
24,9,00,2
25,11,00,2
23,5,00,4
23,6,00,4
24,7,00,4
25,8,00,4
24,9,00,4
25,11,00,4

I have another CSV file with the employee codes, which looks like this:

23,aqwe
24,beww
25,cwww

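Basically it's an attendance logger: it compares the information from one CSV with the other, finds each employee's earliest and latest clock time on a given day, subtracts the earliest from the latest, and puts the result into a list that is written to another CSV.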
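To make that calculation concrete, here is a minimal sketch of the per-employee, per-day step on its own. It is not the code from the question: the function name `worked_minutes` is hypothetical, the rows are the day 1 sample lines typed in by hand, and times are kept as whole minutes instead of the decimal hours produced by `interp`.

```python
from collections import defaultdict

# Day 1 rows from the sample file: (employee id, hour, minute, day).
punches = [
    (23, 5, 0, 1), (23, 6, 0, 1), (24, 7, 0, 1),
    (25, 8, 0, 1), (24, 9, 0, 1), (25, 11, 0, 1),
]


def worked_minutes(rows):
    """Map (employee, day) -> minutes between the first and last punch."""
    times = defaultdict(list)
    for emp, hour, minute, day in rows:
        times[(emp, day)].append(hour * 60 + minute)
    return {key: max(vals) - min(vals) for key, vals in times.items()}


result = worked_minutes(punches)
# result == {(23, 1): 60, (24, 1): 120, (25, 1): 180}
```

Keying the dictionary on `(employee, day)` means the result does not depend on the order of the rows in the file.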

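The thing is, if every employee is present on a given day, everything works: it calculates the hours of attendance and puts them in the CSV. But what happens if an employee skips a day? As I found out, it breaks the calculation, because the code requires all the data to be consistent and in perfect order.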
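One likely cause, judging from the code above: `minim` and `maxim` are globals that `pontaj()` only updates when a row matches, yet `maxim-minim` is appended for every employee. An absent employee therefore appears to inherit the previous employee's span. The sketch below reproduces that effect in isolation; both function names and the sample day are hypothetical, not taken from the question.

```python
def spans_with_stale_state(punches_per_employee):
    """Mimic shared min/max that are only updated when punches exist."""
    lo = hi = 0
    spans = []
    for punches in punches_per_employee:
        if punches:
            lo, hi = min(punches), max(punches)
        spans.append(hi - lo)  # an absent employee reuses the old lo/hi
    return spans


def spans_with_reset(punches_per_employee):
    """Compute each span independently; no punches means zero hours."""
    return [max(p) - min(p) if p else 0 for p in punches_per_employee]


# Hypothetical day: the second employee has no punches at all.
day = [[5, 6], [], [8, 11]]
stale = spans_with_stale_state(day)  # [1, 1, 3]
fixed = spans_with_reset(day)        # [1, 0, 3]
```

A related thing to check: `zile_luna = len(set(zile))+1` counts distinct days, so with days 1, 2 and 4 in the file the loop `range(1,zile_luna)` visits 1, 2 and 3, and day 4 is never processed.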

The data written to the CSV file must end up looking like this:

day1 day2 day3
hours hours hours employee_a
hours hours hours employee_b
hours hours hours employee_c

But if a day is skipped, the hours get scrambled.

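I tried a few different approaches, but none worked. I realize the problem comes from my simplistic way of thinking, but as I said, I only started using Python a few days ago.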

Do you have any suggestions on how to improve the code so that it accounts for days an employee missed and produces data like this:

 day1 day2 day3
1:20 2:30 3:40 employee_a
1:20 2:30 3:40 employee_b
0:0  2:30 3:40 employee_c
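One possible way to get a table of this shape is sketched below, using only the standard library. It is an assumption-laden illustration rather than a drop-in fix: `build_table` is a hypothetical name, the two lists hold the sample rows from the question instead of reading the files, and cells are formatted as `H:MM` (so a missed day shows `0:00` rather than `0:0`).

```python
from collections import defaultdict

# Sample rows from the question: (employee id, hour, minute, day).
punches = [
    (23, 5, 0, 1), (23, 6, 0, 1), (24, 7, 0, 1), (25, 8, 0, 1),
    (24, 9, 0, 1), (25, 11, 0, 1), (24, 7, 0, 2), (25, 8, 0, 2),
    (24, 9, 0, 2), (25, 11, 0, 2), (23, 5, 0, 4), (23, 6, 0, 4),
    (24, 7, 0, 4), (25, 8, 0, 4), (24, 9, 0, 4), (25, 11, 0, 4),
]
employees = [(23, "aqwe"), (24, "beww"), (25, "cwww")]


def build_table(punches, employees):
    """Return (days, rows); each row has one 'H:MM' cell per day, then the name."""
    times = defaultdict(list)
    for emp, hour, minute, day in punches:
        times[(emp, day)].append(hour * 60 + minute)
    # Use the days actually present in the data, not range(1, n).
    days = sorted(set(day for _, _, _, day in punches))
    rows = []
    for emp, name in employees:
        cells = []
        for day in days:
            stamps = times.get((emp, day), [])
            # No punches for this employee on this day: zero time worked.
            worked = max(stamps) - min(stamps) if stamps else 0
            cells.append("%d:%02d" % divmod(worked, 60))
        rows.append(cells + [name])
    return days, rows


days, rows = build_table(punches, employees)
# days == [1, 2, 4]
# rows[0] == ['1:00', '0:00', '1:00', 'aqwe']
```

The resulting `rows` could then be passed to `csv.writer(...).writerows(rows)`. Because every cell is looked up by `(employee, day)`, a missing combination yields an explicit zero instead of shifting the columns.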

Any suggestions would be greatly appreciated, thanks!

0 answers:

No answers