Get file size and append it to a new column of a CSV file

Time: 2012-04-13 20:39:08

Tags: python csv

I am using Python 2.4. For my example, I have a two-column CSV file.

For example:

HOST, FILE
server1, /path/to/file1
server2, /path/to/file2
server3, /path/to/file3

I want to get the file size of the PATH on each row of the CSV file and then add that value to the CSV file as a new column. Like this:

 HOST, PATH, FILESIZE
 server1, /path/to/file1, 6546542
 server2, /path/to/file2, 46546343
 server3, /path/to/file3, 87523

I have tried several approaches, but with little success.

The code below runs fileSizeCmd (du -b) on the PATH and correctly prints the file size, but I would like to know how to add that data to the CSV file.

 import datetime
 import csv
 import os, time
 from subprocess import Popen, PIPE, STDOUT

 now = datetime.datetime.now()
 fileSizeCmd = "du -b"
 SP = " "

 # Try to get disk size and append to another row after entry above
 #st = os.stat(row[3])
 #except IOError:
 #print "failed to get information about", file
 #else:
 #print "file size:", st[ST_SIZE]
 #print "file modified:", time.asctime(time.localtime(st[ST_MTIME]))

 incsv = open('my_list.csv', 'rb')
 try:
     reader = csv.reader(incsv)
     outcsv = open('results/results_' + now.strftime("%m-%d-%Y") + '.csv', 'wb')
     try:
         writer = csv.writer(outcsv)

         for row in reader:
             p = Popen(fileSizeCmd + SP + row[1], shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE)
             stdout, empty = p.communicate()

             print 'Command: %s\nOutput: %s\n' % (fileSizeCmd + SP + row[1], stdout)

             #  Results in bytes example
             #
             #  Output:
             #  8589935104      /path/to/file
             #

             #  Write 8589935104 to new column of csv FILE

     finally:
         outcsv.close()

 finally:
     incsv.close()
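
For readers who want to keep the `du -b` approach from the code above, the size is the first whitespace-separated field of the command's output for a regular file. The following is only a minimal sketch, written for Python 3 rather than the Python 2.4 used in the question. The helper names `parse_du_output`, `du_size`, and `row_with_size` are made up for illustration, and `du -b` is assumed to be the GNU coreutils version.

```python
import subprocess


def parse_du_output(output):
    """Return the byte count from du output such as '8589935104 /path/to/file'."""
    # du prints "<size><whitespace><path>"; the size is the first field.
    return int(output.split()[0])


def du_size(path):
    """Run `du -b` on path (assumes GNU du) and return the size in bytes."""
    # Passing a list avoids shell=True and any quoting problems in the path.
    output = subprocess.check_output(["du", "-b", path])
    return parse_du_output(output.decode())


def row_with_size(row, size):
    """Return a copy of a CSV row with the size appended as a new column."""
    return row + [size]
```

Inside the loop, the missing step would then be something like `writer.writerow(row_with_size(row, du_size(row[1].strip())))`.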

3 answers:

Answer 0: (score: 1)

A sketch without error handling:

#!/usr/bin/env python

import csv
import os

filename = "sample.csv"
# localhost, 01.html.bak
# localhost, 01.htmlbak
# ...

def filesize(filename):
    # no need to shell out for filesize
    return os.stat(filename).st_size

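# note: "with" needs Python 2.6+ (or 2.5 with "from __future__ import with_statement")
with open(filename, 'rb') as handle:
    reader = csv.reader(handle)
    # result is written to sample.csv.updated.csv
    # the Python 2 csv module expects the output file in binary mode ('wb')
    out = open('%s.updated.csv' % filename, 'wb')
    try:
        writer = csv.writer(out)
        for row in reader:
            # need to strip filename, just in case
            writer.writerow(row + [filesize(row[1].strip())])
    finally:
        out.close()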

# result
# localhost, 01.html.bak,10021
# localhost, 01.htmlbak,218982
# ...
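
Since the sketch above skips error handling, one way to add it is to catch the `OSError` that `os.stat` raises for a missing or unreadable path and write an empty cell instead. This is a Python 3 sketch, not part of the original answer. `safe_filesize` and `append_sizes` are hypothetical names, and leaving the cell empty on failure is just one possible convention.

```python
import csv
import os


def safe_filesize(path):
    """Return the size of path in bytes, or '' if it cannot be read."""
    try:
        return os.stat(path).st_size
    except OSError:
        # Missing file, permission problem, etc.: leave the cell empty.
        return ''


def append_sizes(in_path, out_path):
    """Copy in_path to out_path, adding a file-size column to every row."""
    # newline='' is what the Python 3 csv module expects for file objects.
    with open(in_path, newline='') as src, open(out_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if len(row) < 2:
                # Blank or malformed line: copy it through unchanged.
                writer.writerow(row)
                continue
            writer.writerow(row + [safe_filesize(row[1].strip())])
```

With this variant a bad path no longer aborts the whole run; the affected rows simply end with an empty FILESIZE value that can be checked afterwards.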

Answer 1: (score: 0)

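You could:

1) Read the CSV contents into a list of (server, filename) tuples

2) Collect the file size for each element of this list

3) Pack each result into another tuple (server, filename, filesize) and add it to another list ('result')

4) Write the result out to a new file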
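
The four steps above can be sketched roughly as follows. This is an illustration in Python 3, not the answerer's code. The function names are hypothetical, the input is assumed to have no header row, and the files are assumed to be readable from the machine running the script, so the server column is only carried along.

```python
import csv
import os


def read_entries(csv_path):
    """Step 1: read the CSV into a list of (server, filename) tuples."""
    with open(csv_path, newline='') as handle:
        return [(row[0].strip(), row[1].strip())
                for row in csv.reader(handle) if len(row) >= 2]


def collect_sizes(entries):
    """Steps 2 and 3: build a list of (server, filename, filesize) tuples."""
    result = []
    for server, filename in entries:
        # os.path.getsize raises OSError if the file does not exist.
        result.append((server, filename, os.path.getsize(filename)))
    return result


def write_result(result, out_path):
    """Step 4: write the result list to a new CSV file."""
    with open(out_path, 'w', newline='') as handle:
        csv.writer(handle).writerows(result)
```

A call chain such as `write_result(collect_sizes(read_entries('my_list.csv')), 'results.csv')` would run all four steps; the output filename here is only a placeholder.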

Answer 2: (score: 0)

First, getting the file size is much easier than using subprocess (see os.stat):

>>> os.stat('/tmp/file').st_size
100

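Second, you are on the right track in using a writer object to write to a different file; you just need to append a column to the row list you get from the reader and then pass it to writerow on the writer. Something like this: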

>>> writerfp = open('out.csv', 'w')
>>> writer = csv.writer(writerfp)
>>> for row in csv.reader(open('in.csv', 'r')):
...     row.append('column')
...     writer.writerow(row)
...
>>> writerfp.close()
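
The question's input has a header row (HOST, FILE) and a space after each comma, both of which the snippet above would treat as ordinary data. Below is a sketch of one way to handle them, assuming Python 3. `add_filesize_column` is a hypothetical name, `skipinitialspace=True` is a standard `csv` reader option that drops the blank after the delimiter, and the FILESIZE header comes from the desired output shown in the question.

```python
import csv
import os


def add_filesize_column(in_path, out_path, header_name='FILESIZE'):
    """Copy a CSV that has a header row, appending a file-size column."""
    with open(in_path, newline='') as src, open(out_path, 'w', newline='') as dst:
        reader = csv.reader(src, skipinitialspace=True)
        writer = csv.writer(dst)
        # The first row is the header: extend it instead of calling os.stat.
        header = next(reader, None)
        if header is None:
            return  # empty input, nothing to write
        writer.writerow(header + [header_name])
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed lines
            writer.writerow(row + [os.stat(row[1]).st_size])
```

Note that `os.stat` still raises `OSError` for a path that does not exist, so this sketch stops at the first bad row.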