Python: read a CSV, execute a command, and write the result to a new (vertical) column

Time: 2012-04-12 11:48:42

Tags: python csv

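I'm new to Python, and I've read that Python's csv module is a good fit for what I want to do. I've spent some time trying several different approaches, but I haven't yet been able to build an array from the fourth (vertical) column.

I have a four-column CSV file with several hundred rows. Before I go any further, I should verify that Python can even do everything I'd like it to do:

  1. read a csv FILE
  2. execute COMMAND on the fourth (vertical) column of FILE
  3. the COMMAND prints
  4. read each line for HEALTHY (from COMMAND)
  5. write HEALTHY on a new fifth column to NEW_FILE with all five columns
  6. loop until the first empty row of FILE

Example FILE (comma-delimited, shown here in a cell view):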

      HOST                    PLATFORM        ARCH               COMMAND
      server1                 win             x86_64             python '/root/server1.py'
      server2                 linux           x86_64             python '/root/server2.py'
      server3                 linux           x86_64             python '/root/server3.py'
    

Example commands:

      # python '/root/server1.py'
      --------------------
      Error: Could not open /root/server1.py
    
    
      # python '/root/server2.py'
      --------------------
      server2 p1 (NTFS)       output1:100  output:200    HEALTHY:Yes
      --------------------
    
    
      # python 'root/server3.py'
      --------------------
      server3 p1 (linux)       output1:100  output:200    HEALTHY:No
      server3 p2 (linux)       output1:100  output:200    HEALTHY:Yes
      server3 p3 (swap)       output1:100  output:200    HEALTHY:No
      --------------------
    

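If there are multiple HEALTHY lines and not all of them equal Yes, HEALTHY equals "No".

If HEALTHY is not found on any line, HEALTHY equals "Error Scanning".

This is what I have so far: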

      #!/usr/bin/python
      #
    
      import csv
      import subprocess
    
      # read csv file
      csv_file = open("my_list.csv", "rb")
      my_csv_reader = csv.reader(csv_file, delimiter=",")
      my_data_list = []
      for row in my_csv_reader:
              print row
              my_data_list.append(row)
      csv_file.close()
    
      # write csv file
      csv_file = open("new_data.csv", "wb")
      my_csv_writer = csv.writer(csv_file, delimiter=",")
      for row in my_data_list:
              my_csv_writer.writerow(row)
      csv_file.close()
    
      # running commands, getting output
      # run COMMAND column from csv_file, use "python 'my_script.py'" for now
      # my_script.py only for now: print "HEALTHY:Yes"
      p = subprocess.Popen("python '/root/my_script.py'",stdout=subprocess.PIPE,stderr=subprocess.PIPE)
      output, errors = p.communicate()
      print output
      print errors
    

Running the above:

      # python '/root/this_script.py'
      ['HOST', 'PLATFORM', 'ARCH', 'COMMAND']
      ['server1', 'win', 'x86_64', "python '/root/server1.py'"]
      ['server2', 'linux', 'x86_64', "python '/root/server2.py'"]
      ['server3', 'linux', 'x86_64', "python '/root/server3.py'"]
      Traceback (most recent call last): 
         File "thisscript.py", line 24, in ? 
           p = subprocess.Popen('python myscript1.py',stdout=subprocess.PIPE,stderr=subprocess.PIPE) 
         File "/usr/lib64/python2.4/subprocess.py", line 550, in __init__ 
           errread, errwrite) 
         File "/usr/lib64/python2.4/subprocess.py", line 993, in _execute_child 
           raise child_exception 
         OSError: [Errno 2] No such file or directory
    

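Bonus:
What if I want to search the stdout / command output (e.g. for linux, swap, NTFS, etc., as in the third example command above) and append the result to row[5], after it has already searched for HEALTHY? I've tried a new if statement, but it only seems to append to row[4], i.e. the same column as HEALTHY.

I also can't figure out how to use an OR statement, as in: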

     if 'Linux' OR 'swap' OR 'LVM' in stdout:  
         writer.writerow(row + ['Linux']) # for multiple lines/partitions.
    
     elif 'BSD' in stdout:  
         writer.writerow(row + ['BSD'])
    
     elif 'NTFS' in stdout:
         writer.writerow(row + ['Windows'])
    
     else:
         writer.writerow(row + ['Error Scanning'])
    

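Finally, I changed the COMMAND column to PATH and modified the command to execute PATH. This works. I'd now like to run a second command to get the file size of PATH. I've tried several approaches.

Thanks for your time. I hope all of this can be done.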

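2 answers:

Answer 0 (score: 4)

You are not using subprocess.Popen correctly, and that is the cause of the immediate problem (OSError: [Errno 2] No such file or directory).

In general, the first argument to Popen should be a sequence, not a string, unless you also pass the shell=True keyword argument. If the first argument is a string and shell=False (the default), Popen tries to execute a file named by the string's value. There is no file named "python '/root/my_script.py'" (the whole string), so you get the OSError.

So,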

p = subprocess.Popen(
    "python '/root/my_script.py'", 
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)

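should probably become something like...

p = subprocess.Popen(
    ["python", "/root/my_script.py"],  # no inner quotes: each item is passed to the program as-is
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)

or (basically equivalent; shlex.split strips the shell-style quotes, which a plain str.split would leave inside the argument)

import shlex

p = subprocess.Popen(
    shlex.split("python '/root/my_script.py'"),
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)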

or (see the warning about shell=True in the subprocess documentation)

p = subprocess.Popen(
    "python '/root/my_script.py'", shell=True,
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
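
As a rough illustration of the points above, here is a small sketch, written for Python 3 (an assumption; the question itself uses Python 2). It compares str.split with shlex.split on the command from the question, then shows that a whole command string passed without shell=True is treated as a program name. The program name no-such-program and the helper start are made up for the demonstration.

```python
import shlex
import subprocess
import sys

command = "python '/root/my_script.py'"

# str.split keeps the shell-style quotes as part of the argument...
naive_args = command.split()
# ...while shlex.split removes them, the way a POSIX shell would.
shell_like_args = shlex.split(command)


def start(args):
    """Try to run args; return its output, or the OSError message on failure."""
    try:
        p = subprocess.Popen(args, stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT)
    except OSError as exc:
        return "OSError: %s" % exc
    out, _ = p.communicate()
    return out.decode().strip()


# A whole command line as one string (no shell=True): Popen looks for a
# program with exactly that name, which does not exist.
as_string = start("no-such-program 'some arg'")

# The same idea as a list: program first, then one item per argument.
as_list = start([sys.executable, "-c", "print('HEALTHY:Yes')"])

print(naive_args)
print(shell_like_args)
print(as_string)
print(as_list)
```

The list form needs no quoting at all, which is why it is usually the safer default when the arguments are known in advance.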

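Update: The answer to your question is yes. Python can do everything you want to do. Here is your list.

SPOILER ALERT! If you want to work it out for yourself, don't read past this line.

  • read a csv FILE

You've done this just fine. Another way...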

with open('my_list.csv', 'rb') as fp:
    my_data_list = [row for row in csv.reader(fp)]

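...introduces a couple of potentially new concepts, the with statement and list comprehensions. But you don't really need an intermediate list to work from; you can read and write in the same loop (see below).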

  • executes COMMAND on fourth (vertical) column of FILE
  • the COMMAND prints
  • loop until first empty row of FILE

I assume you want to print the output or result of running the command.

for row in my_data_list:
    command = row[3]  # 4th column is index 3 (the 1st is index 0)
    # shlex.split (needs "import shlex") drops the quotes around the script path
    p = Popen(shlex.split(command), stdout=PIPE, stderr=STDOUT)  # stderr merged into stdout
    stdout, empty = p.communicate()
    print stdout

  • read each line for HEALTHY (from COMMAND)
    • if multiple lines of HEALTHY and all do not equal Yes, HEALTHY equals "No"
    • if HEALTHY is not found on any lines, HEALTHY equals "Error Scanning"
  • write HEALTHY on new fifth column to NEW_FILE with all five columns

    if 'HEALTHY:No' in stdout:
        writer.writerow(row + ['No'])
    elif 'HEALTHY:Yes' in stdout:
        writer.writerow(row + ['Yes'])
    else:
        writer.writerow(row + ['Error Scanning'])
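
The substring checks above can also be written as a per-line scan, which follows the three rules from the question more literally. The following is only a sketch under assumptions: the function name classify_health is hypothetical, it uses Python 3 syntax, and it assumes each status appears as a whitespace-separated field of the form HEALTHY:value, as in the sample output.

```python
def classify_health(output):
    """Return 'Yes', 'No' or 'Error Scanning' for one command's output."""
    values = []
    for line in output.splitlines():
        for field in line.split():
            # Collect whatever follows "HEALTHY:" on this line.
            if field.startswith("HEALTHY:"):
                values.append(field[len("HEALTHY:"):])

    if not values:
        # HEALTHY was not found on any line.
        return "Error Scanning"
    if all(value == "Yes" for value in values):
        return "Yes"
    # At least one HEALTHY line is something other than Yes.
    return "No"


server3_output = """--------------------
server3 p1 (linux)       output1:100  output:200    HEALTHY:No
server3 p2 (linux)       output1:100  output:200    HEALTHY:Yes
server3 p3 (swap)       output1:100  output:200    HEALTHY:No
--------------------"""

print(classify_health(server3_output))
print(classify_health("Error: Could not open /root/server1.py"))
```

With a helper like this, the loop body would reduce to a single writer.writerow(row + [classify_health(stdout)]) call.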
    

Putting it all together (untested)...

import csv
import shlex
from subprocess import Popen, PIPE, STDOUT

with open('my_list.csv', 'rb') as incsv:
    with open('new_data.csv', 'wb') as outcsv:
        reader = csv.reader(incsv)
        writer = csv.writer(outcsv)

        # Copy the header row (it is not a command) and label the new fifth column
        header = reader.next()
        writer.writerow(header + ['HEALTHY'])

        for row in reader:
            # shlex.split drops the quotes around the script path
            p = Popen(shlex.split(row[3]), stdout=PIPE, stderr=STDOUT)
            stdout, empty = p.communicate()

            print 'Command: %s\nOutput: %s\n' % (row[3], stdout)

            if 'HEALTHY:No' in stdout:
                writer.writerow(row + ['No'])
            elif 'HEALTHY:Yes' in stdout:
                writer.writerow(row + ['Yes'])
            else:
                writer.writerow(row + ['Error Scanning'])

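Update: Fixed the naming of the file objects for the csv reader and writer.

Update: Python 2.5 introduced the with statement through the from __future__ import with_statement directive. In Python versions older than 2.5, the with statement is not available. In that case, the common approach is to wrap the file operations in try/finally. For example,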

import csv
import shlex
from subprocess import Popen, PIPE, STDOUT

incsv = open('my_list.csv', 'rb')
try:
    reader = csv.reader(incsv)
    outcsv = open('new_data.csv', 'wb')
    try:
        writer = csv.writer(outcsv)

        # Copy the header row (it is not a command) and label the new fifth column
        header = reader.next()
        writer.writerow(header + ['HEALTHY'])

        for row in reader:
            # shlex.split drops the quotes around the script path
            p = Popen(shlex.split(row[3]), stdout=PIPE, stderr=STDOUT)
            stdout, empty = p.communicate()

            print 'Command: %s\nOutput: %s\n' % (row[3], stdout)

            if 'HEALTHY:No' in stdout:
                writer.writerow(row + ['No'])
            elif 'HEALTHY:Yes' in stdout:
                writer.writerow(row + ['Yes'])
            else:
                writer.writerow(row + ['Error Scanning'])
    finally:
        outcsv.close()
finally:
    incsv.close()
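
For readers on Python 3, a few details differ from the Python 2 code above: print is a function, the csv module expects text-mode files opened with newline='', and communicate() returns bytes unless text mode is requested. The following is a sketch of the same loop under those assumptions. The function name annotate, the HEALTHY header label, and the choice to treat a command that cannot be started as "Error Scanning" are illustrative choices, not part of the original answer.

```python
import csv
import shlex
import subprocess


def annotate(in_path, out_path):
    """Copy in_path to out_path, adding a HEALTHY column from each row's command."""
    with open(in_path, newline="") as incsv, \
            open(out_path, "w", newline="") as outcsv:
        reader = csv.reader(incsv)
        writer = csv.writer(outcsv)

        # The first row is assumed to be the header, not a command.
        header = next(reader)
        writer.writerow(header + ["HEALTHY"])

        for row in reader:
            if not row:  # stop at the first empty row of the file
                break
            try:
                p = subprocess.Popen(shlex.split(row[3]),
                                     stdout=subprocess.PIPE,
                                     stderr=subprocess.STDOUT,
                                     universal_newlines=True)  # str, not bytes
                stdout, _ = p.communicate()
            except OSError:
                stdout = ""  # the command could not be started at all

            if "HEALTHY:No" in stdout:
                status = "No"
            elif "HEALTHY:Yes" in stdout:
                status = "Yes"
            else:
                status = "Error Scanning"
            writer.writerow(row + [status])
```

It would be called as annotate('my_list.csv', 'new_data.csv'), using the file names from the question.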

HTH!

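Answer 1 (score: 1)

On the "bonus" part:

If you want to search for several things, the simplest and most straightforward way is to test for each one separately and join the tests with or: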

if 'Linux' in stdout or 'swap' in stdout or 'LVM' in stdout:
    writer.writerow(row + ['Linux'])
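
To see why the form from the question does not behave as intended, here is a small Python 3 sketch using one of the sample output lines. The variable names wrong and right are only for illustration.

```python
stdout = "server2 p1 (NTFS)       output1:100  output:200    HEALTHY:Yes"

# `in` binds tighter than `or`, so this parses as
#   'Linux' or 'swap' or ('LVM' in stdout)
# and a non-empty string is always true, so the result is 'Linux'.
wrong = 'Linux' or 'swap' or 'LVM' in stdout

# Each keyword gets its own membership test.
right = 'Linux' in stdout or 'swap' in stdout or 'LVM' in stdout

print(repr(wrong))  # 'Linux'
print(right)        # False
```

Because the first form is always true, the if branch would label every row as Linux regardless of the output.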

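If chaining or feels clumsy, or you need to search for more things, you can use the any function (added in Python 2.5) with a generator expression: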

if any(x in stdout for x in ('Linux', 'swap', 'LVM')):
    writer.writerow(row + ['Linux'])

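Finally, if any is still too inelegant, or if stdout gets large and you don't want to search it several times, you can use regular expressions via the re module.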
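
A minimal sketch of the regular-expression approach, in Python 3 syntax. The function name detect_platform is hypothetical, the keyword-to-label mapping follows the if/elif chain in the question, and matching the Linux keywords case-insensitively is an assumption made because the sample output prints "linux" in lower case.

```python
import re

# One compiled pattern per label; alternation (|) replaces the chain of `or` tests.
LINUX_RE = re.compile(r"linux|swap|LVM", re.IGNORECASE)
BSD_RE = re.compile(r"BSD")
WINDOWS_RE = re.compile(r"NTFS")


def detect_platform(output):
    """Map command output to a platform label, or 'Error Scanning' if nothing matches."""
    if LINUX_RE.search(output):
        return "Linux"
    if BSD_RE.search(output):
        return "BSD"
    if WINDOWS_RE.search(output):
        return "Windows"
    return "Error Scanning"


print(detect_platform("server3 p3 (swap)       output1:100  output:200    HEALTHY:No"))
print(detect_platform("server2 p1 (NTFS)       output1:100  output:200    HEALTHY:Yes"))
```

To put the label in its own column after HEALTHY, both values can be appended in a single call, for example writer.writerow(row + [health, detect_platform(stdout)]), where health is the value chosen by the HEALTHY check.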