Question

I need to extract some values from files, and I wrote the following code.

import os
import sys
rootdir='/home/nsingh/ansible-environments/aws'
for root, subdirs, files in os.walk(rootdir):
  for j in subdirs:
    print j
    mypath=rootdir+'/'+j+'/inventory/group_vars/all'
    #print mypath
    fo=open(mypath,'r')
    f=fo.readlines()
    for line in f:
            if ('isv_alias' in line or 'LMID' in line or 'products' in line):
                         path='/home/nsingh/krikCSV_fun.csv'
                         if('isv_alias' in line):
                            line=line.strip('isv_alias=')
                            line= line.strip('"')
                         elif('LMID'  in line):
                            line=line.strip('LMID=')
                         else:
                            line=line.strip('products=')

                         fi= open(path,'a+')
                         fi.write(line)
                         fi.close()
    fo.close()

The os.walk method somehow finds a hidden directory that doesn't actually exist:

loadgen
crapcity
rmstest2
suricatatest
.git
Traceback (most recent call last):
  File "testme.py", line 9, in <module>
    fo=open(mypath,'r')
IOError: [Errno 2] No such file or directory: '/home/nsingh/ansible-environments/aws/.git/inventory/group_vars/all'

Output:

: "suricatatest"^M
: suricatatest
: rms_ems_hosted
: 26
: rmstest2
: rms_scl
: 80
: suricatatest
: rms_ems_hosted
: 26
: "suricatatest"^M
: suricatatest
: rms_ems_hosted
: 26

I need the output to look like this, with the leading colons removed as well:

suricatatest rms_ems_hosted 26

Answer 1

What makes you think /.git doesn't exist?

Try this:

import os

rootdir = '/home/nsingh/ansible-environments/aws'
for root, subdirs, files in os.walk(rootdir):
    for j in subdirs:
        print(j)
        my_path = rootdir + '/' + j + '/inventory/group_vars/all'
        if os.path.isfile(my_path):
            with open(my_path, 'r') as fo:
                for line in fo.readlines():
                    if 'isv_alias' in line or 'LMID' in line or 'products' in line:
                        path = '/home/nsingh/krikCSV_fun.csv'
                        if 'isv_alias' in line:
                            line = line.strip('isv_alias=')
                            line = line.strip('"')
                        elif 'LMID' in line:
                            line = line.strip('LMID=')
                        else:
                            line = line.strip('products=')

                        with open(path, 'a+') as fi:
                            fi.write(line.lstrip(": "))
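As an alternative to testing each path with os.path.isfile, hidden directories such as .git can be pruned before os.walk descends into them. This is a minimal sketch, not part of the answer above: the helper name find_group_vars is hypothetical, and it assumes every environment keeps its file at inventory/group_vars/all as in the question.

```python
import os


def find_group_vars(rootdir):
    """Yield the path of every inventory/group_vars/all file below rootdir."""
    for root, subdirs, files in os.walk(rootdir):
        # Assigning to subdirs[:] edits the list in place; in the default
        # top-down mode os.walk then skips the directories that were removed.
        subdirs[:] = sorted(d for d in subdirs if not d.startswith('.'))
        # Build the candidate from root (the directory currently visited),
        # not from rootdir, so nested directories resolve correctly.
        candidate = os.path.join(root, 'inventory', 'group_vars', 'all')
        if os.path.isfile(candidate):
            yield candidate
```

The .git directory is real: git creates it in every repository, and it is only hidden from a plain ls because its name starts with a dot.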

Answer 2

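You should use os.path to build file paths. os.walk visits every directory in the tree below the top directory. You are only interested in directories that end with 'inventory/group_vars', so check for that and act on it. If you want to write the values out as groups, you need to collect them in something.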

import collections
import os

rootdir = '/home/nsingh/ansible-environments/aws'
sub_folder = 'inventory/group_vars'
out_path = '/home/nsingh/krikCSV_fun.csv'
for dirpath, dirnames, filenames in os.walk(rootdir):
    if dirpath.endswith(sub_folder):
        data = collections.defaultdict(list)
        with open(os.path.join(dirpath, 'all')) as f, open(out_path, 'a+') as out:
            for line in f:
                # drop the line ending so it does not end up inside the CSV row
                line = line.strip()
                # note: strip(chars) trims a set of characters, not a prefix
                if 'isv_alias' in line:
                    line = line.strip('isv_alias=')
                    line = line.strip('"')
                    data['isv_alias'].append(line)
                elif 'LMID' in line:
                    line = line.strip('LMID=')
                    data['LMID'].append(line)
                elif 'products' in line:
                    line = line.strip('products=')
                    data['products'].append(line)
            # write one row per (isv_alias, LMID, products) group
            for a, b, c in zip(*data.values()):
                out.write('{},{},{}\n'.format(a, b, c))

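I used a defaultdict to store multiple items of interest from a single file. If each file contains only one 'isv_alias', 'LMID', 'products' group, you could just as easily store the information in a list or a namedtuple.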
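For the single-group case, a namedtuple version could look roughly like the sketch below. The names GroupVars and parse_group_vars are hypothetical, and the sketch assumes key=value lines, which the question does not confirm.

```python
import collections

# One record per file; the field order doubles as the output column order.
GroupVars = collections.namedtuple('GroupVars', ['isv_alias', 'LMID', 'products'])


def parse_group_vars(lines):
    """Build one GroupVars record from an iterable of key=value lines."""
    found = {}
    for line in lines:
        # partition() returns an empty separator when the line has no '='
        key, sep, value = line.strip().partition('=')
        if sep and key in GroupVars._fields:
            found[key] = value.strip('"')
    # Missing keys become empty strings so the record always has three fields.
    return GroupVars(**{field: found.get(field, '') for field in GroupVars._fields})
```

Because a namedtuple is still a tuple, ','.join(record) produces the CSV row directly, and the column order no longer depends on the order of the lines in the file.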

You didn't provide an example of the file, so it isn't clear what the line structure is. If it looks like this:

isv_alias="foo"
LMID=bar
products=26

it can be simplified to

keys = {'isv_alias', 'LMID', 'products'}
for dirpath, dirnames, filenames in os.walk(rootdir):
    if dirpath.endswith(sub_folder):
        data = collections.defaultdict(list)
        with open(os.path.join(dirpath, 'all')) as f, open(out_path, 'a+') as out:
            for line in f:
                # partition() tolerates blank lines and values containing '='
                key, _, value = line.strip().partition('=')
                if key in keys:
                    value = value.strip('"')
                    data[key].append(value)
            for a, b, c in zip(*data.values()):
                out.write('{},{},{}\n'.format(a, b, c))

As long as you accumulate the information in data, you can open the output file just once:

data = collections.defaultdict(list)
keys = {'isv_alias', 'LMID', 'products'}
for dirpath, dirnames, filenames in os.walk(rootdir):
    if dirpath.endswith(sub_folder):
        with open(os.path.join(dirpath, 'all')) as f:
            for line in f:
                # partition() tolerates blank lines and values containing '='
                key, _, value = line.strip().partition('=')
                if key in keys:
                    value = value.strip('"')
                    data[key].append(value)

# everything is collected by now, so the output file is opened only once
with open(out_path, 'a+') as out:
    for a, b, c in zip(*data.values()):
        out.write('{},{},{}\n'.format(a, b, c))

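If you are using Python 3.6 or later (where dicts keep insertion order) or an ordered defaultdict, the solutions above assume that the order in which each key appears in the files is the order you want them written out.

If the file structure isn't ordered, or the dictionary you are using isn't ordered, write to the file like this instead: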

            for a, b, c in zip(data['isv_alias'], data['LMID'], data['products']):
                out.write('{},{},{}\n'.format(a, b, c))
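One detail behind the stray colons in the question's output: str.strip(chars) removes any of the given characters from both ends rather than removing a prefix, so line.strip('isv_alias=') stops at the first character outside that set. The sketch below shows the effect and one possible way to split a line instead. The helper extract_value is hypothetical, and the "key: value" layout is only an assumption inferred from the output shown in the question.

```python
import re

# strip() trims characters from the set {i, s, v, _, a, l, =} at both ends;
# the colon is not in that set, so the result still starts with ': '.
leftover = 'isv_alias: "suricatatest"\r\n'.strip('isv_alias=')


def extract_value(line):
    """Return (key, value) for a 'key: value' or 'key=value' line, else None."""
    match = re.match(r'\s*([A-Za-z_]\w*)\s*[:=]\s*(.*?)\s*$', line)
    if match is None:
        return None
    key, value = match.groups()
    # Trailing whitespace (including '\r\n') is already consumed by the
    # pattern, so only the surrounding double quotes are left to remove.
    return key, value.strip('"')
```

Splitting on the separator also handles values that happen to begin or end with letters from the key, which the character-set behaviour of strip would silently eat.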

os.walk finds a directory that doesn't even exist
