Reading columns from a file

Question

I am using Python to run a command remotely, and this is the output I get:

Vserver   Volume       Aggregate    State      Type       Size  Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
vs_cfm06  Available    aggr_backup_1 online    RW        100GB    66.37GB   33%
vs_cfm06  Discovery    aggr_backup_1 online    RW        100GB    66.36GB   33%
vs_cfm06  NonDebugCF01 aggr_backup_1 online    RW        100GB    64.63GB   35%
vs_cfm06  NonDebugCF01_BACKUP aggr_backup_1 online RW      5GB     4.75GB    5%
vs_cfm06  Software     aggr_backup_1 online    RW        100GB    65.08GB   34%
vs_cfm06  Template     aggr_backup_1 online    RW        100GB    66.35GB   33%
vs_cfm06  breakresetafterfaildelCF01 aggr_backup_1 online RW 100GB 69.52GB  30%
vs_cfm06  breakresetafterfaildelCF01_BACKUP aggr_backup_1 online RW 5GB 4.75GB  5%
vs_cfm06  rootvol      aggr_backup_1 online    RW          1GB    972.5MB    5%
vs_cfm06  vol          aggr_backup_1 online    RW          1GB    972.6MB    5%
10 entries were displayed.

How can I extract one column from this, so that my output looks like this:

Available   
Discovery    
NonDebugCF01 
NonDebugCF01_BACKUP 
Software     
Template     
breakresetafterfaildelCF01 
breakresetafterfaildelCF01_BACKUP 
rootvol      
vol

The code that runs the command and prints the output is:

def get_volumes(usrname, ip):

    raw_output = ru.run('volume show', user=usrname, host=ip, set_e=False)  # logs onto netapp and runs command

    print raw_output

When I run print type(raw_output), it says it is unicode. Any help would be appreciated.

Answer 1

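Reading columns from a file

Text files are inherently line-oriented: when you open one in a text editor, you operate on lines of text.

This intrinsic structure is reflected in the idiomatic way of reading the contents of a text file in Python:

data = [line for line in open(fname)]

data is a list of strings, one for each line of the file.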

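Sometimes the text is more structured, and you can see a columnar organization in it. For simplicity, say we have

an initial line of headers,
possibly some junk, and
multiple lines containing the actual data.

Further, we assume that every relevant line contains the same number of columns.

The idiom you can use is

data = [line.split() for line in open(fname)]

Here data is now a list of lists: one sublist per line of the file, and each sublist is the list of strings obtained by splitting that line on whitespace.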
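As a minimal sketch of this idiom (written for Python 3), the snippet below splits each line on whitespace and keeps only the rows that have the expected number of fields, which drops junk such as the trailing summary line. The SAMPLE text is an abridged, hypothetical stand-in for a real file, and read_rows is an illustrative helper name, not part of any library.

```python
import io

# Hypothetical, abridged sample standing in for the contents of a file.
SAMPLE = """\
Vserver   Volume       Aggregate    State      Type       Size  Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
vs_cfm06  Available    aggr_backup_1 online    RW        100GB    66.37GB   33%
vs_cfm06  rootvol      aggr_backup_1 online    RW          1GB    972.5MB    5%
2 entries were displayed.
"""


def read_rows(lines, ncols):
    """Split each line on whitespace and keep only rows with ncols fields."""
    rows = [line.split() for line in lines]
    return [row for row in rows if len(row) == ncols]


# io.StringIO behaves like an open text file: iterating over it yields lines.
rows = read_rows(io.StringIO(SAMPLE), ncols=8)
header, separator, records = rows[0], rows[1], rows[2:]

print(records[1][1])  # second data row, second column
```

Filtering on the field count is one simple way to honor the assumption that every relevant line has the same number of columns; it would not help if a junk line happened to have eight fields too.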

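Reordering by columns

While you can access each data item as data[row][column], it may be more convenient to refer to the data by header, as in data['Aggregate'][5]. In Python, data addressed by strings is usually held in a dictionary, and you can build one with a so-called dictionary comprehension: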

n = 2  # in your example data: skips the header line and the dashed line
data_by_rows = [line.split() for line in open(fname)]
data_by_cols = {col[0]: list(col[n:]) for col in zip(*data_by_rows)}

This works because the idiom zip(*list_of_rows) yields the columns of list_of_rows.

>>> a = [[1,2,3],[10,20,30]]
>>> list(zip(*a))
[(1, 10), (2, 20), (3, 30)]
>>>
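To make the column dictionary concrete, here is a small sketch (Python 3) that builds it from an in-memory string rather than a file. The SAMPLE text is an abridged, hypothetical excerpt, and the length check is an added assumption so that the trailing summary line does not truncate the transposition, since zip stops at the shortest row.

```python
# Hypothetical, abridged sample of the command output.
SAMPLE = """\
Vserver   Volume       Aggregate    State      Type       Size  Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
vs_cfm06  Available    aggr_backup_1 online    RW        100GB    66.37GB   33%
vs_cfm06  rootvol      aggr_backup_1 online    RW          1GB    972.5MB    5%
2 entries were displayed.
"""

rows = [line.split() for line in SAMPLE.splitlines()]
header = rows[0]

# Skip the header and the dashed line, and drop rows with a different
# number of fields (such as the summary line at the end).
records = [row for row in rows[2:] if len(row) == len(header)]

# zip(*records) transposes rows into columns; pair each column with its header.
data_by_cols = {name: list(col) for name, col in zip(header, zip(*records))}

print(data_by_cols["Volume"])
```

Note that this relies on no field containing spaces; a volume name with embedded whitespace would shift the columns.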

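Going further

What we have seen is simple and handy as long as the file format is simple and the manipulations you want to perform are not involved. For more complex formats and/or manipulation requirements, Python offers a number of options, either in the standard library:

the csv module, which eases the task of reading (and writing) comma- or tab-separated value files,

or as optional modules:

the numpy module, aimed at numerical analysis, which has facilities for reading all the data from a text file into an array structure,
the pandas module, built on numpy and aimed at data analysis and modeling, which also has facilities for turning a structured text file into a DataFrame structure.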
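As an illustration of the standard-library route, the following sketch (Python 3) reads space-aligned text with csv.DictReader, using a single space as the delimiter and skipinitialspace=True so that runs of padding spaces collapse. The three-column SAMPLE is hypothetical and deliberately has no trailing spaces, which would otherwise produce an extra empty field.

```python
import csv
import io

# Hypothetical three-column sample; real output would have more columns.
SAMPLE = """\
Vserver Volume Size
vs_cfm06  Available    100GB
vs_cfm06  rootvol        1GB
"""

# The first line supplies the field names; each later line becomes a dict.
reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=" ", skipinitialspace=True)
volumes = [row["Volume"] for row in reader]

print(volumes)
```

For plain whitespace-aligned output, str.split() is usually the simpler tool; csv pays off when fields can be quoted or contain the delimiter.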

Answer 2

Two handy functions do what you need: readlines() splits a file into lines, and str.split() splits a string (by default, using any whitespace as the separator).

with open("input.txt") as f:
     lines = f.readlines()

for line in lines[2:]:
     columns = line.split()
     print(columns[1])

An alternative that does not use readlines() is:

with open("input.txt") as f:
     content = f.read()  # does not detect lines

lines = content.split("\n")
for line in lines[2:]:
     columns = line.split()
     if len(columns) > 1:  # skip blank lines, e.g. the empty string after a trailing newline
          print(columns[1])

Finally, you may be dealing with a file whose line terminator is "\n" (GNU/Linux), "\r\n" (Windows) or "\r" (classic Mac OS). In that case you can use the re module:

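import re

with open("input.txt") as f:
     content = f.read()  # does not detect lines

# Try "\r\n" first so a Windows line ending is not split twice.
lines = re.split(r"\r\n|\r|\n", content)
for line in lines[2:]:
     columns = line.split()
     if len(columns) > 1:  # skip blank lines
          print(columns[1])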
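Since the question has the output in a string (raw_output) rather than in a file, the same approach can be applied directly to that string. The sketch below (Python 3) is one possible way to do it: extract_column is a hypothetical helper name, the raw_output literal is an abridged stand-in for what the asker's ru.run call returns, and str.splitlines() is used because it already recognizes "\n", "\r\n" and "\r".

```python
def extract_column(text, index, skip=2):
    """Return field `index` from each data line of whitespace-aligned text.

    `skip` is the number of leading lines to ignore (header and dashed line).
    Lines whose field count differs from the header's are dropped.
    """
    lines = text.splitlines()  # handles \n, \r\n and \r line endings
    if not lines:
        return []
    header_width = len(lines[0].split())
    values = []
    for line in lines[skip:]:
        fields = line.split()
        if len(fields) == header_width:
            values.append(fields[index])
    return values


# Abridged, hypothetical stand-in for the string returned by the remote command.
raw_output = (
    "Vserver   Volume       Aggregate    State      Type       Size  Available Used%\r\n"
    "--------- ------------ ------------ ---------- ---- ---------- ---------- -----\r\n"
    "vs_cfm06  Available    aggr_backup_1 online    RW        100GB    66.37GB   33%\r\n"
    "vs_cfm06  rootvol      aggr_backup_1 online    RW          1GB    972.5MB    5%\r\n"
    "vs_cfm06  vol          aggr_backup_1 online    RW          1GB    972.6MB    5%\r\n"
    "3 entries were displayed.\r\n"
)

for name in extract_column(raw_output, 1):
    print(name)
```

Comparing against the header's field count is an assumption that suits this particular output, where the summary line has fewer fields than the data rows.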
