Selecting specific columns from df -h output in Python

Time: 2012-08-19 14:24:07

Tags: python parsing unix

I am trying to write a simple script that selects specific columns from the output of the unix df -h command. I can do this with awk, but how can it be done in Python?

Here is the df -h output:

Filesystem                    Size  Used  Avail  Use%  Mounted on
/dev/mapper/vg_base-lv_root   28G   4.8G    22G   19%  /
tmpfs                        814M   176K   814M    1%  /dev/shm
/dev/sda1                    485M   120M   340M   27%  /boot

I would like something like this:

Column 1:

Filesystem
/dev/mapper/vg_base-lv_root           
tmpfs                 
/dev/sda1

Column 2:

Size
28G
814M 
485M   

8 answers:

Answer 0 (score: 11)

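You can use os.popen to run the command and retrieve its output, then splitlines and split to separate the lines and the fields. Run df -Ph rather than df -h so that a line is not split in two when a column is too wide.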

import os
df_output_lines = [s.split() for s in os.popen("df -Ph").read().splitlines()]

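The result is a list of rows. To extract the first column, use [line[0] for line in df_output_lines] (note that columns are numbered from 0), and so on. You may want to use df_output_lines[1:] instead of df_output_lines to drop the header row.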
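To make the indexing concrete, here is a minimal sketch that works on a hard-coded copy of the df output from the question instead of calling df. The names SAMPLE and column are made up for this illustration.

```python
# Hard-coded copy of the question's df -h output, so the sketch runs anywhere.
SAMPLE = """\
Filesystem                    Size  Used  Avail  Use%  Mounted on
/dev/mapper/vg_base-lv_root   28G   4.8G    22G   19%  /
tmpfs                        814M   176K   814M    1%  /dev/shm
/dev/sda1                    485M   120M   340M   27%  /boot
"""

# One list of whitespace-separated fields per line.
rows = [line.split() for line in SAMPLE.splitlines()]


def column(rows, index, skip_header=False):
    """Return one column (0-based index) from the list of split rows."""
    body = rows[1:] if skip_header else rows
    return [row[index] for row in body]


print(column(rows, 0))                    # filesystem names, header included
print(column(rows, 1, skip_header=True))  # sizes only, header dropped
```

Note that the header row splits into seven fields because "Mounted on" contains a space, so only the data rows line up cleanly for the last column.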

If you have already stored the output of df -h in a file, you need to join the wrapped lines first.

import re
# raw_df_output is an open file object holding the saved df -h output
fixed_df_output = re.sub(r'\n\s+', ' ', raw_df_output.read())
df_output_lines = [s.split() for s in fixed_df_output.splitlines()]
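As a rough illustration of what the joining step does, the sketch below applies the same substitution to a made-up df -h output in which a long device name pushed the remaining fields onto an indented continuation line. The RAW text is a hypothetical sample, not captured output.

```python
import re

# Hypothetical df -h output: the long device name wrapped, so its numbers
# ended up on the following, indented line.
RAW = (
    "Filesystem            Size  Used Avail Use% Mounted on\n"
    "/dev/mapper/vg_base-lv_root\n"
    "                       28G  4.8G   22G  19% /\n"
    "tmpfs                 814M  176K  814M   1% /dev/shm\n"
)

# A newline followed by whitespace marks a continuation line; replace that
# run with a single space so the record is back on one line.
fixed = re.sub(r"\n\s+", " ", RAW)
rows = [line.split() for line in fixed.splitlines()]

for row in rows:
    print(row)
```

One caveat of this pattern is that \s also matches newlines, so blank lines in the input would be merged into the preceding line as well.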

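Note that this assumes that neither the filesystem name nor the mount point contains whitespace. If they do (which is possible with some setups on some unix variants), it is practically impossible to parse the output of df, even df -P. You can use os.statvfs to obtain information about a given filesystem (it is the Python interface to the C function that df calls internally for each filesystem), but there is no portable way to enumerate the filesystems.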
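For a single known path, a minimal os.statvfs sketch could look like the following. It is Unix-only, the helper name disk_usage is made up here, and "/" is just an example path.

```python
import os


def disk_usage(path):
    """Return (total_bytes, available_bytes) for the filesystem holding path."""
    st = os.statvfs(path)
    total = st.f_frsize * st.f_blocks      # total size of the filesystem
    available = st.f_frsize * st.f_bavail  # space available to unprivileged users
    return total, available


total, available = disk_usage("/")
print("total: %.1f GiB, available: %.1f GiB"
      % (total / 2.0 ** 30, available / 2.0 ** 30))
```

This sidesteps parsing entirely, at the cost of having to know the mount points in advance.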

Answer 1 (score: 2)

Here is a complete example:

import subprocess
import re

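# universal_newlines=True makes communicate() return text rather than bytes
p = subprocess.Popen("df -h", stdout=subprocess.PIPE, shell=True, universal_newlines=True)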
dfdata, _ = p.communicate()

dfdata = dfdata.replace("Mounted on", "Mounted_on")

columns = [list() for i in range(10)]
for line in dfdata.split("\n"):
    line = re.sub(" +", " ", line)
    for i,l in enumerate(line.split(" ")):
        columns[i].append(l)

print(columns[0])

It assumes that mount points do not contain spaces.

Here is a more complete (and more complex) solution that does not hard-code the number of columns:

import subprocess
import re

def yield_lines(data):
    for line in data.split("\n"):
        yield line

def line_to_list(line):
    return re.sub(" +", " ", line).split()

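# universal_newlines=True makes communicate() return text rather than bytes
p = subprocess.Popen("df -h", stdout=subprocess.PIPE, shell=True, universal_newlines=True)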
dfdata, _ = p.communicate()

dfdata = dfdata.replace("Mounted on", "Mounted_on")

lines = yield_lines(dfdata)

headers = line_to_list(next(lines))

columns = [list() for i in range(len(headers))]
for i,h in enumerate(headers):
    columns[i].append(h)

for line in lines:
    for i,l in enumerate(line_to_list(line)):
        columns[i].append(l)

print(columns[0])

Answer 2 (score: 2)

Not an answer to the question as asked, but an attempt at solving the underlying problem. :)

from os import statvfs

with open("/proc/mounts", "r") as mounts:
    split_mounts = [s.split() for s in mounts.read().splitlines()]

    print("{0:24} {1:24} {2:16} {3:16} {4:15} {5:13}".format(
            "FS", "Mountpoint", "Blocks", "Blocks Free", "Size", "Free"))
    for p in split_mounts:
        stat = statvfs(p[1])
        block_size = stat.f_bsize
        blocks_total = stat.f_blocks
        blocks_free = stat.f_bavail

        size_mb = float(blocks_total * block_size) / 1024 / 1024
        free_mb = float(blocks_free * block_size) / 1024 / 1024

        print("{0:24} {1:24} {2:16} {3:16} {4:10.2f}MiB {5:10.2f}MiB".format(
                p[0], p[1], blocks_total, blocks_free, size_mb, free_mb))

Answer 3 (score: 1)

I am not using os.popen, since it is deprecated (http://docs.python.org/library/os#os.popen).

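I have put the output of df -h in a file, test.txt, and simply read that file. You can also read the output with subprocess. Assuming you are able to read each line of the df -h output, the following code will help: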

f = open('test.txt')

lines = (line.strip() for line in f.readlines())
f.close()    
splittedLines = (line.split() for line in lines)
listOfColumnData = zip(*splittedLines)
for eachColumn in listOfColumnData:
    print(eachColumn)

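eachColumn holds the whole column you want as a tuple (zip yields tuples), and you can iterate over it. If needed, I can provide the code for reading the output of df -h directly so that you can drop the dependency on test.txt, but the subprocess documentation shows how to do that easily.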
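A possible way to drop the test.txt dependency is sketched below. The function names read_df and to_columns are made up for this illustration, and the transposition is demonstrated on a fixed sample so it does not depend on df being installed.

```python
import subprocess


def read_df(options="-Ph"):
    """Run df and return its output as text (needs a Unix-like system)."""
    return subprocess.check_output(["df", options], universal_newlines=True)


def to_columns(text):
    """Split every non-empty line on whitespace and transpose rows to columns."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # zip stops at the shortest row, so the stray "on" from the
    # "Mounted on" header is dropped automatically.
    return list(zip(*rows))


# Fixed sample copied from the question; on a real system, pass read_df() instead.
SAMPLE = """\
Filesystem                    Size  Used  Avail  Use%  Mounted on
/dev/mapper/vg_base-lv_root   28G   4.8G    22G   19%  /
tmpfs                        814M   176K   814M    1%  /dev/shm
/dev/sda1                    485M   120M   340M   27%  /boot
"""

for each_column in to_columns(SAMPLE):
    print(each_column)
```

Like the file-based version, this still assumes that no field contains whitespace.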

Answer 4 (score: 1)

I have a mount point with spaces in it, which threw off most of the examples. This borrows heavily from @ZarrHai's example, but puts the result in a dict.

#!/usr/bin/python
import subprocess
import re
from pprint import pprint

DF_OPTIONS = "-laTh" # remove h if you want bytes.

def yield_lines(data):
    for line in data.split("\n"):
        yield line

def line_to_list(line):
    pattern = re.compile(r"([\w\/\s\-\_]+)\s+(\w+)\s+([\d\.]+?[GKM]|\d+)"
                          r"\s+([\d\.]+[GKM]|\d+)\s+([\d\.]+[GKM]|\d+)\s+"
                          r"(\d+%)\s+(.*)")
    matches = pattern.search(line)
    if matches:
        return matches.groups()
    _line = re.sub(r" +", " ", line).split()
    return _line

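# universal_newlines=True makes communicate() return text rather than bytes
p = subprocess.Popen(["df", DF_OPTIONS], stdout=subprocess.PIPE, universal_newlines=True)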
dfdata, _ = p.communicate()

dfdata = dfdata.replace("Mounted on", "Mounted_on")

lines = yield_lines(dfdata)

headers = line_to_list(next(lines))

columns = [list() for i in range(len(headers))]
for i,h in enumerate(headers):
    columns[i].append(h)

grouped = {}
for li, line in enumerate(lines):
    if not line:
        continue
    grouped[li] = {}
    for i,l in enumerate(line_to_list(line)):
        columns[i].append(l)
        key = headers[i].lower().replace("%","")
        grouped[li][key] = l.strip()

pprint(grouped)

Answer 5 (score: 0)

This works:

#!/usr/bin/python

import os, re

l=[]
p=os.popen('df -h')
for line in p.readlines():
    l.append(re.split(r'\s{2,}',line.strip()))


p.close()

for subl in l:
    print(subl)

Answer 6 (score: 0)

I found this to be an easy way...

df -h |  awk '{print $1}'

Answer 7 (score: 0)

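One thing I have noticed on all the systems I have access to: df with the -P option prints whitespace-aligned columns. This means each header is the same width as the items below it (padded with spaces). Building on the7erm's answer, this uses the width of the headers to make sure it captures the whole mount point, even if it contains spaces.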

This has been tested on Ubuntu 14.04, 16.04, and FreeBSD 9.2.
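To show the header-width idea in isolation, here is a minimal sketch that slices each line at the offsets where the header words start. The sample table is generated in the code and is hypothetical; the sketch assumes every value fits under its header, which real df output may not guarantee for right-aligned columns. The names ROW, SAMPLE and parse_fixed_width are made up for this illustration.

```python
import re

# Build a hypothetical, aligned df-style table; one mount point has a space.
ROW = "{:<14} {:>4} {:>4} {:>5} {:>4} {}"
SAMPLE = "\n".join([
    ROW.format("Filesystem", "Size", "Used", "Avail", "Use%", "Mounted on"),
    ROW.format("/dev/sda1", "28G", "4.8G", "22G", "19%", "/"),
    ROW.format("/dev/sdb1", "485M", "120M", "340M", "27%", "/media/My Disk"),
])


def parse_fixed_width(text):
    """Slice each data line at the column offsets taken from the header."""
    lines = text.splitlines()
    # Same length as "Mounted on", so the offsets are not shifted.
    header = lines[0].replace("Mounted on", "Mounted_on")
    names = header.split()
    starts = [m.start() for m in re.finditer(r"\S+", header)]
    ends = starts[1:] + [None]  # the last column runs to the end of the line
    rows = []
    for line in lines[1:]:
        if not line.strip():
            continue
        rows.append({name: line[s:e].strip()
                     for name, s, e in zip(names, starts, ends)})
    return rows


for row in parse_fixed_width(SAMPLE):
    print(row)
```

Because the last column is taken as "everything from its start offset onward", the mount point survives intact even when it contains spaces.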

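I came up with two different approaches. The first directly answers the OP's question: it gives the columns (6 in plain df output; the -T option used here adds a Type column), each starting with its header followed by the value for each mount point in order: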

import pprint
import subprocess
import re

DF_OPTIONS = "-PlaTh" # remove h if you want bytes.

# Get the entire output of df
dfdata = subprocess.getoutput("df " + DF_OPTIONS)

# Split it based on newlines
lines = dfdata.split("\n")

dfout = {}
headers = []

# Grab the headers, retain whitespace!
# df formats in such a way that each column header has trailing whitespace 
# so the header is equal to the maximum column width. We want to retain
# this for len()
headersplit = re.split(r'(\s+)', lines[0].replace("Mounted on","Mounted_on "))
headers = [i+j for i,j in zip(headersplit[0::2],headersplit[1::2])]

for hi,head in enumerate(headers):
  dfout[hi] = [head.strip()]

for line in lines[1:]:
  pos = 0
  dfstruct = {}
  for hi,head in enumerate(headers):
    # For the last item, grab the rest of the line
    if head == headers[-1]:
      item = line[pos:]
    else:
      # Get the current item
      item = line[pos:pos+len(head)]

    pos = pos + len(head)

    #Strip whitespace and add it to the list

    dfstruct[head.strip()] = item.strip()
    dfout[hi].append(item.strip())

pprint.pprint(dfout)

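The second is more useful to me, and it solves the problem that led me to this question in the first place. It puts the information into a list of dicts: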

import pprint
import subprocess
import re

DF_OPTIONS = "-PlaTh" # remove h if you want bytes.

# Get the entire output of df
dfdata = subprocess.getoutput("df " + DF_OPTIONS)

# Split it based on newlines
lines = dfdata.split("\n")

dfout = []
headers = []

# Grab the headers, retain whitespace!
# df formats in such a way that each column header has trailing whitespace 
# so the header is equal to the maximum column width. We want to retain
# this for len()
headersplit = re.split(r'(\s+)', lines[0].replace("Mounted on","Mounted_on "))
headers = [i+j for i,j in zip(headersplit[0::2],headersplit[1::2])]

for line in lines[1:]:
  pos = 0
  dfstruct = {}
  for head in headers:
    # For the last item, grab the rest of the line
    if head == headers[-1]:
      item = line[pos:]
    else:
      # Get the current item
      item = line[pos:pos+len(head)]

    pos = pos + len(head)
    #Strip whitespace for our own structure
    dfstruct[head.strip()] = item.strip()

  dfout.append(dfstruct)

pprint.pprint(dfout)