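I'm trying to write a simple script that selects specific columns from the output of the Unix df -h command. I can do this with awk, but how can it be done in Python?
Here is the df -h output: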
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/vg_base-lv_root   28G  4.8G   22G  19% /
tmpfs                        814M  176K  814M   1% /dev/shm
/dev/sda1                    485M  120M  340M  27% /boot
I want something like:
Column 1:
Filesystem
/dev/mapper/vg_base-lv_root
tmpfs
/dev/sda1
Column 2:
Size
28G
814M
485M
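Answer 0 (score: 11)
You can use os.popen to run the command and retrieve its output, then splitlines and split to break it into lines and fields. Run df -Ph rather than df -h so that a line is not wrapped when a column is too long.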
import os
df_output_lines = [s.split() for s in os.popen("df -Ph").read().splitlines()]
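The result is a list of lines, each of which is a list of fields. To extract the first column, use [line[0] for line in df_output_lines] (note that columns are numbered from 0), and so on. You may want to use df_output_lines[1:] instead of df_output_lines to drop the header line.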
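As a minimal sketch of the split-and-index approach described above, the snippet below parses a captured copy of the question's sample output instead of running df, so it behaves the same on any machine. The names SAMPLE and column are illustrative and not part of the original answer.

```python
# Sample text copied from the question; in real use it would come from
# os.popen("df -Ph").read() as in the answer above.
SAMPLE = """\
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_base-lv_root 28G 4.8G 22G 19% /
tmpfs 814M 176K 814M 1% /dev/shm
/dev/sda1 485M 120M 340M 27% /boot
"""


def column(text, index, skip_header=False):
    """Return one whitespace-separated column (hypothetical helper)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if skip_header:
        rows = rows[1:]
    return [row[index] for row in rows]


print(column(SAMPLE, 0))                     # first column, header included
print(column(SAMPLE, 1, skip_header=True))   # second column, header dropped
```

Because the header "Mounted on" contains a space, it splits into two fields; that only matters if you index the last column of the header row.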
If you have stored the output of df -h in a file, you need to join the wrapped lines first.
import re
# raw_df_output is an open file object holding the saved df -h output
fixed_df_output = re.sub(r'\n\s+', ' ', raw_df_output.read())
df_output_lines = [s.split() for s in fixed_df_output.splitlines()]
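To make the joining step concrete, here is a small sketch that applies the same substitution to a made-up piece of wrapped output. The WRAPPED text is an assumption about what df -h prints when a device name is too long for its column; it was not captured from a real system.

```python
import re

# Hypothetical df -h output in which a long device name pushed the
# remaining fields onto an indented continuation line.
WRAPPED = (
    "Filesystem            Size  Used Avail Use% Mounted on\n"
    "/dev/mapper/vg_base-lv_root\n"
    "                       28G  4.8G   22G  19% /\n"
    "tmpfs                 814M  176K  814M   1% /dev/shm\n"
)

# A newline followed by whitespace marks a continuation line;
# replacing it with a single space glues the two halves back together.
fixed = re.sub(r"\n\s+", " ", WRAPPED)
rows = [line.split() for line in fixed.splitlines()]

print(rows[1])
```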
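Note that this assumes that neither filesystem names nor mount points contain spaces. If they do (which is possible with some setups on some Unix variants), it is practically impossible to parse the output of df, even df -P. You can use os.statvfs to get information about a given filesystem (it is the Python interface to the C function that df calls internally for each filesystem), but there is no portable way to enumerate the filesystems.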
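For a single known path, a query through os.statvfs might look like the sketch below. It assumes a Unix-like system (os.statvfs is not available on Windows), and the helper name usage is made up for illustration.

```python
import os


def usage(path):
    """Return (total_bytes, available_bytes) for the filesystem holding path."""
    st = os.statvfs(path)
    # f_blocks and f_bavail are counted in units of f_frsize
    total = st.f_blocks * st.f_frsize
    avail = st.f_bavail * st.f_frsize
    return total, avail


total, avail = usage("/")
print("/: %.1f GiB total, %.1f GiB available" % (total / 2**30, avail / 2**30))
```

This avoids parsing text altogether, at the cost of having to know the mount points in advance.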
Answer 1 (score: 2)
Here is a complete example:
import subprocess
import re
# universal_newlines=True makes communicate() return text rather than bytes
p = subprocess.Popen("df -h", stdout=subprocess.PIPE, shell=True, universal_newlines=True)
dfdata, _ = p.communicate()
# Join the two-word header so that it splits as a single field
dfdata = dfdata.replace("Mounted on", "Mounted_on")
columns = [list() for i in range(10)]
for line in dfdata.splitlines():
    line = re.sub(" +", " ", line)
    for i, l in enumerate(line.split(" ")):
        columns[i].append(l)
print(columns[0])
It assumes that mount points do not contain spaces.
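One way to relax that assumption for the mount point (though not for the filesystem name) is to cap the number of splits, so that everything after the fifth field stays together. This is a sketch of a different technique from the code above; it assumes the six-column layout of plain df -h, and both the helper name and the sample line are made up.

```python
def split_df_line(line, fields=6):
    """Split a df line into at most `fields` parts (hypothetical helper)."""
    # maxsplit = fields - 1 leaves the last field (the mount point) unsplit
    return line.split(None, fields - 1)


line = "/dev/sdb1 1.8T 1.1T 700G 62% /media/My Passport"
print(split_df_line(line))
```

The same call also keeps the "Mounted on" header in one piece, so no renaming to "Mounted_on" is needed.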
Here is a more complete (and more complex) solution that does not hard-code the number of columns:
import subprocess
import re
def yield_lines(data):
    # Yield the output one line at a time
    for line in data.split("\n"):
        yield line
def line_to_list(line):
    # Collapse runs of spaces, then split into fields
    return re.sub(" +", " ", line).split()
# universal_newlines=True makes communicate() return text rather than bytes
p = subprocess.Popen("df -h", stdout=subprocess.PIPE, shell=True, universal_newlines=True)
dfdata, _ = p.communicate()
dfdata = dfdata.replace("Mounted on", "Mounted_on")
lines = yield_lines(dfdata)
# The first line gives the headers and therefore the number of columns
headers = line_to_list(next(lines))
columns = [list() for i in range(len(headers))]
for i, h in enumerate(headers):
    columns[i].append(h)
for line in lines:
    for i, l in enumerate(line_to_list(line)):
        columns[i].append(l)
print(columns[0])
Answer 2 (score: 2)
Not an answer to the question as asked, but an attempt to solve the underlying problem. :)
from os import statvfs
# /proc/mounts is Linux-specific; each line starts with the device and the mount point
with open("/proc/mounts", "r") as mounts:
    split_mounts = [s.split() for s in mounts.read().splitlines()]
print("{0:24} {1:24} {2:16} {3:16} {4:15} {5:13}".format(
    "FS", "Mountpoint", "Blocks", "Blocks Free", "Size", "Free"))
for p in split_mounts:
    stat = statvfs(p[1])
    # f_blocks and f_bavail are counted in units of f_frsize
    block_size = stat.f_frsize
    blocks_total = stat.f_blocks
    blocks_free = stat.f_bavail
    size_mb = float(blocks_total * block_size) / 1024 / 1024
    free_mb = float(blocks_free * block_size) / 1024 / 1024
    print("{0:24} {1:24} {2:16} {3:16} {4:10.2f}MiB {5:10.2f}MiB".format(
        p[0], p[1], blocks_total, blocks_free, size_mb, free_mb))
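Answer 3 (score: 1)
Don't use os.popen, since the Python 2 documentation marks it as deprecated in favor of the subprocess module (http://docs.python.org/library/os#os.popen).
I have put the output of df -h in a file, test.txt, and simply read that file. You could just as well read it using subprocess. Assuming you can read each line of the df -h output, the following code will help: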
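# Read the saved df -h output and strip surrounding whitespace from each line
with open('test.txt') as f:
    lines = [line.strip() for line in f]
splittedLines = (line.split() for line in lines)
# zip(*rows) transposes the rows into columns
listOfColumnData = zip(*splittedLines)
for eachColumn in listOfColumnData:
    print(eachColumn)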
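eachColumn holds the entire column you want as a tuple, which you can iterate over. If needed, I can provide code that reads the output of df -h directly so that you can drop the dependency on test.txt, but the subprocess documentation shows how to do that easily.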
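A sketch of the subprocess variant mentioned above could look like this. The function name read_columns is hypothetical, and the demo call only runs when a df executable is found on the PATH.

```python
import shutil
import subprocess


def read_columns(cmd):
    """Run cmd and return its whitespace-separated output as a list of columns."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    rows = [line.split() for line in result.stdout.splitlines() if line.strip()]
    # zip(*rows) transposes rows into columns and stops at the shortest row,
    # so the extra field produced by the "Mounted on" header is dropped
    return [list(col) for col in zip(*rows)]


if shutil.which("df"):
    # -P keeps each filesystem on a single line
    for col in read_columns(["df", "-Ph"]):
        print(col)
```

Passing the command as a list avoids going through a shell, which is why no shell=True is needed here.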
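Answer 4 (score: 1)
I have a mount point with spaces in it, which breaks most of the examples here. This borrows heavily from @ZarrHai's example, but puts the results in a dict.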
#!/usr/bin/env python3
import subprocess
import re
from pprint import pprint
DF_OPTIONS = "-laTh" # remove h if you want bytes.
def yield_lines(data):
    for line in data.split("\n"):
        yield line
def line_to_list(line):
    # Match the seven df -T fields explicitly so that a mount point
    # containing spaces ends up whole in the last group
    pattern = re.compile(r"([\w\/\s\-\_]+)\s+(\w+)\s+([\d\.]+?[GKM]|\d+)"
                         r"\s+([\d\.]+[GKM]|\d+)\s+([\d\.]+[GKM]|\d+)\s+"
                         r"(\d+%)\s+(.*)")
    matches = pattern.search(line)
    if matches:
        return matches.groups()
    # Fall back to plain whitespace splitting (used for the header line)
    _line = re.sub(r" +", " ", line).split()
    return _line
# universal_newlines=True makes communicate() return text rather than bytes
p = subprocess.Popen(["df", DF_OPTIONS], stdout=subprocess.PIPE, universal_newlines=True)
dfdata, _ = p.communicate()
dfdata = dfdata.replace("Mounted on", "Mounted_on")
lines = yield_lines(dfdata)
headers = line_to_list(next(lines))
columns = [list() for i in range(len(headers))]
for i,h in enumerate(headers):
    columns[i].append(h)
grouped = {}
for li, line in enumerate(lines):
    if not line:
        continue
    grouped[li] = {}
    for i,l in enumerate(line_to_list(line)):
        columns[i].append(l)
        # Use the lower-cased header (without "%") as the dict key
        key = headers[i].lower().replace("%","")
        grouped[li][key] = l.strip()
pprint(grouped)
Answer 5 (score: 0)
This works:
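#!/usr/bin/env python3
import os, re
l=[]
p=os.popen('df -h')
for line in p.readlines():
    # Split on runs of two or more whitespace characters, which keeps
    # "Mounted on" together; fields separated by a single space stay joined
    l.append(re.split(r'\s{2,}',line.strip()))
p.close()
for subl in l:
    print(subl)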
Answer 6 (score: 0)
I found this to be a simple way:
df -h | awk '{print $1}'
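Answer 7 (score: 0)
One thing I have noticed on every system I have had access to: df with the -P option prints whitespace-aligned columns. This means each header is padded with spaces to the same width as the rest of the items in its column. Building on the7erm's answer, this uses the width of each header to make sure it captures the whole mount point, even when it contains spaces.
This has been tested on Ubuntu 14.04, Ubuntu 16.04, and FreeBSD 9.2.
I have worked out two different approaches. The first answers the OP's question directly: it gives 6 columns, each starting with its header and followed by the value for each mount point in order: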
import pprint
import subprocess
import re
DF_OPTIONS = "-PlaTh" # remove h if you want bytes.
# Get the entire output of df
dfdata = subprocess.getoutput("df " + DF_OPTIONS)
# Split it based on newlines
lines = dfdata.split("\n")
dfout = {}
headers = []
# Grab the headers, retain whitespace!
# df formats in such a way that each column header has trailing whitespace
# so the header is equal to the maximum column width. We want to retain
# this for len()
headersplit = re.split(r'(\s+)', lines[0].replace("Mounted on","Mounted_on "))
headers = [i+j for i,j in zip(headersplit[0::2],headersplit[1::2])]
for hi,head in enumerate(headers):
    dfout[hi] = [head.strip()]
for line in lines[1:]:
    pos = 0
    dfstruct = {}
    for hi,head in enumerate(headers):
        # For the last item, grab the rest of the line
        if head == headers[-1]:
            item = line[pos:]
        else:
            # Get the current item
            item = line[pos:pos+len(head)]
        pos = pos + len(head)
        #Strip whitespace and add it to the list
        dfstruct[head.strip()] = item.strip()
        dfout[hi].append(item.strip())
pprint.pprint(dfout)
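The second is more useful to me, and it solves the problem that made me stumble on this question in the first place. It puts the information into a list of dictionaries: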
import pprint
import subprocess
import re
DF_OPTIONS = "-PlaTh" # remove h if you want bytes.
# Get the entire output of df
dfdata = subprocess.getoutput("df " + DF_OPTIONS)
# Split it based on newlines
lines = dfdata.split("\n")
dfout = []
headers = []
# Grab the headers, retain whitespace!
# df formats in such a way that each column header has trailing whitespace
# so the header is equal to the maximum column width. We want to retain
# this for len()
headersplit = re.split(r'(\s+)', lines[0].replace("Mounted on","Mounted_on "))
headers = [i+j for i,j in zip(headersplit[0::2],headersplit[1::2])]
for line in lines[1:]:
    pos = 0
    dfstruct = {}
    for head in headers:
        # For the last item, grab the rest of the line
        if head == headers[-1]:
            item = line[pos:]
        else:
            # Get the current item
            item = line[pos:pos+len(head)]
        pos = pos + len(head)
        #Strip whitespace for our own structure
        dfstruct[head.strip()] = item.strip()
    dfout.append(dfstruct)
pprint.pprint(dfout)