我是Perl的新手,我目前正在尝试将bash脚本转换为perl。
我的脚本用于转换nmon文件(AIX / Linux perf监视工具),它将nmon文件存在于目录中,grep并将特定部分重定向到临时文件,grep并将关联的时间戳重定向到另一个文件。
然后,它将数据解析为最终的csv文件,该文件将被第三个被利用的工具编入索引。
示例NMON数据如下所示:
TOP,%CPU Utilisation
TOP,+PID,Time,%CPU,%Usr,%Sys,Threads,Size,ResText,ResData,CharIO,%RAM,Paging,Command,WLMclass
TOP,5165226,T0002,10.93,9.98,0.95,1,54852,4232,51220,311014,0.755,1264,PatrolAgent,Unclassified
TOP,5365876,T0002,1.48,0.81,0.67,135,85032,132,84928,38165,1.159,0,db2sysc,Unclassified
TOP,5460056,T0002,0.32,0.27,0.05,1,5060,616,4704,1719,0.072,0,db2kmchan64.v9,Unclassified
字段“Time”(在TON2中看起来在NMON中真的称为ZZZZ)是一个特定的NMON时间戳,此时间戳的实际值稍后(在专用部分中)出现在NMON文件中,如下所示:
ZZZZ,T0001,00:09:55,01-JAN-2014
ZZZZ,T0002,00:13:55,01-JAN-2014
ZZZZ,T0003,00:17:55,01-JAN-2014
ZZZZ,T0004,00:21:55,01-JAN-2014
ZZZZ,T0005,00:25:55,01-JAN-2014
NMON格式非常具体,无法在不进行解析的情况下直接利用,时间戳必须与相应的值相关联。 (NMON文件几乎就像是许多不同csv文件的串联,每个文件都有不同的格式,不同的文件等等。)
我编写了以下bash脚本来解析我感兴趣的部分(“TOP”部分代表顶级进程cpu,mem,每个主机的io统计数据)
#!/bin/bash
# set -x
################################################################
# INFORMATION
################################################################
# nmon2csv_TOP.sh
# Convert TOP section of nmon files to csv
# CAUTION: This script is expected to be launched by the main workflow
# $DST and DST_CONVERTED_TOP are being exported by it, if not this script will exit at launch time
################################################################
# VARS
################################################################
# Location of NMON files
NMON_DIR=${DST}
# Location of generated files
OUTPUT_DIR=${DST_CONVERTED_TOP}
# Temp files
rawdatafile=/tmp/temp_rawdata.$$.temp
timestampfile=/tmp/temp_timestamp.$$.temp
# Main Output file
finalfile=${DST_CONVERTED_TOP}/NMON_TOP_processed_at_date_`date '+%F'`.csv
###########################
# BEGIN OF WORK
###########################
# Verify exported var are not null
if [ -z ${NMON_DIR} ]; then
echo -e "\nERROR: Var NMON_DIR is null!\n" && exit 1
elif [ -z ${OUTPUT_DIR} ]; then
echo -e "\nERROR: Var OUTPUT_DIR is null!\n" && exit 1
fi
# Check if temp and output files already exists
if [ -s ${rawdatafile} ]; then
rm -f ${rawdatafile}
elif [ -s ${timestampfile} ]; then
rm -f ${timestampfile}
elif [ -s ${finalfile} ]; then
rm -f ${finalfile}
fi
# Get current location
PWD=`pwd`
# Go to NMON files location
cd ${NMON_DIR}
# For each NMON file present:
# To restrict to only PROD env: `ls *.nmon | grep -E -i 'sp|gp|ge'`
for NMON_FILE in `ls *.nmon | grep -E -i 'sp|gp|ge'`; do
# Set Hostname identification
serialnum=`grep 'AAA,SerialNumber,' ${NMON_FILE} | awk -F, '{print $3}' OFS=, | tr [:lower:] [:upper:]`
hostname=`grep 'AAA,host,' ${NMON_FILE} | awk -F, '{print $3}' OFS=, | tr [:lower:] [:upper:]`
# Grep and redirect TOP Section
grep 'TOP' ${NMON_FILE} | grep -v 'AAA,version,TOPAS-NMON' | grep -v 'TOP,%CPU Utilisation' > ${rawdatafile}
# Grep and redirect associated timestamps (ZZZZ)
grep 'ZZZZ' ${NMON_FILE}> ${timestampfile}
# Begin of work
while IFS=, read TOP PID Time Pct_CPU Pct_Usr Pct_Sys Threads Size ResText ResData CharIO Pct_RAM Paging Command WLMclass
do
timestamp=`grep ${Time} ${timestampfile} | awk -F, '{print $4 " "$3}' OFS=,`
echo ${serialnum},${hostname},${timestamp},${Time},${PID},${Pct_CPU},${Pct_Usr},${Pct_Sys},${Threads},${Size},${ResText},${ResData},${CharIO},${Pct_RAM},${Paging},${Command},${WLMclass} \
| grep -v '+PID,%CPU,%Usr,%Sys,Threads,Size,ResText,ResData,CharIO,%RAM,Paging,Command,WLMclass' >> ${finalfile}
done < ${rawdatafile}
echo -e "INFO: Done for Serialnum: ${serialnum} Hostname: ${hostname}"
done
# Go back to initial location
cd ${PWD}
###########################
# END OF WORK
###########################
这可以按原样工作并生成一个主csv文件(你会在代码中看到我自愿不在文件中保留csv头),这是所有已解析主机的串联。
但是,我有一个非常大量的主机要处理每天(大约3000个主机),使用这个当前代码,在最坏的情况下,它可能需要几分钟来生成1个主机的数据,每个主机数量多个分钟变得很容易...... ...
所以,这段代码真的不足以处理这么多数据
10个主机代表大约200,000行,最终代表大约20 MB的csv文件。 那不是那么多,但我认为shell脚本可能不是管理这样一个过程的更好选择......
我想perl在这个任务上要好得多(即使shell脚本可能会得到改进),但我对perl的了解(目前)非常差,这就是为什么我会请求你的帮助...我认为这个代码在perl中应该很简单,但我现在不能让它工作......
一个人用来开发一个perl脚本来管理NMON文件并将它们转换为sql文件(将这些数据转储到数据库中),我上演它以使用它的功能并借助一些shell脚本我管理sql文件来获取我的最终csv文件。
但是TOP部分没有集成到perl脚本中,如果没有重新开发就无法使用它。
有问题的代码:
#!/usr/bin/perl
# Program name: nmon2mysql.pl
# Purpose - convert nmon.csv file(s) into mysql insert file
# Author - Bruce Spencer
# Disclaimer: this provided "as is".
# Date - March 2007
#
$nmon2mysql_ver="1.0. March 2007";
use Time::Local;
#################################################
## Your Customizations Go Here ##
#################################################
# Source directory for nmon csv files
my $NMON_DIR=$ENV{DST_TMP};
my $OUTPUT_DIR=$ENV{DST_CONVERTED_CPU_ALL};
# End "Your Customizations Go Here".
# You're on your own, if you change anything beyond this line :-)
####################################################################
############# Main Program ############
####################################################################
# Initialize common variables
&initialize;
# Process all "nmon" files located in the $NMON_DIR
# @nmon_files=`ls $NMON_DIR/*.nmon $NMON_DIR/*.csv`;
@nmon_files=`ls $NMON_DIR/*.nmon`;
if (@nmon_files eq 0 ) { die ("No \*.nmon or csv files found in $NMON_DIR\n"); }
@nmon_files=sort(@nmon_files);
chomp(@nmon_files);
foreach $FILENAME ( @nmon_files ) {
@cols= split(/\//,$FILENAME);
$BASEFILENAME= $cols[@cols-1];
unless (open(INSERT, ">$OUTPUT_DIR/$BASEFILENAME.sql")) {
die("Can not open /$OUTPUT_DIR/$BASEFILENAME.sql\n");
}
print INSERT ("# nmon version: $NMONVER\n");
print INSERT ("# AIX version: $AIXVER\n");
print INSERT ("use nmon;\n");
$start=time();
@now=localtime($start);
$now=join(":",@now[2,1,0]);
print ("$now: Begin processing file = $FILENAME\n");
# Parse nmon file, skip if unsuccessful
if (( &get_nmon_data ) gt 0 ) { next; }
$now=time();
$now=$now-$start;
print ("\t$now: Finished get_nmon_data\n");
# Static variables (number of fields always the same)
#@static_vars=("LPAR","CPU_ALL","FILE","MEM","PAGE","MEMNEW","MEMUSE","PROC");
#@static_vars=("LPAR","CPU_ALL","FILE","MEM","PAGE","MEMNEW","MEMUSE");
@static_vars=("CPU_ALL");
foreach $key (@static_vars) {
&mk_mysql_insert_static($key);;
$now=time();
$now=$now-$start;
print ("\t$now: Finished $key\n");
} # end foreach
# Dynamic variables (variable number of fields)
#@dynamic_vars=("DISKBSIZE","DISKBUSY","DISKREAD","DISKWRITE","DISKXFER","ESSREAD","ESSWRITE","ESSXFER","IOADAPT","NETERROR","NET","NETPACKET");
@dynamic_vars=("");
foreach $key (@dynamic_vars) {
&mk_mysql_insert_variable($key);;
$now=time();
$now=$now-$start;
print ("\t$now: Finished $key\n");
}
close(INSERT);
# system("gzip","$FILENAME");
}
exit(0);
############################################
############# Subroutines ############
############################################
##################################################################
## Extract CPU_ALL data for Static fields
##################################################################
sub mk_mysql_insert_static {
my($nmon_var)=@_;
my $table=lc($nmon_var);
my @rawdata;
my $x;
my @cols;
my $comma;
my $TS;
my $n;
@rawdata=grep(/^$nmon_var,/, @nmon);
if (@rawdata < 1) { return(1); }
@rawdata=sort(@rawdata);
@cols=split(/,/,$rawdata[0]);
$x=join(",",@cols[2..@cols-1]);
$x=~ s/\%/_PCT/g;
$x=~ s/\(MB\)/_MB/g;
$x=~ s/-/_/g;
$x=~ s/ /_/g;
$x=~ s/__/_/g;
$x=~ s/,_/,/g;
$x=~ s/_,/,/g;
$x=~ s/^_//;
$x=~ s/_$//;
print INSERT (qq|insert into $table (serialnum,hostname,mode,nmonver,time,ZZZZ,$x) values\n| );
$comma="";
$n=@cols;
$n=$n-1; # number of columns -1
for($i=1;$i<@rawdata;$i++){
$TS=$UTC_START + $INTERVAL*($i);
@cols=split(/,/,$rawdata[$i]);
$x=join(",",@cols[2..$n]);
$x=~ s/,,/,-1,/g; # replace missing data ",," with a ",-1,"
print INSERT (qq|$comma("$SN","$HOSTNAME","$MODE","$NMONVER",$TS,"$DATETIME{@cols[1]}",$x)| );
$comma=",\n";
}
print INSERT (qq|;\n\n|);
} # end mk_mysql_insert
##################################################################
## Extract CPU_ALL data for variable fields
##################################################################
sub mk_mysql_insert_variable {
my($nmon_var)=@_;
my $table=lc($nmon_var);
my @rawdata;
my $x;
my $j;
my @cols;
my $comma;
my $TS;
my $n;
my @devices;
@rawdata=grep(/^$nmon_var,/, @nmon);
if ( @rawdata < 1) { return; }
@rawdata=sort(@rawdata);
$rawdata[0]=~ s/\%/_PCT/g;
$rawdata[0]=~ s/\(/_/g;
$rawdata[0]=~ s/\)/_/g;
$rawdata[0]=~ s/ /_/g;
$rawdata[0]=~ s/__/_/g;
$rawdata[0]=~ s/,_/,/g;
@devices=split(/,/,$rawdata[0]);
print INSERT (qq|insert into $table (serialnum,hostname,time,ZZZZ,device,value) values\n| );
$n=@rawdata;
$n--;
for($i=1;$i<@rawdata;$i++){
$TS=$UTC_START + $INTERVAL*($i);
$rawdata[$i]=~ s/,$//;
@cols=split(/,/,$rawdata[$i]);
print INSERT (qq|\n("$SN","$HOSTNAME",$TS,"$DATETIME{$cols[1]}","$devices[2]",$cols[2])| );
for($j=3;$j<@cols;$j++){
print INSERT (qq|,\n("$SN","$HOSTNAME",$TS,"$DATETIME{$cols[1]}","$devices[$j]",$cols[$j])| );
}
if ($i < $n) { print INSERT (","); }
}
print INSERT (qq|;\n\n|);
} # end mk_mysql_insert_variable
########################################################
### Get an nmon setting from csv file ###
### finds first occurance of $search ###
### Return the selected column...$return_col ###
### Syntax: ###
### get_setting($search,$col_to_return,$separator)##
########################################################
sub get_setting {
my $i;
my $value="-1";
my ($search,$col,$separator)= @_; # search text, $col, $separator
for ($i=0; $i<@nmon; $i++){
if ($nmon[$i] =~ /$search/ ) {
$value=(split(/$separator/,$nmon[$i]))[$col];
$value =~ s/["']*//g; #remove non alphanum characters
return($value);
} # end if
} # end for
return($value);
} # end get_setting
#####################
## Clean up ##
#####################
sub clean_up_line {
# remove characters not compatible with nmon variable
# Max rrdtool variable length is 19 chars
# Variable can not contain special characters (% - () )
my ($x)=@_;
# print ("clean_up, before: $i\t$nmon[$i]\n");
$x =~ s/\%/Pct/g;
# $x =~ s/\W*//g;
$x =~ s/\/s/ps/g; # /s - ps
$x =~ s/\//s/g; # / - s
$x =~ s/\(/_/g;
$x =~ s/\)/_/g;
$x =~ s/ /_/g;
$x =~ s/-/_/g;
$x =~ s/_KBps//g;
$x =~ s/_tps//g;
$x =~ s/[:,]*\s*$//;
$retval=$x;
} # end clean up
##########################################
## Extract headings from nmon csv file ##
##########################################
sub initialize {
%MONTH2NUMBER = ("jan", 1, "feb",2, "mar",3, "apr",4, "may",5, "jun",6, "jul",7, "aug",8, "sep",9, "oct",10, "nov",11, "dec",12 );
@MONTH2ALPHA = ( "junk","jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec" );
} # end initialize
# Get data from nmon file, extract specific data fields (hostname, date, ...)
sub get_nmon_data {
my $key;
my $x;
my $category;
my %toc;
my @cols;
# Read nmon file
unless (open(FILE, $FILENAME)) { return(1); }
@nmon=<FILE>; # input entire file
close(FILE);
chomp(@nmon);
# Cleanup nmon data remove trainig commas and colons
for($i=0; $i<@nmon;$i++ ) {
$nmon[$i] =~ s/[:,]*\s*$//;
}
# Get nmon/server settings (search string, return column, delimiter)
$AIXVER =&get_setting("AIX",2,",");
$DATE =&get_setting("date",2,",");
$HOSTNAME =&get_setting("host",2,",");
$INTERVAL =&get_setting("interval",2,","); # nmon sampling interval
$MEMORY =&get_setting(qq|lsconf,"Good Memory Size:|,1,":");
$MODEL =&get_setting("modelname",3,'\s+');
$NMONVER =&get_setting("version",2,",");
$SNAPSHOTS =&get_setting("snapshots",2,","); # number of readings
$STARTTIME =&get_setting("AAA,time",2,",");
($HR, $MIN)=split(/\:/,$STARTTIME);
if ($AIXVER eq "-1") {
$SN=$HOSTNAME; # Probably a Linux host
} else {
$SN =&get_setting("systemid",4,",");
$SN =(split(/\s+/,$SN))[0]; # "systemid IBM,SN ..."
}
$TYPE =&get_setting("^BBBP.*Type",3,",");
if ( $TYPE =~ /Shared/ ) { $TYPE="SPLPAR"; } else { $TYPE="Dedicated"; }
$MODE =&get_setting("^BBBP.*Mode",3,",");
$MODE =(split(/: /, $MODE))[1];
# $MODE =~s/\"//g;
# Calculate UTC time (seconds since 1970)
# NMON V9 dd/mm/yy
# NMON V10+ dd-MMM-yyyy
if ( $DATE =~ /[a-zA-Z]/ ) { # Alpha = assume dd-MMM-yyyy date format
($DAY, $MMM, $YR)=split(/\-/,$DATE);
$MMM=lc($MMM);
$MON=$MONTH2NUMBER{$MMM};
} else {
($DAY, $MON, $YR)=split(/\//,$DATE);
$YR=$YR + 2000;
$MMM=$MONTH2ALPHA[$MON];
} # end if
## Calculate UTC time (seconds since 1970). Required format for the rrdtool.
## timelocal format
## day=1-31
## month=0-11
## year = x -1900 (time since 1900) (seems to work with either 2006 or 106)
$m=$MON - 1; # jan=0, feb=2, ...
$UTC_START=timelocal(0,$MIN,$HR,$DAY,$m,$YR);
$UTC_END=$UTC_START + $INTERVAL * $SNAPSHOTS;
@ZZZZ=grep(/^ZZZZ,/,@nmon);
for ($i=0;$i<@ZZZZ;$i++){
@cols=split(/,/,$ZZZZ[$i]);
($DAY,$MON,$YR)=split(/-/,$cols[3]);
$MON=lc($MON);
$MON="00" . $MONTH2NUMBER{$MON};
$MON=substr($MON,-2,2);
$ZZZZ[$i]="$YR-$MON-$DAY $cols[2]";
$DATETIME{$cols[1]}="$YR-$MON-$DAY $cols[2]";
} # end ZZZZ
return(0);
} # end get_nmon_data
它几乎(我说几乎是因为最近的NMON版本,当没有数据存在时它有时会有一些问题)完成这项工作,如果我将它用于这些部分它会更快地执行我的shell脚本
这就是为什么我认为perl应该是一个完美的解决方案。
当然,我不会要求任何人将我的shell脚本转换为perl中的最终版本,但至少要让我指向正确的方向: - )
我真的非常感谢任何人的帮助!
答案 0 :(得分:0)
通常情况下,我强烈反对这样的问题,但是我们的生产系统已经关闭,直到它们被修复,我真的没那么多事情要做......
以下是一些可能让您入门的代码。请考虑它是伪代码,因为它是完全未经测试的,可能甚至不会编译(我总是忘记一些parantheses或分号,正如我所说,可以运行代码的实际机器无法访问)但我评论了很多,希望你会成为能够根据您的实际需要对其进行修改并使其运行。
use strict;
use warnings;
open INFILE, "<", "path/to/file.nmon"; # Open the file.
my @topLines; # Initialize variables.
my %timestamps;
while <INFILE> # This will walk over all the lines of the infile.
{ # Storing the current line in $_.
chomp $_; # Remove newline at the end.
if ($_ =~ m/^TOP/) # If the line starts with TOP...
{
push @topLines, $_; # ...store it in the array for later use.
}
elsif ($_ =~ m/^ZZZZ/) # If it is in the ZZZZ section...
{
my @fields = split ',', $_; # ...split the line at commas...
my $timestamp = join ",", $fields(2), $fields(3); # ...join the timestamp into a string as you wish...
$timestamps{$fields(1)} = $timestamp; # ...and store it in the hash with the Twhatever thing as key.
}
# This iteration could certainly be improved with more knowledge
# of how the file looks. For example the search could be cancelled
# after the ZZZZ section if the file is still long.
}
close INFILE;
open OUTFILE, ">", "path/to/output.csv"; # Open the file you want your output in.
foreach (@topLines) # Iterate through all elements of the array.
{ # Once again storing the current value in $_.
my @fields = split ',', $_; # Probably not necessary, depending on how output should be formated.
my $outstring = join ',', $fields(0), $fields(1), $timestamps{$fields(2)}; # And whatever other fields you care for.
print OUTFILE $outstring, "\n"; # Print.
}
close OUTFILE;
print "Done.\n";