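I'm working on a mixed-language script whose parent script is bash (don't ask why, it's a long story). Part of my script pulls the source of an XML page into a variable. I want to use bash to process the XML in that variable into several arrays. The XML is laid out as follows: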
<event>
<id>34287352</id>
<what>New Post</what>
<when>1 Minute Ago 03:50 PM</when>
<title>This is a title</title>
<preview>sdfasd</preview>
<poster>
<![CDATA[ USERNAME ]]>
</poster>
<threadid>2346566</threadid>
<postid>34287352</postid>
<lastpost>1360021837</lastpost>
<userid>3291696</userid>
<forumid>2</forumid>
<forumname>General Discussion</forumname>
<views>201,913</views>
<replies>6,709</replies>
<statusicon>images/statusicon/thread.gif</statusicon>
</event>
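There are 20 <event> elements in the XML file. I want to pull the title and preview out of the XML and put each of them into its own array.
I followed an example I found on SO: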
for tag in what title preview
do
OUT=`grep $tag $source | tr -d '\t' | sed 's/^<.*>\([^<].*\)<.*>$/\1/' `
# This is what I call the eval_trick, difficult to explain in words.
eval ${tag}=`echo -ne \""${OUT}"\"`
done
W_ARRAY=( `echo ${what}` )
T_ARRAY=( `echo ${title}` )
P_ARRAY=( `echo ${preview}` )
echo ${W_ARRAY[0]}
echo ${T_ARRAY[0]}
echo ${P_ARRAY[0]}
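But with the code above my script always freaks out and keeps repeating grep: <part of the xml>: No such file or directory
Thoughts?
Edit:
Well, it's ugly, but I managed to get the pseudo-XML into arrays: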
windex=0
tindex=0
pindex=0
while read -r line
do
WHAT=$(echo ${line} | awk -F "</?what>" '{ print $2 }')
if [ "$WHAT" != "" ]; then
W_ARRAY[$windex]=$WHAT
let windex+=1
fi
TITLE=$(echo ${line} | awk -F "</?title>" '{ print $2 }')
if [ "$TITLE" != "" ]; then
T_ARRAY[$tindex]=$TITLE
let tindex+=1
fi
PREVIEW=$(echo ${line} | awk -F "</?preview>" '{ print $2 }')
if [ "$PREVIEW" != "" ]; then
P_ARRAY[$pindex]=$PREVIEW
let pindex+=1
fi
done <<< "$source"
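Answer 0 (score: 1)
I have something similar, parsing-wise; this is a hacked-up version.
I use xsltproc (on Ubuntu, though I can't remember whether I had to install it specifically).
Command line: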
xsltproc tfile.xslt tfile.xml
tfile.xml (your example copied three times and wrapped in an events tag), i.e.:
<events>
<event> ... </event>
<event> ... </event>
<event> ... </event>
</events>
tfile.xslt:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method='text'/>
<!-- ================================================================== -->
<xsl:template match="/">
<xsl:apply-templates select="//event"/>
</xsl:template>
<xsl:template match="event">
<xsl:text>event[</xsl:text><xsl:value-of select="position()"/><xsl:text>]['id']=</xsl:text>
<xsl:value-of select="id"/> <xsl:text> </xsl:text>
<xsl:text>event[</xsl:text><xsl:value-of select="position()"/><xsl:text>]['what']=</xsl:text>
<xsl:value-of select="what"/><xsl:text> </xsl:text>
<xsl:text>event[</xsl:text><xsl:value-of select="position()"/><xsl:text>]['preview']=</xsl:text>
<xsl:value-of select="preview"/><xsl:text> </xsl:text>
<xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
Output:
event[1]['id']=34287352 event[1]['what']=New Post event[1]['preview']=sdfasd
event[2]['id']=34287353 event[2]['what']=New Post3 event[2]['preview']=sdfasd
event[3]['id']=34287354 event[3]['what']=New Post4 event[3]['preview']=sdfasd
Hopefully you know a bit of XSLT; change the output as needed.
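To get from that text output into bash arrays, something along these lines might work. This is only a sketch: it assumes the stylesheet has been changed to print one event per line as id|what|preview (that format is my assumption, not what the stylesheet above emits), and the xsltproc call is replaced by a hard-coded sample so the snippet runs on its own.

```shell
#!/bin/bash
# Stand-in for: xslt_out=$(xsltproc tfile.xslt tfile.xml)
# ASSUMPTION: the stylesheet prints "id|what|preview", one event per line.
xslt_out='34287352|New Post|sdfasd
34287353|New Post3|sdfasd
34287354|New Post4|sdfasd'

ID_ARRAY=(); W_ARRAY=(); P_ARRAY=()

# Split each line on "|"; the last variable receives the rest of the line,
# so a "|" inside the preview text stays in the preview field.
while IFS='|' read -r id what preview; do
    ID_ARRAY+=("$id")
    W_ARRAY+=("$what")
    P_ARRAY+=("$preview")
done <<< "$xslt_out"

echo "${#ID_ARRAY[@]} events; first: ${ID_ARRAY[0]} / ${W_ARRAY[0]}"
```

Keeping preview as the last field is deliberate, since it is the field most likely to contain the separator.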
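Answer 1 (score: 0)
Well, this is completely useless right now, but I am currently writing a command-line XML parser. Once it is finished (it would be already if I hadn't been distracted by the TopCoder marathon match...), you could simply write: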
eval $(echo "$source" | xidel - -e '<event>
<what>{$W_ARRAY}</what>
<title>{$T_ARRAY}</title>
<preview>{$P_ARRAY}</preview>
</event>*' --output-format bash)
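Looks like magic, doesn't it?
Answer 2 (score: 0)
To recap my comments, here is what is wrong with your code:
1. Since your $source variable is not a filename, your grep should be used like this: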
OUT=`echo "$source" | grep $tag | tr -d '\t' | sed 's/^<.*>\([^<].*\)<.*>$/\1/' `
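2. Your tr command deletes all the tabs from the XML-like variable. However, your variable contains no tabs; it is indented with 4 spaces.
So you need the following (be aware that it also strips the spaces inside the values themselves):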
... | tr -d ' ' | ...
3. Another solution is:
OUT=`echo "$source" | grep $tag | sed 's/<.*>\([^<].*\)<.*>$/\1/' `
(Note that the ^ in the sed expression has been removed.)
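Putting those points together, here is a rough sketch of the whole extraction without eval. The sample XML and the extract_tag helper name are made up for illustration, and it assumes each element sits on a single line, as in the question. It uses read loops rather than mapfile so it should also run on the old bash 3.2 that ships with macOS.

```shell
#!/bin/bash
# Hypothetical sample standing in for the real $source variable.
source='<event>
    <what>New Post</what>
    <title>This is a title</title>
    <preview>sdfasd</preview>
</event>
<event>
    <what>New Thread</what>
    <title>Second title</title>
    <preview>more text</preview>
</event>'

# extract_tag TAG XML  (made-up helper name)
# Prints the text of every single-line <TAG>...</TAG> element, one per line.
extract_tag() {
    printf '%s\n' "$2" | sed -n "s/.*<$1>\([^<]*\)<\/$1>.*/\1/p"
}

W_ARRAY=(); T_ARRAY=(); P_ARRAY=()

# One array element per matching line; spaces inside values are preserved.
while IFS= read -r v; do W_ARRAY+=("$v"); done < <(extract_tag what "$source")
while IFS= read -r v; do T_ARRAY+=("$v"); done < <(extract_tag title "$source")
while IFS= read -r v; do P_ARRAY+=("$v"); done < <(extract_tag preview "$source")

echo "${W_ARRAY[0]}"
echo "${T_ARRAY[0]}"
echo "${P_ARRAY[0]}"
```

Quoting "$source" keeps the newlines, so sed sees one element per line; the regex is still not a real XML parser and will miss elements that span several lines or contain CDATA.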
Answer 3 (score: 0)
Got it all working. For anyone who ever plans on doing anything similar, here are the nasty bits. The AppleScript:
on run argv
set region to item 1 of argv
set XML_URL to "http://" & region & ".<URL REMOVED>.com/board/vaispy-secret.php?do=xml"
try
tell application "Safari"
set URL of tab 1 of front window to XML_URL
my waitforload()
--delay 5
-- Get page source
set currentTab to current tab of front window
set currentSource to currentTab's source
return currentSource
end tell
on error err
log "Could not retrieve source."
log err
display dialog err
--return "NULL"
end try
end run
on waitforload()
--check if page has loaded
local loadflag, zarg, test_html
set loadflag to 0
repeat until loadflag is 1
delay 0.5
tell application "Safari"
set test_html to source of document 1
end tell
try
set zarg to text ((count of characters in test_html) - 10) thru (count of characters in test_html) of test_html
if "</events>" is in text ((count of characters in test_html) - 10) thru (count of characters in test_html) of test_html then
set loadflag to 1
end if
end try
end repeat
end waitforload
The bash script:
#!/bin/bash
clear
if [ "$1" == "na" ]; then
region="na"
elif [ "$1" == "eu" ]; then
region="euw"
else
echo "FRcli requires an argument."
echo "usage: [eu|na]"
echo "[eu scans EUW & EUNE]"
echo "[na scans NA]"
exit 1
fi
while true; do
clear
echo "Region: $region"
echo "...Importing Naughty"
declare -a NAUGHTY=()
nindex=0
while read line
do
NAUGHTY[$nindex]=$line
let nindex+=1
done < $HOME/Desktop/naughty.txt
NC=${#NAUGHTY[@]}
let NC-=1
echo "...Pulling Source"
source=$(osascript FRcli.scpt $region)
echo "...Extracting Arrays"
windex=0
tindex=0
pindex=0
dindex=0
while read -r line
do
#WHAT=$(echo ${line} | awk -F "</?what>" '{ print $2 }')
WHAT=$(echo ${line} | sed -n 's/^.*<what>\([^<]*\).*/\1/p')
if [ "$WHAT" != "" ]; then
W_ARRAY[$windex]=$WHAT
let windex+=1
fi
#TITLE=$(echo ${line} | awk -F "</?title>" '{ print $2 }')
TITLE=$(echo ${line} | sed -n 's/^.*<title>\([^<]*\).*/\1/p')
if [ "$TITLE" != "" ]; then
T_ARRAY[$tindex]=$TITLE
let tindex+=1
fi
#PREVIEW=$(echo ${line} | awk -F "</?preview>" '{ print $2 }')
#PREVIEW=$(echo ${line} | sed -n '/<preview*/,/<\/preview>/p')
PREVIEW=$(echo ${line} | sed -n 's/^.*<preview>\([^<]*\).*/\1/p')
if [ "$PREVIEW" != "" ]; then
P_ARRAY[$pindex]=$PREVIEW
let pindex+=1
fi
POSTID=$(echo ${line} | sed -n 's/^.*<postid>\([^<]*\).*/\1/p')
if [ "$POSTID" != "" ]; then
D_ARRAY[$dindex]=$POSTID
let dindex+=1
fi
done <<< "$source"
echo "What: ${#W_ARRAY[@]}"
echo "Title: ${#T_ARRAY[@]}"
echo "Preview: ${#P_ARRAY[@]}"
echo "PostID: ${#D_ARRAY[@]}"
for ((i=0; i <= 19; i++))
do
found=0
fpid=""
if [ "${W_ARRAY[$i]}" = "New Thread" ]; then
echo "Scanning Thread"
scan=$(echo ${T_ARRAY[$i]} ${P_ARRAY[$i]})
echo "Title: ${T_ARRAY[$i]}"
echo "Post: ${P_ARRAY[$i]}"
else
echo "Scanning Post"
scan=$(echo ${P_ARRAY[$i]})
echo "Post: ${scan}"
fi
sleep .5
for ((n=0; n<=$NC; n++))
do
nw=${NAUGHTY[$n]}
a=$(echo ${scan} | tr '[:lower:]' '[:upper:]')
b=$(echo ${nw} | tr '[:lower:]' '[:upper:]')
echo "Checking: $b"
#echo "$a"
if [[ $a == *$b* ]]; then
## Change != to == in release
echo "Found: $b"
found=1
echo "...Loading PID"
declare -a PID=()
pindex=0
while read line
do
PID[$pindex]=$line
let pindex+=1
done < $HOME/Desktop/pid.txt
PIDC=${#PID[@]}
# Assume the post is new until it is found in pid.txt.
fpid=0
for (( p=0; p<PIDC; p++ ))
do
lpid=${PID[$p]}
if [ "$region ${D_ARRAY[$i]}" == "$lpid" ]; then
echo "Found: $lpid"
echo "Ignoring Flag"
fpid=1
break
fi
done
if [ "$fpid" == "0" ]; then
echo "PID not found, opening URL."
fi
if [ "$found" == "1" -a "$fpid" == "0" ]; then
FFURL="http://$region.<URL REMOVED>.com/board/showthread.php?p=${D_ARRAY[$i]}&highlight=$nw"
open -a Firefox "$FFURL"
echo $region ${D_ARRAY[$i]} >> $HOME/Desktop/pid.txt
found=0
fpid=""
fi
fi
done
sleep .5
done
if [ "$1" == "eu" ]; then
if [ "$region" == "euw" ]; then
region="eune"
else
region="euw"
fi
fi
clear
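done
I'm sure there are far more efficient ways to do this. Using cURL in the bash script would make this a one-script deal (not possible for this script because of the security on this board's iSpy). But it works, and it's quite snappy. Average memory use is only 32.7, and as far as I can tell there are no memory leaks (unlike my 100% AppleScript version).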
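One possible simplification, for what it's worth: the "have I already opened this post?" check could probably be a single grep instead of loading pid.txt into an array and looping over it. A minimal sketch, with a temp file standing in for $HOME/Desktop/pid.txt and already_seen being a made-up helper name:

```shell
#!/bin/bash
# Hypothetical values; in the script these come from $region and D_ARRAY.
region="na"
postid="34287352"

# Temp file standing in for $HOME/Desktop/pid.txt.
pidfile=$(mktemp)
printf '%s\n' "na 111" "euw 34287352" > "$pidfile"

# already_seen REGION POSTID FILE  (made-up helper name)
# -F: fixed string, -x: match the whole line, -q: no output, status only.
already_seen() {
    grep -Fxq -- "$1 $2" "$3"
}

if already_seen "$region" "$postid" "$pidfile"; then
    echo "Ignoring Flag"
else
    echo "PID not found, opening URL."
    # Record the post so it is skipped on the next pass.
    echo "$region $postid" >> "$pidfile"
fi
```

The -x flag matters here: without it, "na 1" would also match a stored line such as "na 111".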