Capturing a specific text string, and removing a text string, from XML

Asked: 2012-10-04 00:20:20

Tags: coldfusion coldfusion-9

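I'm trying to figure out how to display only the word between the | and | in the description attribute of the XML below. For example, the word "Web" or "VideoGames".

I tried this, but because each word has a different length, it didn't work. I also couldn't get rid of the | characters.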

<cfoutput>#Right(thefeed2.rsschannel.eachresult.resultnumber[x].metadata.xmlAttributes.v, 10)#</cfoutput>

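I'm also trying to do the opposite: filter out the word between the | and | so that it isn't displayed. In other words, taking the first item as an example, show the entire description minus the word "Web".

I tried this, but I ran into the same problem: I can't get the description without the word between the | and |.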

<cfoutput>#left(thefeed2.rsschannel.eachresult.resultnumber[x].metadata.xmlAttributes.v, 500)#</cfoutput>

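So my questions are:

1: How do I extract the word between the | and | from the description element?

2: Separately from number 1, how do I remove the word between the | and | from the description?

By the way, "thefeed2" is what I call the XML feed.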

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<rsschannel>
<resultsnumbertotal>11    </resultsnumbertotal>
<eachresult>
<resultnumber N="1">
<U>/file.cfm?id=yahoocom    </U>
<T>Yahoo    </T>
<metadata N="description" V="Quickly find what you're searching for, get in touch with friends and stay in-the-know |Web|"/>
</resultnumber>

<resultnumber N="2">
<U>/file.cfm?id=halo    </U>
<T>Halo    </T>
<metadata N="description" V="Halo is a multi-billion dollar science fiction video game franchise created by Bungie and now managed by 343 Industries and owned by Microsoft Studios. |VideoGames|"/>
</resultnumber>

<resultnumber N="3">
<U>/file.cfm?id=bingcom    </U>
<T>Bing    </T>
<metadata N="description" V="Bing is a search engine that brings together the best of search and people in your social networks to help you spend less time searching and more time doing. |Web|"/>
</resultnumber>

<resultnumber N="4">
<U>/file.cfm?id=lal    </U>
<T>Lakers    </T>
<metadata N="description" V="The Los Angeles Lakers are an American professional basketball team based in Los Angeles, California. They play in the Pacific Division |Sports|"/>
</resultnumber>

<resultnumber N="5">
<U>/file.cfm?id=quick    </U>
<T>Stay in the Know    </T>
<metadata N="description" V="Quickly find what you're searching for, get in touch with friends and stay in-the-know |Misc|"/>
</resultnumber>

<resultnumber N="6">
<U>/file.cfm?id=multi    </U>
<T>Billion Dollars    </T>
<metadata N="description" V="Halo is a multi-billion dollar science fiction video game franchise created by Bungie and now managed by 343 Industries and owned by Microsoft Studios. |Misc|"  />
</resultnumber>

<resultnumber N="7">
<U>/file.cfm?id=searching    </U>
<T>Searches    </T>
<metadata N="description" V="Bing is a search engine that brings together the best of search and people in your social networks to help you spend less time searching and more time doing. |Web|" />
</resultnumber>

<resultnumber N="8">
<U>/file.cfm?id=LosAngeles    </U>
<T>Los Angeles    </T>
<metadata N="description" V="The Los Angeles Lakers are an American professional basketball team based in Los Angeles, California. They play in the Pacific Division |Sports|"/>
</resultnumber>

<resultnumber N="9">
<U>/file.cfm?id=quick    </U>
<T>Stay in the Know    </T>
<metadata N="description" V="Quickly find what you're searching for, get in touch with friends and stay in-the-know |Misc|"/>
</resultnumber>

<resultnumber N="10">
<U>/file.cfm?id=LosAngeles    </U>
<T>Los Angeles    </T>
<metadata N="description" V="The Los Angeles Lakers are an American professional basketball team based in Los Angeles, California. They play in the Pacific Division"/>
</resultnumber>

<resultnumber N="11">
<U>/file.cfm?id=quick    </U>
<T>Stay in the Know    </T>
<metadata N="description" V="Quickly find what you're searching for, get in touch with friends and stay in-the-know |SummaryofDescription|"/>
</resultnumber>

</eachresult>
</rsschannel>

3 answers:

Answer 0 (score: 1)

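Will there always be exactly two | characters? If so, you can use GetToken to find the value:

<cfset Mystring = GetToken(variable, 2, '|') />

GetToken treats your variable as a list delimited by |. Token 1 is the description text before the first pipe, so the word between the pipes is token 2.

To remove the word together with its pipes, you can use replace(variable, '|' & Mystring & '|', '')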
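A minimal sketch of that approach, assuming the description has already been read into a variable (the name `description` and the hard-coded sample value are only for illustration). It relies on GetToken returning an empty string when the requested token does not exist, which covers result 10 in the feed, where there are no pipes at all.

```cfml
<cfscript>
// Sample value copied from result 1 of the feed in the question.
description = "Quickly find what you're searching for, get in touch with friends and stay in-the-know |Web|";

// Token 1 is the text before the first pipe; token 2 is the word between the pipes.
// GetToken returns an empty string if there is no second token.
category = GetToken(description, 2, "|");

// Remove the whole |word| marker, then trim the space left in front of it.
cleanText = Trim(Replace(description, "|" & category & "|", ""));
</cfscript>
<cfoutput>#category#: #cleanText#</cfoutput>
```

If a description can contain a | for some other reason, token positions stop being reliable and a stricter match is probably the safer choice.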

Answer 1 (score: 1)

You want to use a regex with REReplace(). Examples (in cfscript):

1)

origString = thefeed2.rsschannel.eachresult.resultnumber[x].metadata.xmlAttributes.v;
newString = REreplace(origString, "\|.*\|", "new text");

2)

origString = thefeed2.rsschannel.eachresult.resultnumber[x].metadata.xmlAttributes.v;
newString = REreplace(origString, "\|.*\|", "");

The regular expression \|.*\| matches any text that begins and ends with |.
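REReplace on its own replaces the match rather than returning it, so for question 1 one option is REFind with subexpressions. The sketch below is an assumption about how that could look, not part of the original answer; it swaps `.*` for `[^|]+` so the match cannot run past the closing pipe, and the variable names `match`, `word` and `withoutWord` are made up for the example.

```cfml
<cfscript>
// Sample value copied from result 4 of the feed in the question.
origString = "The Los Angeles Lakers are an American professional basketball team based in Los Angeles, California. They play in the Pacific Division |Sports|";

// The fourth argument (true) asks REFind to return the positions of subexpressions.
match = REFind("\|([^|]+)\|", origString, 1, true);

if (match.pos[1] GT 0) {
    // pos[2] and len[2] describe the first parenthesised group: the word itself.
    word = Mid(origString, match.pos[2], match.len[2]);
} else {
    // No |word| marker present (as in result 10).
    word = "";
}

// Remove the marker together with any whitespace in front of it.
withoutWord = REReplace(origString, "\s*\|[^|]*\|", "", "all");
</cfscript>
<cfoutput>#word# / #withoutWord#</cfoutput>
```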

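Answer 2 (score: 1)

How about this? With string manipulation, it always pays to step back and look for patterns that let you get creative. For example, CFML's concept of a "list" makes some interesting things possible. More advanced users will go straight to regular expressions, but that is definitely an advanced topic if you are just starting out. I immediately recognized your string as a list delimited by pipes (|). You can also see that a sentence is a list of words delimited by spaces. Here's some code.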

<!--- set your string to the description value from the feed --->
<cfset myString = thefeed2.rsschannel.eachresult.resultnumber[x].metadata.xmlAttributes.v />
<!--- now let's treat the phrase as a list; the word is at position 2 of the list --->
<cfset myWord = listGetAt(myString, 2, '|') />
<!--- assuming you don't want the pipes in the clean string, let's just do a fast replace once --->
<cfset cleanString = replace(myString, "|#myWord#|", "") />
<!--- if you do want to keep the pipes then you'll need a regex (a more advanced topic) --->
<cfset cleanStringWithPipes = rereplace(myString, "(\|)[ A-Za-z0-9]+(\|)", "\1\2") />
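One thing to watch for: listGetAt throws an error when the list has no second element, and result 10 in the feed has no pipes at all. A rough sketch of a loop that checks ListLen first is below. The two-item `sampleXml` document is a hypothetical cut-down version of the feed, and `feed` and `node` are names invented for the example. It also assumes a description never contains a | outside the trailing marker.

```cfml
<!--- Hypothetical two-item feed, shortened from the XML in the question. --->
<cfsavecontent variable="sampleXml"><rsschannel><eachresult>
<resultnumber N="1"><metadata N="description" V="Bing is a search engine |Web|"/></resultnumber>
<resultnumber N="2"><metadata N="description" V="The Los Angeles Lakers play in the Pacific Division"/></resultnumber>
</eachresult></rsschannel></cfsavecontent>
<cfset feed = XmlParse(Trim(sampleXml)) />

<cfoutput>
<cfloop array="#feed.rsschannel.eachresult.XmlChildren#" index="node">
    <cfset myString = node.metadata.xmlAttributes.V />
    <!--- Only call ListGetAt when a second list element actually exists. --->
    <cfif ListLen(myString, "|") GTE 2>
        <cfset myWord = ListGetAt(myString, 2, "|") />
        <cfset cleanString = Trim(Replace(myString, "|#myWord#|", "")) />
    <cfelse>
        <cfset myWord = "" />
        <cfset cleanString = Trim(myString) />
    </cfif>
    <p>[#myWord#] #cleanString#</p>
</cfloop>
</cfoutput>
```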