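I'm forcing myself to learn to script using only AppleScript, but I'm stuck trying to remove specific tags by class. I've looked for solid documentation and examples, but there seem to be very few at this point.
Here is my HTML: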
<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl <span class="foo">shoulder</span> biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class="bar">Pig brisket</span> jowl ham pastrami <span class="foo">jerky</span> strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>
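What I want is to remove the tags for one specific class, so that every <span class="foo"> and its closing </span> are stripped while the text stays, giving: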
<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl shoulder biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class="bar">Pig brisket</span> jowl ham pastrami jerky strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>
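I know how to do this with do shell script and the Terminal, but I want to learn what is available in the AppleScript dictionaries.
While researching, I found a way to strip all HTML tags: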
on removeMarkupFromText(theText)
	set tagDetected to false
	set theCleanText to ""
	repeat with a from 1 to length of theText
		set theCurrentCharacter to character a of theText
		if theCurrentCharacter is "<" then
			set tagDetected to true
		else if theCurrentCharacter is ">" then
			set tagDetected to false
		else if tagDetected is false then
			set theCleanText to theCleanText & theCurrentCharacter as string
		end if
	end repeat
	return theCleanText
end removeMarkupFromText
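But that removes all HTML tags, which is not what I want. Searching SO, I found out how to extract the text between tags in "Parsing HTML source code using AppleScript", but I'm not trying to parse the file.
I'm familiar with BBEdit's Balance Tags (Balance in the drop-down menu), but when I run: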
tell application "BBEdit"
	activate
	find "<span class=\"foo\">" searching in text 1 of text document "test.html" options {search mode:grep, wrap around:true} with selecting match
	balance tags
end tell
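it gets greedy and grabs the whole line from the first tag to the second-to-last closing tag, including the text in between, instead of isolating the first tag and its text.
Digging further into the dictionary under tag, I did come across find tag, with which I could do something like: set spanTarget to (find tag "span" start_offset counter) and then use the class, |class| of attributes of tag of spanTarget, to locate the tag and select it, but I still run into the same problem as before.
So, in pure AppleScript, how can I remove the tags associated with a class without being greedy?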
Answer 0 (score: 1)
You can use a regular expression in the find command of BBEdit or TextWrangler.
To select a tag (non-greedy), use this command:
find "<span class=\"foo\">.+?</span>" searching in text 1 of text document 1 options {search mode:grep, wrap around:true} with selecting match
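About the .+?</span> pattern:
. matches any character (except a line break)
+ means one or more repetitions of the preceding item
? after a quantifier makes it non-greedy
The pattern matches an opening span tag, followed by one or more occurrences of any character other than a return, followed by a closing span tag. The non-greedy quantifier gives the result we want: it prevents BBEdit from overrunning the closing </span> tag and matching across several tags. To match the pattern across line breaks, put (?s) at the beginning of the pattern, like this: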
find "(?s)<span class=\"foo\">.+?</span>" searching in text 1 of text document 1 options {search mode:grep, wrap around:true} with selecting match
<span class="foo">shoulder</span>
<span class="foo">shoulder
</span>
<span class="foo">shoulder
xxxx
yyyy
zzzz</span>
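From AppleScript, you can use the replace command (BBEdit or TextWrangler) to find a pattern and delete every matching string. Note that this removes each matching span together with its contents. For example: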
replace "(?s)<span class=\"foo\">.+?</span>" using "" searching in text 1 of text document 1 options {search mode:grep, wrap around:true}
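If the goal is to keep the enclosed text, as in the question's expected output, one option is to capture the contents and put them back. This is a minimal sketch, not tested against every BBEdit version. It assumes BBEdit's grep replacement syntax, where \1 refers to the first capture group (written "\\1" inside an AppleScript string), and it assumes the HTML is in the front text document:

```applescript
-- Sketch only: requires BBEdit, with the HTML open in the front text document.
tell application "BBEdit"
	-- (.+?) captures the span's contents non-greedily;
	-- "\\1" is the AppleScript spelling of the grep replacement \1,
	-- so each match is replaced by the text it enclosed.
	replace "(?s)<span class=\"foo\">(.+?)</span>" using "\\1" searching in text 1 of text document 1 options {search mode:grep, wrap around:true}
end tell
```

Spans with any other class, such as class="bar", do not match the pattern and are left alone.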
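Answer 1 (score: 0)
This is a job for regular expressions, which are available through the now-supported AppleScriptObjC bridge. Paste this code into Script Editor and run it: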
use AppleScript version "2.5" -- for El Capitan or later
use framework "Foundation"
use scripting additions
on stringByMatching:thePattern inString:theString replacingWith:theTemplate
	set theNSString to current application's NSString's stringWithString:theString
	-- Regex compile options: "." also matches line breaks, and ^/$ match at line boundaries
	set theOptions to (current application's NSRegularExpressionDotMatchesLineSeparators as integer) + (current application's NSRegularExpressionAnchorsMatchLines as integer)
	set theExpression to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:theOptions |error|:(missing value)
	-- Matching options are a separate set of flags (NSMatchingOptions), so pass 0 rather than reusing theOptions
	set theResult to theExpression's stringByReplacingMatchesInString:theNSString options:0 range:{location:0, |length|:theNSString's |length|()} withTemplate:theTemplate
	return theResult as text
end stringByMatching:inString:replacingWith:
set theHTML to "<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl <span class='foo'>SHOULDER</span> biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class='bar'>PIG BRISKET</span> jowl ham pastrami <span class='foo'>JERKY</span> strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>"
set modifiedHTML to its stringByMatching:"<span .*?>(.*?)</span>" inString:theHTML replacingWith:"$1" -- unwraps every span, whatever its class
This works for well-formed HTML, but as user foo pointed out, browsers can cope with malformed HTML, while you probably can't.
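Since the question asks about one class only, the same Foundation approach can be narrowed to spans whose class attribute is exactly the given name. This is a rough sketch under a few assumptions: the handler name unwrapSpans is made up for illustration, class is the span's only attribute, and the spans are not nested.

```applescript
use AppleScript version "2.5" -- for El Capitan or later
use framework "Foundation"
use scripting additions

-- Hypothetical helper: unwrap <span class="NAME">...</span>, keeping the inner text.
on unwrapSpans(theHTML, className)
	set theNSString to current application's NSString's stringWithString:theHTML
	-- Escape the class name so it is matched literally inside the pattern
	set escapedName to (current application's NSRegularExpression's escapedPatternForString:className) as text
	-- Group 1 is the opening quote (single or double), \1 requires the same closing quote,
	-- and group 2 is the non-greedy span contents
	set thePattern to "<span\\s+class=(['\"])" & escapedName & "\\1\\s*>(.*?)</span>"
	set theExpression to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:(current application's NSRegularExpressionDotMatchesLineSeparators) |error|:(missing value)
	set theResult to theExpression's stringByReplacingMatchesInString:theNSString options:0 range:{location:0, |length|:theNSString's |length|()} withTemplate:"$2"
	return theResult as text
end unwrapSpans

set theHTML to "<p>jowl <span class='foo'>SHOULDER</span> hock. <span class='bar'>PIG BRISKET</span> pastrami <span class=\"foo\">JERKY</span> steak.</p>"
unwrapSpans(theHTML, "foo")
```

The bar span is left untouched because its class does not match. Nested spans would still confuse this pattern, so the well-formedness caveat applies here as well.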
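Answer 2 (score: 0)
I think Ron's answer is a good approach, but if you'd rather not use regular expressions, you can do it with the code below. After seeing Ron's answer I wasn't going to post it, but I had already written it, so I figured I'd at least give you a second option since you're trying to learn.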
on run
	set theHTML to "<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl <span class=\"foo\">shoulder</span> biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class=\"bar\">Pig brisket</span> jowl ham pastrami <span class=\"foo\">jerky</span> strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>"
	set theHTML to removeTag(theHTML, "<span class=\"foo\">", "</span>")
	return theHTML
end run

on removeTag(theText, startTag, endTag)
	if theText contains startTag then
		-- Split on the opening tag; item 2 begins right after its first occurrence
		set AppleScript's text item delimiters to startTag
		set tempText to text items of (theText as string)
		set AppleScript's text item delimiters to {""}
		set middleText to item 2 of tempText as string
		if middleText contains endTag then
			-- Drop the first closing tag that follows the opening tag
			set AppleScript's text item delimiters to endTag
			set tempText2 to text items of (middleText as string)
			set AppleScript's text item delimiters to {""}
			set newString to implode(tempText2, endTag)
			set item 2 of tempText to newString
		end if
		-- Drop the first opening tag, then recurse to handle the remaining ones
		set newString to implode(tempText, startTag)
		return removeTag(newString, startTag, endTag)
	else
		return theText
	end if
end removeTag

-- Joins items 1 and 2 with nothing between them, and any remaining items with tag
on implode(parts, tag)
	set newString to items 1 thru 2 of parts as string
	if (count of parts) > 2 then
		set newList to {newString, items 3 thru -1 of parts}
		set AppleScript's text item delimiters to tag
		set newString to (newList as string)
		set AppleScript's text item delimiters to {""}
	end if
	return newString
end implode
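For input with many matching tags, the same text item delimiters idea can also be written as a single pass without recursion. This is only a sketch of that variation; the handler name unwrapTag is made up for illustration, and like the code above it assumes the first closing tag after each opening tag is the one that belongs to it.

```applescript
-- Hypothetical handler: remove each startTag and the first endTag after it,
-- keeping the text in between.
on unwrapTag(theText, startTag, endTag)
	set savedDelimiters to AppleScript's text item delimiters
	-- Split on the opening tag; item 1 is everything before the first one
	set AppleScript's text item delimiters to startTag
	set theChunks to text items of theText
	set theResult to item 1 of theChunks
	repeat with i from 2 to (count of theChunks)
		set thisChunk to item i of theChunks
		-- Each later chunk starts right after an opening tag; split it on the closing tag
		set AppleScript's text item delimiters to endTag
		set theParts to text items of thisChunk
		if (count of theParts) > 1 then
			-- Drop only the first closing tag; later ones are joined back with endTag
			set headPart to item 1 of theParts
			set tailPart to (items 2 thru -1 of theParts) as text
			set thisChunk to headPart & tailPart
		end if
		set theResult to theResult & thisChunk
	end repeat
	set AppleScript's text item delimiters to savedDelimiters
	return theResult
end unwrapTag

set theHTML to "<p>a <span class=\"foo\">x</span> b <span class=\"bar\">y</span> c <span class=\"foo\">z</span> d</p>"
unwrapTag(theHTML, "<span class=\"foo\">", "</span>")
```

Saving and restoring the delimiters keeps the handler from disturbing the rest of a larger script.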