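I need to parse the first 10 characters of a file name to see whether they are all digits. The obvious way to do this is fileName =~ m/^\d{10}/, but I don't see anything regex-like in the AppleScript reference, so I'm curious what other options I have for this validation.
Answer 0 (score: 24)
Don't despair: on OS X you can also reach sed and grep through "do shell script". So: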
set thecommandstring to "echo \"" & filename & "\"|sed \"s/[0-9]\\{10\\}/*good*(&)/\"" as string
set sedResult to do shell script thecommandstring
set isgood to sedResult starts with "*good*"
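My sed skills aren't too hot, so there may be a more elegant way than tacking *good* onto any name that matches [0-9]{10} and then looking for *good* at the start of the result. But basically, if filename is "1234567890foo.mov", this runs the command: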
echo "1234567890foo.mov"|sed "s/[0-9]\{10\}/*good*(&)/"
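Note the escaped quotes \" and the escaped backslashes \\ in the AppleScript. If you are escaping things for the shell, you have to escape the escapes. So to run a shell script that contains a literal backslash, you escape it for the shell as \\ and then escape each of those backslashes in AppleScript, giving \\\\. This can get pretty hard to read.
So anything you can do on the command line you can also do by calling it from AppleScript (woohoo!). Whatever the command writes to stdout is returned to the script as the result.
Answer 1 (score: 16)
There is an easier way to do regex matching with the shell (works with bash 3.2+):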
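To see what the sed substitution does before wrapping it in AppleScript, it can help to run it directly in a shell. The following is only a sketch: mark_prefix and is_good are hypothetical helper names, not part of the answer, and the three sample file names are made up for illustration.

```shell
#!/usr/bin/env bash
# mark_prefix is a hypothetical helper name, not part of the answer above.
# It runs the same sed substitution that the AppleScript string expands to.
mark_prefix() {
  # Single quotes keep the shell from touching the backslashes, so only
  # one level of escaping is needed here.
  printf '%s\n' "$1" | sed 's/[0-9]\{10\}/*good*(&)/'
}

# is_good mirrors the AppleScript `starts with "*good*"` check.
is_good() {
  case "$(mark_prefix "$1")" in
    '*good*'*) return 0 ;;
    *) return 1 ;;
  esac
}

for name in "1234567890foo.mov" "abc1234567890.mov" "foo.mov"; do
  if is_good "$name"; then
    echo "$name: ok"
  else
    echo "$name: rejected"
  fi
done
```

Because the pattern is not anchored with ^, a name such as abc1234567890.mov is still rewritten by sed; it is the "starts with" check that rejects it, since the marker then lands in the middle of the string.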
set isMatch to "0" = (do shell script ¬
"[[ " & quoted form of fileName & " =~ ^[[:digit:]]{10} ]]; printf $?")
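Notes:
- This uses the modern bash test expression [[ ... ]] with the regex-matching operator =~; leaving the right operand (or at least its special regex characters) unquoted is a must on bash 3.2+, unless you prepend shopt -s compat31;
- The do shell script statement executes the test and returns its exit code via an additional command (thanks, @LauriRanta); "0" indicates success.
- The =~ operator does not support shortcut character classes such as \d or assertions such as \b (true as of OS X 10.9.4, and unlikely to change anytime soon).
- For case-insensitive matching, prepend the command string with shopt -s nocasematch;
- For locale awareness, prepend the command string with export LANG='" & user locale of (system info) & ".UTF-8';
- If the regex contains capture groups, you can access the captured strings through the ${BASH_REMATCH[@]} array variable.
- Inside the AppleScript string, double quotes and backslashes must be \-escaped.
Here is an alternative that uses egrep: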
set isMatch to "0" = (do shell script ¬
"egrep -q '^\\d{10}' <<<" & quoted form of filename & "; printf $?")
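Although this presumably performs worse, it has two advantages:
- You can use shortcut character classes such as \d and assertions such as \b.
- You can make matching case-insensitive more easily, by calling egrep with -i.
You cannot, however, get at sub-matches through capture groups; if you need those, use the [[ ... =~ ... ]] approach.
Finally, here are utility functions that package both approaches (the syntax highlighting is off, but they do work):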
# SYNOPSIS
# doesMatch(text, regexString) -> Boolean
# DESCRIPTION
# Matches string s against regular expression (string) regex using bash's extended regular expression language *including*
# support for shortcut classes such as `\d`, and assertions such as `\b`, and *returns a Boolean* to indicate if
# there is a match or not.
# - AppleScript's case sensitivity setting is respected; i.e., matching is case-INsensitive by default, unless inside
# a 'considering case' block.
# - The current user's locale is respected.
# EXAMPLE
# my doesMatch("127.0.0.1", "^(\\d{1,3}\\.){3}\\d{1,3}$") # -> true
on doesMatch(s, regex)
local ignoreCase, extraGrepOption
set ignoreCase to "a" is "A"
if ignoreCase then
set extraGrepOption to "i"
else
set extraGrepOption to ""
end if
# Note: So that classes such as \w work with different locales, we need to set the shell's locale explicitly to the current user's.
# Rather than let the shell command fail we return the exit code and test for "0" to avoid having to deal with exception handling in AppleScript.
tell me to return "0" = (do shell script "export LANG='" & user locale of (system info) & ".UTF-8'; egrep -q" & extraGrepOption & " " & quoted form of regex & " <<< " & quoted form of s & "; printf $?")
end doesMatch
# SYNOPSIS
# getMatch(text, regexString) -> { overallMatch[, captureGroup1Match ...] } or {}
# DESCRIPTION
# Matches string s against regular expression (string) regex using bash's extended regular expression language and
# *returns the matching string and substrings matching capture groups, if any.*
#
# - AppleScript's case sensitivity setting is respected; i.e., matching is case-INsensitive by default, unless this subroutine is called inside
# a 'considering case' block.
# - The current user's locale is respected.
#
# IMPORTANT:
#
# Unlike doesMatch(), this subroutine does NOT support shortcut character classes such as \d.
# Instead, use one of the following POSIX classes (see `man re_format`):
# [[:alpha:]] [[:word:]] [[:lower:]] [[:upper:]] [[:ascii:]]
# [[:alnum:]] [[:digit:]] [[:xdigit:]]
# [[:blank:]] [[:space:]] [[:punct:]] [[:cntrl:]]
# [[:graph:]] [[:print:]]
#
# Also, `\b`, '\B', '\<', and '\>' are not supported; you can use `[[:<:]]` for '\<' and `[[:>:]]` for `\>`
#
# Always returns a *list*:
# - an empty list, if no match is found
# - otherwise, the first list element contains the matching string
# - if regex contains capture groups, additional elements return the strings captured by the capture groups; note that *named* capture groups are NOT supported.
# EXAMPLE
# my getMatch("127.0.0.1", "^([[:digit:]]{1,3})\\.([[:digit:]]{1,3})\\.([[:digit:]]{1,3})\\.([[:digit:]]{1,3})$") # -> { "127.0.0.1", "127", "0", "0", "1" }
on getMatch(s, regex)
local ignoreCase, extraCommand
set ignoreCase to "a" is "A"
if ignoreCase then
set extraCommand to "shopt -s nocasematch; "
else
set extraCommand to ""
end if
# Note:
# So that classes such as [[:alpha:]] work with different locales, we need to set the shell's locale explicitly to the current user's.
# Since `quoted form of` encloses its argument in single quotes, we must set compatibility option `shopt -s compat31` for the =~ operator to work.
# Rather than let the shell command fail we return '' in case of non-match to avoid having to deal with exception handling in AppleScript.
tell me to do shell script "export LANG='" & user locale of (system info) & ".UTF-8'; shopt -s compat31; " & extraCommand & "[[ " & quoted form of s & " =~ " & quoted form of regex & " ]] && printf '%s\\n' \"${BASH_REMATCH[@]}\" || printf ''"
return paragraphs of result
end getMatch
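The shell commands that the two handlers assemble can also be tried on their own, which makes the quoting rules easier to check. This is a minimal sketch under a few assumptions: does_match and get_match are hypothetical names for shell-side counterparts of the handlers, the locale and case-insensitivity options are left out, and the POSIX class [[:digit:]] is used instead of \d because \d support in grep -E differs between implementations.

```shell
#!/usr/bin/env bash
# does_match and get_match are hypothetical shell-side counterparts of the
# doesMatch and getMatch handlers; the names are illustrative only.

# Succeeds (exit status 0) when the string matches the extended regex.
does_match() {
  local s=$1 regex=$2
  grep -E -q -- "$regex" <<< "$s"
}

# Prints the overall match and each capture group, one per line;
# prints nothing when there is no match.
get_match() {
  local s=$1 regex=$2
  # The regex operand is deliberately left unquoted: on bash 3.2+ a quoted
  # right-hand side of =~ is matched as a literal string.
  if [[ $s =~ $regex ]]; then
    printf '%s\n' "${BASH_REMATCH[@]}"
  fi
}

# Print the exit status the way the AppleScript side reads it ("0" = match).
does_match "1234567890foo.mov" '^[[:digit:]]{10}'; printf '%s\n' "$?"

# Print the whole match followed by the two capture groups.
get_match "1234567890foo.mov" '^([[:digit:]]{10})(.*)$'
```

Passing the regex in a variable and expanding it unquoted avoids the need for shopt -s compat31, which the getMatch handler relies on because quoted form of wraps its argument in single quotes.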
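Answer 2 (score: 11)
I recently needed regular expressions in a script and wanted a scripting addition to handle them, so that it would be easier to read what was going on. I found Satimage.osax, which lets you use syntax like this: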
find text "n(.*)" in "to be or not to be" with regexp
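The only downside is that (as of August 11, 2010) it is a 32-bit addition, so it throws errors when called from a 64-bit process. This bit me in a Mail rule for Snow Leopard, because I had to run Mail in 32-bit mode. Called from a standalone script, though, I have no reservations: it is really great, lets you pick whichever regex syntax you want, and supports back-references.
Update, May 28, 2011
Thanks to Mitchell Model's comment below for pointing out that it has been updated to 64-bit, so I no longer have any reservations. It does everything I need.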
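Answer 3 (score: 3)
I'm sure there is an AppleScript addition or a shell script that can be called to bring regex into the fold, but I avoid dependencies for the simple stuff. I use this style of pattern all the time...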
set filename to "1234567890abcdefghijkl"
return isPrefixGood(filename)

on isPrefixGood(filename) -- returns boolean
    set legalCharacters to {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0"}
    -- A name shorter than 10 characters cannot have a 10-digit prefix.
    if (count of characters of filename) < 10 then return false
    set thePrefix to text 1 thru 10 of filename
    repeat with thisChr from 1 to (count of characters of thePrefix)
        set theChr to character thisChr of thePrefix
        -- Stop at the first character that is not a digit.
        if theChr is not in legalCharacters then
            return false
        end if
    end repeat
    return true
end isPrefixGood
Answer 4 (score: 3)
Here is another way to check whether the first ten characters of any string are digits.
on checkFilename(thisName)
set {n, isOk} to {length of thisName, true}
try
repeat with i from 1 to 10
set isOk to (isOk and ((character i of thisName) is in "0123456789"))
end repeat
return isOk
on error
return false
end try
end checkFilename
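Answer 5 (score: 1)
I have an alternative. Until I have implemented character classes for the Thompson NFA algorithm, I have made the bare bones work in AppleScript. If anyone is interested in parsing very basic regular expressions with AppleScript, the code is posted in CodeExchange at MacScripters, so please have a look!
Here is a solution for checking whether the first ten characters of a text/string are digits: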
set mstr to "1234567889Abcdefg"
set isnum to prefixIsOnlyDigits for mstr
to prefixIsOnlyDigits for aText
set aProbe to text 1 thru 10 of aText
set isnum to false
if not ((offset of "," in aProbe) > 0 or (offset of "." in aProbe) > 0 or (offset of "-" in aProbe) > 0) then
try
set aNumber to aProbe as number
set isnum to true
end try
end if
return isnum
end prefixIsOnlyDigits
Answer 6 (score: 0)
I was able to call JavaScript directly from AppleScript (on High Sierra) with the following.
# Returns a list of strings from _subject that match _regex
# _regex in the format of /<value>/<flags>
on match(_subject, _regex)
set _js to "(new String(`" & _subject & "`)).match(" & _regex & ")"
set _result to run script _js in "JavaScript"
if _result is null or _result is missing value then
return {}
end if
return _result
end match
match("file-name.applescript", "/^\\d+/g") #=> {}
match("1234_file.js", "/^\\d+/g") #=> {"1234"}
match("5-for-fighting.mp4", "/^\\d+/g") #=> {"5"}
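It seems that most JavaScript String methods work as expected. I could not find a reference for which version of ECMAScript the JavaScript for macOS Automation is compatible with, so test before using.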