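AppleScript: cleaning a string

Time: 2010-05-06 19:05:54

Tags: applescript

I have a string that contains illegal characters I want to remove, but I don't know which characters may be present.

I built a list of the characters I do not want filtered out, and I put together this script (adapted from another one I found online).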

on clean_string(TheString)
    --Store the current TIDs. To be polite to other scripts.
    set previousDelimiter to AppleScript's text item delimiters
    set potentialName to TheString
    set legalName to {}
    set legalCharacters to {"a", "b", "c", "d", "e", "f", ¬
        "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", ¬
        "s", "t", "u", "v", "w", "x", "y", "z", "A", "B", "C", "D", "E", ¬
        "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", ¬
        "S", "T", "U", "V", "W", "X", "Y", "Z", "1", "2", "3", "4", "5", ¬
        "6", "7", "8", "9", "0", "?", "+", "-", "Ç", "ç", "á", "Á", "é", ¬
        "É", "í", "Í", "ó", "Ó", "ú", "Ú", "â", "Â", "ã", "Ã", "ñ", "Ñ", ¬
        "õ", "Õ", "à", "À", "è", "È", "ü", "Ü", "ö", "Ö", "!", "$", "%", ¬
        "/", "(", ")", "&", "€", "#", "@", "=", "*", "+", "-", ",", ".", ¬
        "–", "_", " ", ":", ";", ASCII character 10, ASCII character 13}

    --Whatever you want to eliminate.
    --Now iterate through the characters checking them.
    repeat with thisCharacter in the characters of potentialName
        set thisCharacter to thisCharacter as text
        if thisCharacter is in legalCharacters then
            set the end of legalName to thisCharacter
            log (legalName as string)

        end if
    end repeat
    --Make sure that you set the TIDs before making the
    --list of characters into a string.
    set AppleScript's text item delimiters to ""
    --Check the name's length.
    if length of legalName is greater than 32 then
        set legalName to items 1 thru 32 of legalName as text
    else
        set legalName to legalName as text
    end if
    --Restore the current TIDs. To be polite to other scripts.
    set AppleScript's text item delimiters to previousDelimiter
    return legalName
end clean_string

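The problem is that this script is very slow and makes me hit a timeout.

What I am doing is checking the string character by character and comparing each one against the legalCharacters list. If the character is in the list, it is kept. If not, it is skipped.

Is there a faster way?

Something like

"look at every character of TheString and remove those that are not in legalCharacters"

Thanks for any help.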

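4 answers:

Answer 0: (score: 3)

Which non-ASCII characters are you running into? What is your file's encoding?

It is much more efficient to process text with a shell script and tr, sed, or perl. All of these tools are installed by default on OS X.

You can use a shell script with tr (as in the example below) to strip returns, or use sed to strip spaces (not shown in the example below):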

set clean_text to do shell script "echo " & quoted form of the_string & "| tr -d '\\r\\n' "

Technical Note TN2065: do shell script in AppleScript
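
The tr call above deletes a fixed set of characters. A whitelist like the one in the question can be approximated with tr -cd, which deletes everything that is not in the given set. The following is only a sketch: the function name clean_string is reused from the question for illustration, the set covers just the ASCII part of legalCharacters, and LC_ALL=C makes tr and head work byte by byte, so accented letters would be dropped rather than kept.

```shell
#!/bin/sh
# Hypothetical helper: keep only whitelisted ASCII characters,
# then cap the result at 32 bytes, mirroring the question's length limit.
clean_string() {
    printf '%s' "$1" \
        | LC_ALL=C tr -cd 'A-Za-z0-9?+!$%/()&#@=*,._ :;\n\r-' \
        | LC_ALL=C head -c 32
}

# The characters ^ ~ | are not in the set, so they are removed.
clean_string 'Sample text. smdm#$%%&^~|'
```

From AppleScript, the same pipeline could be handed to do shell script with quoted form of the_string, in the same way as the tr example above.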

Or use perl. This removes every character that is not alphanumeric, whitespace, or |:

set x to quoted form of "Sample text. smdm#$%%&"
set y to do shell script "echo " & x & " | perl -pe 's/[^[:alnum:]|[:space:]]//g'"

Search SO for other examples of processing AppleScript text with tr, sed, and perl. Or search MacScripter / AppleScript | Forums.

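Answer 1: (score: 2)

Iteration in AppleScript is always slow, and there is really no faster way around these problems. Logging inside a loop is an absolutely guaranteed way to slow things down. Use the log command judiciously.

However, in your particular case you have a length limit, and moving the length check into the repeat loop can cut processing time dramatically (it ran in under a second in Script Debugger, regardless of the length of the text):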

    on clean_string(TheString)
        set potentialName to TheString
        set legalName to {}
        set legalCharacters to {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "1", "2", "3", "4", "5", "6", "7", "8", "9", "0", "?", "+", "-", "Ç", "ç", "á", "Á", "é", "É", "í", "Í", "ó", "Ó", "ú", "Ú", "â", "Â", "ã", "Ã", "ñ", "Ñ", "õ", "Õ", "à", "À", "è", "È", "ü", "Ü", "ö", "Ö", "!", "$", "%", "/", "(", ")", "&", "€", "#", "@", "=", "*", "+", "-", ",", ".", "–", "_", " ", ":", ";", ASCII character 10, ASCII character 13}
        with timeout of 86400 seconds --86400 seconds = 24 hours
            repeat with thisCharacter in the characters of potentialName
                set thisCharacter to thisCharacter as text
                if thisCharacter is in legalCharacters then
                    set the end of legalName to thisCharacter
                    --Stop as soon as the 32-character limit is reached.
                    if length of legalName is greater than or equal to 32 then exit repeat
                end if
            end repeat
        end timeout
        --Join with empty TIDs, then restore them to be polite to other scripts.
        set previousDelimiter to AppleScript's text item delimiters
        set AppleScript's text item delimiters to ""
        set legalName to legalName as text
        set AppleScript's text item delimiters to previousDelimiter
        return legalName
    end clean_string

Answer 2: (score: 2)

Another shell script approach could be:

set clean_text to do shell script "echo " & quoted form of the_string & "|sed \"s/[^[:alnum:][:space:]]//g\""

This uses sed to remove everything that is not an alphanumeric character or whitespace. More regular expression references here
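
As a shell-only sketch of the same sed expression, the snippet below wraps it in a function and feeds the text with printf '%s' instead of echo, since echo may interpret backslash sequences in some shells. The function name strip_punctuation is hypothetical and not part of the answer.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed expression from the answer.
strip_punctuation() {
    # printf '%s' passes the text through unchanged, backslashes included.
    printf '%s' "$1" | sed 's/[^[:alnum:][:space:]]//g'
}

# Punctuation is removed; letters, digits, and spaces stay.
strip_punctuation 'Sample text. smdm#$%%&'
```

Unlike the question's list, this keeps no punctuation at all, so any symbols that should survive would have to be added inside the brackets.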

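Answer 3: (score: 0)

BBEdit or TextWrangler would be much, much faster. Download TextWrangler (it's free), then open your file and run Text -> Zap Gremlins... on it. Does that do what you need? If so, celebrate with a cold beverage. If not, try BBEdit (it's not free) and create a new text factory with as many "Replace All" conditions as you need, then open the file and run the text factory on it.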