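How to search from a specific starting position in AppleScript

Time: 2014-01-29 22:51:22

Tags: applescript

I'm trying to search for a string in a very long text. Normally I would do something along these lines: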

set testString to "These aren't the droids you're looking for. Now we have a ridiculously large amount of text. These ARE the DROIDS you're looking for."

set searchTerm to "droids"
set searchTermLength to count of characters in searchTerm

# Gets string from first appearance of searchTerm
set testStringSearch to characters 19 thru -1 of testString as text

# Finds location of next appearance of searchTerm
set testLocation to offset of searchTerm in testStringSearch

# Returns next occurrence of searchTerm ("- 1" keeps the slice exactly searchTermLength characters long)
set theTest to characters testLocation thru (testLocation + searchTermLength - 1) of testStringSearch as text
return theTest

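However, the text is very large (120k+ characters), and when I try to set testStringSearch the script hangs for quite a while.

Since I'm going to build a loop that returns every location of searchTerm, I'd like to avoid wasting time wherever possible. Is there something I'm missing?

1 answer:

Answer 0 (score: 2)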

Your biggest bottleneck is stripping off the beginning of the string:

set testStringSearch to characters 19 thru -1 of testString as text

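Assuming an average word length of 5 characters, this creates a list of nearly 600,000 individual characters and then converts that whole list back into text.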
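If you want to keep the original offset-based loop, one way to avoid building that character list is to slice with "text i thru j of someString", which returns a string directly. The handler below is a minimal sketch of that idea; the name allOffsetsByScanning is hypothetical, and it assumes non-overlapping matches are wanted (the same behavior as splitting on the search term).

```applescript
-- Hypothetical handler: returns every 1-based position of target in str,
-- slicing with "text ... thru ..." instead of "characters ... as text".
on allOffsetsByScanning(str, target)
	set offsets to {}
	set targetLength to length of target
	set base to 0 -- number of characters of str already consumed
	set remaining to str
	repeat
		set found to offset of target in remaining
		if found is 0 then exit repeat
		-- convert the position within "remaining" into a position within str
		set end of offsets to base + found
		-- skip past this match so the next search cannot overlap it
		set base to base + found + targetLength - 1
		if base ≥ (length of str) then exit repeat
		set remaining to text (base + 1) thru -1 of str
	end repeat
	return offsets
end allOffsetsByScanning

allOffsetsByScanning("abcXabcXabc", "abc")
--> {1, 5, 9}
```

Each iteration still copies the rest of the string, so with many matches the cost grows with the number of occurrences; the split-based approach described next does the splitting in a single pass.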

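Your best bet is to convert the string up front into data you can work with, and use that data for the rest of the script. For example, you can split the string on the target search term and use the lengths of the resulting pieces to build a list of offsets: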

set offsets to allOffsets("A sample string", "sample")
--> {3}

on allOffsets(str, target)
    set splitString to my explode(str, target)
    set offsets to {}
    set compensation to 0
    set targetLength to length of target
    repeat with i from 1 to ((count splitString) - 1)
        set currentStringLength to ((length of item i of splitString))
        set end of offsets to currentStringLength + compensation + 1
        set compensation to compensation + currentStringLength + targetLength
    end repeat
    return offsets
end allOffsets


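on explode(theText, theDelim)
    -- Save the current delimiters so they can be restored afterwards
    set oldDelims to AppleScript's text item delimiters
    set AppleScript's text item delimiters to theDelim
    set theList to text items of theText
    set AppleScript's text item delimiters to oldDelims
    return theList
end explode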

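As you can see, to get the current offset you take the length of the current piece plus 1, and add the compensation variable, which keeps track of the total length of all pieces processed so far, including one copy of the target after each of them.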

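Performance

I did find that performance is directly related to the number of occurrences found in the string. My test data consisted of 20,000 words from a Lorem Ipsum generator.

Run 1: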

Target: "lor"
Found:  141 Occurrences
Time:   0.01 seconds

Run 2:

Target: "e"
Found:  6,271 Occurrences
Time:   1.97 seconds

Run 3:

Target: "xor"
Found:  0 Occurrences
Time:   0.00 seconds
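Since the time grows with the number of occurrences, one commonly used AppleScript optimization for large lists may be worth trying: holding the list in a script object property so that item access and appending go through a reference. This is a sketch only and has not been benchmarked against the data above; the handler name allOffsetsFast and the script object listHolder are hypothetical. The offset arithmetic is the same as in allOffsets.

```applescript
-- Hypothetical variant of allOffsets: same arithmetic, but the lists live in
-- a script object's properties, a common trick to speed up list access.
on allOffsetsFast(str, target)
	script listHolder
		property pieces : {}
		property offsets : {}
	end script
	-- split once on the target, restoring the caller's delimiters afterwards
	set oldDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to target
	set listHolder's pieces to text items of str
	set AppleScript's text item delimiters to oldDelims
	set compensation to 0
	set targetLength to length of target
	repeat with i from 1 to ((count listHolder's pieces) - 1)
		set currentStringLength to length of item i of listHolder's pieces
		set end of listHolder's offsets to currentStringLength + compensation + 1
		set compensation to compensation + currentStringLength + targetLength
	end repeat
	return listHolder's offsets
end allOffsetsFast

allOffsetsFast("abcXabcXabc", "abc")
--> {1, 5, 9}
```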