
Time: 2016-09-05 02:51:21

Tags: macos shell applescript extract automator


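Goal: Extract the name of the person who posted the job from the text below, without knowing that name in advance. I know that the string "Job posted by" immediately precedes the name I'm looking for, and that "·" immediately follows it. Neither of these surrounding strings appears anywhere else in the text document.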

I'm running OS X El Capitan.
The file name for this example is ExtractedTextOutput.txt.
The file location for this example is "/Users/RaquelBianca/Desktop/ExtractTextOutput2.txt".

set theFile to ("/Users/RaquelBianca/Desktop/ExtractTextOutput2.txt")
set theFileContents to read theFile

set output to {}
set od to AppleScript's text item delimiters
set AppleScript's text item delimiters to {linefeed} -- split the file into lines

set all_lines to every text item of theFileContents
repeat with the_line in all_lines
    if "Job posted by" is not in the_line then
        set output to output & the_line
        set AppleScript's text item delimiters to {"Job posted by"}
        set latter_part to last text item of the_line
        set AppleScript's text item delimiters to {" "}
        set last_word to last text item of latter_part
        set output to output & ("$ " & last_word as string)
    end if
end repeat

set AppleScript's text item delimiters to {linefeed} -- join the output with line breaks
set output to output as string
set AppleScript's text item delimiters to od
return output


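Sample text from the file:

9/2/2016 Application Security Engineer job at Datadog in Greater New York City Area | LinkedIn 60 Home Profile Job description My Network Jobs Search for people, jobs, companies, and more... Interests Advanced Business Services Go to Lynda.c Application Security Engineer Datadog Greater New York City Area Posted 15 days ago 93 views 1 alum works here Apply on company website Our mission is to bring sanity to cloud operations, and we need you to build resilient and secure applications on our platform. What you'll do Perform code and design reviews, contribute code to improve security throughout Datadog's products Educate your fellow engineers about security in code and infrastructure Monitor production applications for anomalous activity Prioritize and track application security issues throughout the company Help improve our security policies and processes Job posted by Ryan Elberg · 2nd Technical Recruiting Lead at Datadog Greater New York City Area Send InMail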

2 Answers:

Answer 0 (score: 2)

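I had trouble figuring out what your second separator is. Your text sample shows '·', but when I checked what comes after 'Elberg' and before '2nd', I found 4 characters: code 32 (space), code 194 (¬), code 183 (∑), code 32 (space).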


set theFile to ("/Users/RaquelBianca/Desktop/ExtractTextOutput2.txt")
-- your separator seems to be code 32 (space), code 194 (¬), code 183 (∑), code 32 (space)
set Separator to ASCII character 194 -- is this correct?

set theFileContents to read theFile
set myAuthor to ""
set AppleScript's text item delimiters to {"Job posted by "}
if (count of text item of theFileContents) is 2 then
    set Part2 to text item 2 of theFileContents -- this part starts just after "Job posted by "
    set AppleScript's text item delimiters to {Separator}
    set myAuthor to text item 1 of Part2
end if

log "result=//" & myAuthor & "//" -- show the result in variable myAuthor
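Those four codes are most likely the UTF-8 encoding of " · ": the middle dot (U+00B7) is stored in UTF-8 as the two bytes 194 183, and a plain read without an "as" clause decodes the file byte by byte in the legacy encoding (Mac Roman on an English-language system), where 194 is ¬ and 183 is ∑. Below is a minimal sketch, assuming the file is saved as UTF-8: read it with "as «class utf8»" so the middle dot stays a single character, then split on the real separator built with "character id 183". The handler name extractPoster and the sample string are illustrative only; unlike the code above, the handler also restores the text item delimiters before returning.

```applescript
-- Hypothetical handler: return the text between "Job posted by " and " · ".
-- Assumes theText was read as UTF-8, so the middle dot is one character.
on extractPoster(theText)
    set middleDot to character id 183 -- U+00B7 MIDDLE DOT
    set od to AppleScript's text item delimiters
    set poster to ""
    set AppleScript's text item delimiters to {"Job posted by "}
    if (count of text items of theText) is 2 then
        -- text item 2 starts right after "Job posted by "
        set afterMarker to text item 2 of theText
        set AppleScript's text item delimiters to {space & middleDot & space}
        set poster to text item 1 of afterMarker
    end if
    set AppleScript's text item delimiters to od -- restore the caller's delimiters
    return poster
end extractPoster

-- For the real file, read it as UTF-8 instead of the legacy default:
-- set theFileContents to read (POSIX file "/Users/RaquelBianca/Desktop/ExtractTextOutput2.txt") as «class utf8»
set theFileContents to "processes Job posted by Ryan Elberg " & (character id 183) & " 2nd Technical Recruiting Lead"
extractPoster(theFileContents) -- returns "Ryan Elberg"
```

Returning an empty string when the marker is missing (or appears more than once) mirrors the "count is 2" check in the answer.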


Answer 1 (score: 0)

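You had the right idea using AppleScript's text item delimiters, but the way you're trying to extract the name is what's causing trouble. First, though, I'll go over some ways to improve your script: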

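set all_lines to every text item of theFileContents
repeat with the_line in all_lines
    if "Job posted by" is not in the_line then
        set output to output & the_line
    end if
end repeat

There's no need to split the file into lines and loop over them. Check the whole file for "Job posted by" once, and fall back to an alert if it isn't found:
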
set theFile to ("/Users/RaquelBianca/Desktop/ExtractTextOutput2.txt")
set theFileContents to read theFile

set output to {}
set od to AppleScript's text item delimiters

if "Job posted by" is in theFileContents then
    set AppleScript's text item delimiters to {"Job posted by"}
    set latter_part to last text item of theFileContents
    set AppleScript's text item delimiters to {" "}
    set last_word to last text item of latter_part
    set output to output & ("$ " & last_word as string)
else
    display alert "Poster of job listing not found"
    set output to theFileContents
end if

set AppleScript's text item delimiters to od
return output


set last_word to last text item of latter_part
set output to output & ("$ " & last_word as string)

This is wrong. It isn't the last word you want; that's the last word of the file! To extract the poster of the job listing, change it to the following:

repeat with theWord in latterPart
    if the first character in theWord is "¬" then exit repeat
    set output to output & theWord
end repeat
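Be aware that iterating over a string with "repeat with ... in" visits its characters, so the loop above copies latterPart one character at a time (including the space that follows "Job posted by") until it reaches the ¬. The same text can be taken in one step by locating the separator with offset and slicing everything before it. This is a minimal alternative sketch, not the answer's method; the handler name textBefore and the sample value of latterPart are illustrative.

```applescript
-- Hypothetical handler: return the part of theText before the first
-- occurrence of theSeparator (all of theText if the separator is absent).
on textBefore(theText, theSeparator)
    set n to offset of theSeparator in theText
    if n is 0 then return theText -- separator not found
    if n is 1 then return "" -- nothing precedes the separator
    return text 1 thru (n - 1) of theText
end textBefore

-- Roughly what latterPart holds when the file is read with a plain "read"
set latterPart to " Ryan Elberg ¬∑ 2nd Technical Recruiting Lead at Datadog"
set output to "$ " & textBefore(latterPart, "¬")
-- output is "$  Ryan Elberg " (the spaces around the name are kept)
```

Slicing avoids one string concatenation per character, which matters little for a single name but keeps the intent ("everything before the separator") explicit.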




I assume you want the dollar sign in the output for some reason, so I kept it. If you don't want it, just replace set output to "$ " with set output to "".


set theFile to "/Users/RaquelBianca/Desktop/ExtractTextOutput2.txt"
set theFileContents to read theFile as text

set output to "$ "
set od to AppleScript's text item delimiters

if "Job posted by" is in theFileContents then
    set AppleScript's text item delimiters to {"Job posted by"}
    set latterPart to last text item of theFileContents
    set AppleScript's text item delimiters to {" "}
    repeat with theWord in latterPart
        if the first character in theWord is "¬" then exit repeat
        set output to output & theWord
    end repeat
else
    display alert "Poster of job listing not found"
    set output to theFileContents
end if

set AppleScript's text item delimiters to od
return output
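Because latterPart begins with the space that follows "Job posted by", and the loop stops only at the ¬ (which is preceded by a space), the characters it copies include a space on each side of the name, so the script returns "$" followed by two spaces, the name, and a trailing space. A small trimming handler can tidy that up. trimWhitespace is a hypothetical helper, not part of the answer; to use it, collect the characters into their own variable (nameText below, also hypothetical) instead of appending them directly to output, then build the final string from the trimmed name.

```applescript
-- Hypothetical helper: strip leading and trailing spaces, tabs, and line breaks.
on trimWhitespace(theText)
    set whitespaceChars to {space, tab, return, linefeed}
    set startIndex to 1
    set endIndex to length of theText
    -- advance past leading whitespace ("and" short-circuits, so no index errors)
    repeat while startIndex ≤ endIndex and (character startIndex of theText) is in whitespaceChars
        set startIndex to startIndex + 1
    end repeat
    -- back up past trailing whitespace
    repeat while endIndex ≥ startIndex and (character endIndex of theText) is in whitespaceChars
        set endIndex to endIndex - 1
    end repeat
    if startIndex > endIndex then return "" -- empty or all-whitespace input
    return text startIndex thru endIndex of theText
end trimWhitespace

set nameText to " Ryan Elberg " -- what the loop collects before the separator
set output to "$ " & trimWhitespace(nameText)
-- output is now "$ Ryan Elberg"
```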