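I have about 1,000 URLs that link to remote PDF files, and I need to determine which of those PDFs are searchable in Safari and which are not. My script already loops through the URLs and opens each one in Safari, but I am stuck on the last two steps below.
Can anyone help? Thanks.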
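The script needs to:
For each URL:
Tell Safari to:
Search the PDF for the character "a" using the Find command that pops up on right-click, not Command-F
Write the search result to a file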
set urlList to {"http://pricelist.list.com/pricelists/A/AEA_11-15-12.pdf", "http://pricelist.list.com/pricelists/A/API_1608_04-05-13.pdf", "http://pricelist.list.com/pricelists/A/Access_02-01-12.pdf", "http://pricelist.list.com/pricelists/A/Allparts_Retail_01-01-11.pdf"}
set numURLs to (count urlList)
repeat with i from 1 to (numURLs)
	set theURL to (item i of urlList)
	tell application "Safari"
		open location theURL
		activate
		--Perform search
		--Write results to file
	end tell
	tell application "System Events"
		tell process "Safari"
			click menu item "Close Other Tabs" of menu "File" of menu bar 1
		end tell
	end tell
	delay 5
end repeat
Answer 0 (score: 0)
It may be easier to download the PDFs and use a shell script:
brew install poppler wget parallel
cat ~/Documents/urls.txt | parallel -P8 wget
for f in *.pdf; do [[ $(pdffonts -- "$f" 2> /dev/null | wc -l) -eq 2 ]] && printf %s\\n "$f"; done
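pdffonts prints only two lines of output (the column header and its separator) for scanned PDFs that have no embedded fonts. See "How do I determine programmatically if a PDF is searchable?".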
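Since the question also asks for the results to be written to a file, the one-liner above could be extended roughly as follows. This is only a sketch: the function names `font_report_lines` and `classify_pdfs`, the labels, and the tab-separated output format are made up for illustration, and it assumes that zero or one line of `pdffonts` output means the file could not be read at all.

```shell
#!/usr/bin/env bash
# Sketch: label every PDF in a directory by counting pdffonts output lines.
# Assumption: exactly two lines (header + separator) means no fonts were
# found, so the PDF is treated as a scanned, non-searchable document.

# Print how many lines pdffonts emits for one file (0 if pdffonts fails).
font_report_lines() {
    # The arithmetic expansion strips the padding some wc versions add.
    echo $(( $(pdffonts -- "$1" 2>/dev/null | wc -l) ))
}

# classify_pdfs DIR OUTFILE
# Writes one "label<TAB>path" line per PDF found in DIR to OUTFILE.
classify_pdfs() {
    local dir=$1 out=$2 f n label
    : > "$out"                      # start with an empty results file
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue     # skip the literal pattern if no PDFs exist
        n=$(font_report_lines "$f")
        case $n in
            0|1) label=unreadable ;;      # hypothetical label for failures
            2)   label=not-searchable ;;  # header only, no fonts listed
            *)   label=searchable ;;      # at least one font listed
        esac
        printf '%s\t%s\n' "$label" "$f" >> "$out"
    done
}
```

After downloading the files, something like `classify_pdfs . results.tsv` would produce the report. Note that a PDF with fonts is only probably searchable; the font check is a heuristic rather than a real text search.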