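I'm having trouble running a shell script from AppleScript. I run a grep, but as soon as the pattern contains special characters it no longer works as expected. (The script reads the list of subfolders in a directory and checks whether any of the subfolder names appears in a file.)
Here is my script: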
set searchFile to "/tmp/output.txt"
set theCommand to "/usr/local/bin/pdftotext -enc UTF-8 some.pdf" & space & searchFile
do shell script theCommand
tell application "Finder"
set companies to get name of folders of folder ("/path/" as POSIX file)
end tell
repeat with company in companies
set theCommand to "grep -c " & quoted form of company & space & quoted form of searchFile
try
do shell script theCommand
set CompanyName to company as string
return CompanyName
on error
end try
end repeat
return false
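The problem shows up, for example, with strings that contain umlauts. The command string built by AppleScript is somehow encoded differently from the one I get when I type it directly on the command line.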
$ grep -c 'Württemberg' '/tmp/output.txt' --> typed on command line
3
$ grep -c 'Württemberg' '/tmp/output.txt' --> copy & pasted from AppleScript
0
$ grep -c 'rttemberg' '/tmp/output.txt' --> no umlauts, no problems
3
The "ü" in the first and second lines is different; echo 'Württemberg' | openssl base64 shows this.
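I have tried several encoding tricks in various places, basically everything I could find or think of.
Does anyone have an idea? How can I check the encoding of a string?
Thanks in advance! Sebastian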
Answer 0 (score: 0)
This can be made to work by escaping every accented character in the company name before it is used in the grep command.
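So you need to escape each of these accented characters with a backslash, written as a double backslash (\\) in an AppleScript string literal. For example:
ü in Württemberg needs to become \\ü
ö in Königsberg needs to become \\ö
ß in Einbahnstraße needs to become \\ß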
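These accented characters, such as u with diaeresis, are evidently being encoded differently. Exactly which encoding they end up with is hard to determine. My assumption is that the encoded form starts with a backslash, which would explain why escaping these characters with a backslash fixes the problem. Consider u with diaeresis from the link above: it shows that in C/C++ source, ü is encoded as \u00FC.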
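To address the "how can I check the encoding" part of the question, one option is to dump the raw bytes of each string. The sketch below is only an illustration under an assumption the question does not confirm: that one "ü" is the precomposed code point U+00FC and the other is a plain "u" followed by the combining diaeresis U+0308. The variable names are made up for this example.

```shell
# Precomposed form: U+00FC, UTF-8 bytes c3 bc (given as octal escapes for printf).
composed=$(printf 'W\303\274rttemberg')
# Decomposed form (assumed): 'u' plus combining diaeresis U+0308, UTF-8 bytes cc 88.
decomposed=$(printf 'Wu\314\210rttemberg')

# Dump each string as hex bytes so the difference becomes visible.
composed_hex=$(printf '%s' "$composed" | od -An -tx1 | tr -d ' \n')
decomposed_hex=$(printf '%s' "$decomposed" | od -An -tx1 | tr -d ' \n')

echo "composed:   $composed_hex"
echo "decomposed: $decomposed_hex"
```

Both strings look the same on screen, but the byte sequences differ, so grep treats them as different patterns. Piping the string copied from AppleScript through the same od command would show which form it actually uses.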
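In the full script below, you will notice the following line:
set accentedChars to {"ü", "ö", "ß", "á", "ė"}
It holds the list of all characters that need to be escaped. You have to state each one explicitly, because there does not seem to be a way to infer whether a character is accented. Before assigning the grep command to the theCommand variable, we first escape the necessary characters with the following line: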
set company to escapeChars(company, accentedChars)
As you can see here, we pass two arguments to the escapeChars subroutine: the unescaped company variable and the list of accented characters.
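In the escapeChars subroutine we loop over each char in the accentedChars list and call the findAndReplace subroutine. This prefixes every instance of those characters in the company variable with a backslash.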
The full script:
set searchFile to "/tmp/output.txt"
set accentedChars to {"ü", "ö", "ß", "á", "ė"}
set theCommand to "/usr/local/bin/pdftotext -enc UTF-8 some.pdf" & ¬
space & searchFile
do shell script theCommand
tell application "Finder"
set companies to get name of folders of folder ("/path/" as POSIX file)
end tell
repeat with company in companies
set escapedCompany to escapeChars(company as text, accentedChars)
set theCommand to "grep -c " & quoted form of escapedCompany & ¬
space & quoted form of searchFile
try
do shell script theCommand
set CompanyName to company as string
return CompanyName
on error
end try
end repeat
return false
(**
* Checks each character of a given word. If any characters of the word
* match a character in the given list of characters they will be escaped.
*
* @param {text} searchWord - The word to check the characters of.
* @param {list} charactersList - List of characters to be escaped.
* @returns {text} The new text with the matching characters escaped.
*)
on escapeChars(searchWord, charactersList)
repeat with char in charactersList
set searchWord to findAndReplace(char, ("\\" & char), searchWord)
end repeat
return searchWord
end escapeChars
(**
* Replaces all occurrences of findString with replaceString
*
* @param {text} findString - The text string to find.
* @param {text} replaceString - The replacement text string.
* @param {text} searchInString - Text string to search.
* @returns {text} The new text with the item(s) replaced.
*)
on findAndReplace(findString, replaceString, searchInString)
set oldTIDs to text item delimiters of AppleScript
set text item delimiters of AppleScript to findString
set searchInString to text items of searchInString
set text item delimiters of AppleScript to replaceString
set searchInString to "" & searchInString
set text item delimiters of AppleScript to oldTIDs
return searchInString
end findAndReplace
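Currently, your grep command only reports the number of lines on which the word was found, not how many instances of the word were found.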
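If you want the actual number of instances, use grep's -o option to print each match on its own line. Then pipe that to wc with the -l option to count the lines. For example: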
grep -o 'Württemberg' /tmp/output.txt | wc -l
And in your AppleScript:
set theCommand to "grep -o " & quoted form of company & space & ¬
quoted form of searchFile & "| wc -l"
Tip: if you want to strip the leading spaces from the reported count, pipe it to sed to remove the spaces. For example, in your script:
set theCommand to "grep -o " & quoted form of company & space & ¬
quoted form of searchFile & "| wc -l | sed -e 's/ //g'"
And the equivalent on the command line:
grep -o 'Württemberg' /tmp/output.txt | wc -l | sed -e 's/ //g'
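To make the difference between counting lines and counting matches concrete, here is a small sketch that runs both on a throwaway file. The sample text and variable names are invented for illustration, and tr -d ' ' is used here as an alternative to the sed step above for trimming the padding that some wc implementations add.

```shell
# Hypothetical sample file: the word "foo" occurs three times on two lines.
sample=$(mktemp)
printf 'foo bar foo\nbaz foo\nqux\n' > "$sample"

# grep -c counts matching lines, not individual matches.
line_count=$(grep -c 'foo' "$sample")

# grep -o prints each match on its own line; wc -l then counts those lines.
match_count=$(grep -o 'foo' "$sample" | wc -l | tr -d ' ')

echo "lines: $line_count, matches: $match_count"
rm -f "$sample"
```

Note that in the piped version the exit status comes from the last command in the pipeline rather than from grep, so a script that relies on grep failing when nothing matches would need to compare the count against 0 instead.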