Question

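I'm having trouble executing a shell script from AppleScript. I run a grep, but as soon as the pattern contains special characters it no longer works as expected. (The script reads the list of subfolders in a directory and checks whether any of the subfolder names appears in a file.)

Here is my script: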

set searchFile to "/tmp/output.txt"

set theCommand to "/usr/local/bin/pdftotext -enc UTF-8 some.pdf" & space & searchFile
do shell script theCommand

tell application "Finder"
    set companies to get name of folders of folder ("/path/" as POSIX file)
end tell

repeat with company in companies
    set theCommand to "grep -c " & quoted form of company & space & quoted form of searchFile

    try
        do shell script theCommand
        set CompanyName to company as string
        return CompanyName
    on error

    end try
end repeat

return false

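The problem shows up with, for example, strings that contain umlauts. The command built by AppleScript is somehow encoded differently from the one I type directly on the command line.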

$ grep -c 'Württemberg' '/tmp/output.txt' --> typed on command line
3
$ grep -c 'Württemberg' '/tmp/output.txt' --> copy & pasted from AppleScript
0
$ grep -c 'rttemberg' '/tmp/output.txt'   --> no umlauts, no problems
3

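The "ü" in the first line and the "ü" in the second line are different; echo 'Württemberg' | openssl base64 shows this.

I have tried several encoding tricks in different places, basically everything I could find or think of.

Does anyone have an idea? How can I check the encoding of a string?

Thanks in advance! Sebastian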

Answer 1

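Overview

This can be made to work by escaping every accented character in each company name before using it in the grep command.

So you need to escape each of these characters (i.e. the accented ones) with a double backslash (i.e. \\). For example:

The ü in Württemberg needs to become \\ü
The ö in Königsberg needs to become \\ö
The ß in Einbahnstraße needs to become \\ß

Why this is necessary:

These accented characters, such as u with diaeresis, are evidently being encoded differently. Exactly which encoding they receive is hard to determine. My assumption is that the encoding scheme in use starts with a backslash, which would explain why escaping these characters with a backslash fixes the problem. Consider the previously linked reference for u with diaeresis, which shows that in C/C++ source code ü is encoded as \u00FC.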
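One way to approach the "how can I check the encoding of a string?" part of the question is to dump the raw bytes of both strings. The sketch below rests on an assumption that is not confirmed anywhere in this post: that the two strings differ in Unicode normalization, i.e. a precomposed ü (U+00FC) versus a plain u followed by a combining diaeresis (U+0308), a form that names coming from the macOS file system often use. The bytes are written as octal escapes so the example does not depend on how the text was pasted, and the helper name hex is hypothetical.

```shell
# Precomposed form: "ü" is the single code point U+00FC (UTF-8 bytes c3 bc).
nfc=$(printf 'W\303\274rttemberg')
# Decomposed form: "u" followed by combining U+0308 (UTF-8 bytes 75 cc 88).
nfd=$(printf 'Wu\314\210rttemberg')

# Hypothetical helper: print the bytes of its argument as one hex string.
hex() { printf '%s' "$1" | od -An -tx1 | tr -d '[:space:]'; }

hex_nfc=$(hex "$nfc")
hex_nfd=$(hex "$nfd")
echo "precomposed: $hex_nfc"   # 57c3bc727474656d62657267
echo "decomposed:  $hex_nfd"   # 5775cc88727474656d62657267

# grep compares bytes, so one form does not match the other.
matches=$(printf '%s\n' "$nfc" | grep -c "$nfd" || true)
echo "matches: $matches"       # 0
```

In practice, the same od pipeline can be applied to the real strings, for example the folder name returned by Finder and a matching line from /tmp/output.txt, to see whether the bytes around the umlaut differ.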

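Solution

In the complete script below you will notice the following:

set accentedChars to {"ü", "ö", "ß", "á", "ė"} holds a list of all the characters that need to be escaped. You need to state each one explicitly, because there does not appear to be a way to infer whether a character has an accent.
Before assigning the grep command to the theCommand variable, we first escape the necessary characters with this line: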
```
set company to escapeChars(company, accentedChars)
```
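As you can see here, we pass two arguments to the escapeChars subroutine (i.e. the unescaped company variable and the list of accented characters).
In the escapeChars subroutine we loop over each char in the accentedChars list and call the findAndReplace subroutine. This prefixes every instance of those characters in the company variable with a backslash.

Complete script: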
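To make the effect of escapeChars concrete, here is a rough shell analogue. It is not the AppleScript from this answer; the variable names and the use of sed are illustrative assumptions. It applies the same backslash escaping and prints the pattern that would end up inside the grep command.

```shell
company='Württemberg'
escaped=$company

# Prefix each listed accented character with a single backslash,
# mirroring the loop over accentedChars in escapeChars.
for ch in ü ö ß á ė; do
  # Inside double quotes, \\\\ reaches sed as \\, i.e. one literal backslash.
  escaped=$(printf '%s' "$escaped" | sed "s/$ch/\\\\$ch/g")
done

echo "$escaped"
```

Only the characters named in the list are touched, which is why the list has to be extended by hand for any other accented character that appears in the folder names.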

set searchFile to "/tmp/output.txt"
set accentedChars to {"ü", "ö", "ß", "á", "ė"}

set theCommand to "/usr/local/bin/pdftotext -enc UTF-8 some.pdf" & ¬
  space & searchFile
do shell script theCommand

tell application "Finder"
  set companies to get name of folders of folder ("/path/" as POSIX file)
end tell

repeat with company in companies
  set company to escapeChars(company, accentedChars)

  set theCommand to "grep -c " & quoted form of company & ¬
    space & quoted form of searchFile

  try
    do shell script theCommand
    set CompanyName to company as string
    return CompanyName
  on error

  end try
end repeat

return false

(**
 * Checks each character of a given word. If any characters of the word
 * match a character in the given list of characters they will be escaped.
 *
 * @param {text} searchWord - The word to check the characters of.
 * @param {list} charactersList - List of characters to be escaped.
 * @returns {text} The new text with the item(s) replaced.
 *)
on escapeChars(searchWord, charactersList)
  repeat with char in charactersList
    set searchWord to findAndReplace(char, ("\\" & char), searchWord)
  end repeat
  return searchWord
end escapeChars

(**
 * Replaces all occurrences of findString with replaceString
 *
 * @param {text} findString - The text string to find.
 * @param {text} replaceString - The replacement text string.
 * @param {text} searchInString - Text string to search.
 * @returns {text} The new text with the item(s) replaced.
 *)
on findAndReplace(findString, replaceString, searchInString)
  set oldTIDs to text item delimiters of AppleScript
  set text item delimiters of AppleScript to findString
  set searchInString to text items of searchInString
  set text item delimiters of AppleScript to replaceString
  set searchInString to "" & searchInString
  set text item delimiters of AppleScript to oldTIDs
  return searchInString
end findAndReplace

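A note on the current count:

Currently, your grep command (with -c) only reports the number of lines on which the word was found, not how many instances of the word were found.

If you want the actual number of instances of the word, use grep's -o option to output each match on its own line, then pipe that to wc with the -l option to count the lines. For example: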

grep -o 'Württemberg' /tmp/output.txt | wc -l

And in your AppleScript:

set theCommand to "grep -o " & quoted form of company & space & ¬
  quoted form of searchFile & "| wc -l"

Tip: if you want to strip the leading spaces from the reported count, pipe it to sed to remove the spaces. For example, in your script:

set theCommand to "grep -o " & quoted form of company & space & ¬
  quoted form of searchFile & "| wc -l | sed -e 's/ //g'"

And the equivalent command line:

grep -o 'Württemberg' /tmp/output.txt | wc -l | sed -e 's/ //g'
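To illustrate the difference between counting matching lines and counting matches, here is a small self-contained sketch. The sample text and the temporary file are made up for the demonstration; they stand in for /tmp/output.txt.

```shell
# Three occurrences of "foo" spread over two matching lines.
tmpfile=$(mktemp)
printf 'foo bar foo\nbaz\nfoo\n' > "$tmpfile"

# -c counts lines that contain at least one match.
line_count=$(grep -c 'foo' "$tmpfile")

# -o prints each match on its own line, so wc -l counts matches.
# tr strips the padding that some wc implementations add.
match_count=$(grep -o 'foo' "$tmpfile" | wc -l | tr -d ' ')

echo "lines: $line_count, matches: $match_count"   # lines: 2, matches: 3
rm -f "$tmpfile"
```

The two numbers only agree when the word never appears more than once on a line.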
