Question

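I have a text file encoded in MacRoman that contains a list of items separated by linefeeds. For some reason a second list could not be saved as MacRoman, so I had to save it as Unicode (UTF-16) to keep German characters such as "ö" and "ä". ListA is filled as expected, but ListB is no longer split: I end up with a single string and don't know how to break it apart. Can someone help me?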

set ListA to (read file myFile1 using delimiter linefeed) as list    
display dialog "" & item 1 of ListA    
--> "Name A" 

set ListB to (read file myFile2 using delimiter linefeed as Unicode text) as list    
display dialog "" & item 1 of ListB    
--> "Name A    
Name B    
Name C    
Name D"

Answer 1

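Many different characters can separate the lines of a text file; it is not always a linefeed. The easiest way to handle them is to ask AppleScript for the "paragraphs" of the text instead of passing a delimiter when reading the file. Paragraphs recognizes the common line endings (linefeed, carriage return, and carriage return plus linefeed) and handles them for you. It doesn't always work, but it is worth a try before digging deeper into the problem. So try reading the file like this: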

set ListB to paragraphs of (read file myFile2 as Unicode text)

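If that doesn't work, you will have to find out which character is actually used. What I do in these cases is open the file in a text editor, select the line-break character with the mouse, and copy it. Then I go back to AppleScript Editor and paste it into the expression below, in place of the letter "a". The result is the character's id.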

id of "a"

Then you can read the file with that character as the delimiter, using the id number returned by the expression above in place of 97:

set ListB to read file myFile2 using delimiter (character id 97) as Unicode text
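
Instead of copying the character by hand, the separator can also be looked up from a script. The following is only a minimal sketch and not part of the original answer: the handler name separatorIDs is hypothetical, and the list of candidate ids (10 = linefeed, 13 = carriage return, 8232 = Unicode line separator, 8233 = Unicode paragraph separator) is an assumption about which separators are likely to appear.

```applescript
-- Hypothetical helper: returns the ids of the line-separator
-- characters that occur in someText (each id listed once).
on separatorIDs(someText)
	-- id of a multi-character text is a list of integers;
	-- "as list" also covers the single-character case.
	set codePoints to (id of someText) as list
	set found to {}
	repeat with cp in codePoints
		set n to contents of cp
		if n is in {10, 13, 8232, 8233} and n is not in found then
			set end of found to n
		end if
	end repeat
	return found
end separatorIDs

separatorIDs("Name A" & (character id 13) & "Name B")
--> {13}
```

For the file in the question, the call would presumably be separatorIDs(read file myFile2 as Unicode text); any id it returns can then be used with character id in the read command above.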

Answer 2

Are you sure the file uses LF line endings? This works for me:

set f to POSIX file "/tmp/1"
set b to open for access f with write permission
set eof b to 0
write "あ" & linefeed & "い" to b as Unicode text -- UTF-16
close access b
read f using delimiter linefeed as Unicode text

Have you tried saving the file as UTF-8? To read it, replace Unicode text with «class utf8».
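
A minimal sketch of the UTF-8 variant, modeled on the example above. The scratch path /tmp/utf8-demo.txt and the sample strings are made up for illustration, and splitting with paragraphs rather than a delimiter is a choice made here, not something this answer prescribes.

```applescript
-- Hypothetical scratch file, written the same way as the example above
set f to POSIX file "/tmp/utf8-demo.txt"
set b to open for access f with write permission
set eof b to 0
write "Köln" & linefeed & "München" to b as «class utf8» -- UTF-8
close access b
-- Read the file back as UTF-8 and split it into lines
set ListB to paragraphs of (read f as «class utf8»)
--> {"Köln", "München"}
```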

Splitting UTF-16 Unicode text by a delimiter in AppleScript?
