Replacing text in a text file using AppleScript and sed

Time: 2012-12-08 14:45:32

Tags: sed applescript

This question is a follow-up to "plain text URL to HTML code (Automator/AppleScript)".

Suppose I have a plain text file, /Users/myname/Desktop/URLlist.txt:

title 1
http://a.b/c

title 2
http://d.e/f

...

I want to (1) convert every URL (http://...) to an HTML link, and then (2) put

&nbsp;<br />

on every empty line, so that the content above becomes:

title 1
<a href="http://a.b/c">http://a.b/c</a>
&nbsp;<br />
title 2
<a href="http://d.e/f">http://d.e/f</a>
&nbsp;<br />
...

I came up with the following AppleScript:

set inFile to "/Users/myname/Desktop/URLlist.txt"
set middleFile to "/Users/myname/Desktop/URLlist2.txt"
set outFile to "/Users/myname/Desktop/URLlist3.txt"

do shell script "sed 's/\\(http[^ ]*\\)/<a href=\"\\1\">\\1<\\/a>/g' " & quoted form of inFile & " >" & quoted form of middleFile
do shell script "sed 's/^$/\\&nbsp;<br \\/>/g' " & quoted form of middleFile & " >" & quoted form of outFile

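It works, but it is redundant (and silly?). Can anyone make it more concise? Is it possible to involve only one text file instead of three (i.e., the original content of /Users/myname/Desktop/URLlist.txt is overwritten by the final result)?

Thank you very much.

3 answers:

Answer 0: (score: 2)

Try: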

set inFile to "/Users/myname/Desktop/URLlist.txt"

set myData to (do shell script "sed '
/\\(http[^ ]*\\)/ a\\
&nbsp;<br />
' " & quoted form of inFile & " | sed 's/\\(http[^ ]*\\)/<a href=\"\\1\">\\1<\\/a>/g' ")

do shell script "echo " & quoted form of myData & " > " & quoted form of inFile

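This lets you use the myData variable later in your script. If this is not part of a larger script and you are only modifying the file, use the -i option as jackjr300 suggests. Also note that this script looks for the original URL pattern and appends the new line after each match, rather than simply looking for empty lines.

Edit: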
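
The difference between appending after a match and rewriting empty lines can be seen in a small shell sketch. The function names append_after_urls and replace_empty_lines are made up for illustration, and both simply filter standard input. With the sample text from the question, the first one leaves the original blank line in place below the appended line, while the second turns the blank line itself into the break.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only; both read stdin and write stdout.

# Strategy of the script above: append a line after every line containing a URL.
append_after_urls() {
    sed '/http[^ ]*/ a\
&nbsp;<br />
'
}

# Strategy of the question: rewrite lines that are empty.
replace_empty_lines() {
    sed 's/^$/\&nbsp;<br \/>/'
}

sample='title 1
http://a.b/c

title 2
http://d.e/f'

printf '%s\n' "$sample" | append_after_urls
printf '%s\n' "$sample" | replace_empty_lines
```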

set inFile to "/Users/myname/Desktop/URLlist.txt"
set myData to (do shell script "sed 's/\\(http[^ ]*\\)/<a href=\"\\1\">\\1<\\/a>/g; s/^$/\\&nbsp;<br \\/>/g' " & quoted form of inFile)
do shell script "echo " & quoted form of myData & " > " & quoted form of inFile
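
To try the combined expression outside of AppleScript, here is a minimal shell sketch. The name linkify is hypothetical, and the two -e scripts are the same two substitutions that the edit joins with a semicolon. Note that [^ ]* stops only at a space, so punctuation directly after a URL would end up inside the link.

```shell
#!/bin/sh
# linkify is a hypothetical name; it reads text on stdin and writes the result to stdout.
linkify() {
    # 1st script: wrap every http... token in an <a> element.
    # 2nd script: replace each empty line with a non-breaking space and a line break.
    sed -e 's/\(http[^ ]*\)/<a href="\1">\1<\/a>/g' \
        -e 's/^$/\&nbsp;<br \/>/'
}

printf 'title 1\nhttp://a.b/c\n\ntitle 2\nhttp://d.e/f\n' | linkify
```

Since the function only filters a stream, nothing is overwritten until you redirect the output yourself.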

Answer 1: (score: 2)

Use the -i '' option to edit the file in place.

set inFile to "/Users/myname/Desktop/URLlist.txt"

do shell script "sed -i '' 's:^$:\\&nbsp;<br />:; s:\\(http[^ ]*\\):<a href=\"\\1\">\\1</a>:g' " & quoted form of inFile

If you want to keep a copy of the original file, specify a backup extension, e.g. sed -i ' copy'

Update:
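
The -i '' spelling is the one understood by the BSD sed that ships with macOS; GNU sed expects the suffix attached to the option instead (for example -i.bak). If the same command has to run with either, one possible workaround is to go through a temporary file. The helper name edit_in_place and its three arguments are assumptions made for this sketch, not part of sed.

```shell
#!/bin/sh
# edit_in_place is a hypothetical helper:
#   $1 = sed script, $2 = file to edit, $3 = suffix for the backup copy
edit_in_place() {
    script=$1
    file=$2
    suffix=$3
    tmp=$(mktemp "${TMPDIR:-/tmp}/sededit.XXXXXX") || return 1
    if sed "$script" "$file" > "$tmp"; then
        # Keep a copy of the original, then overwrite the file's contents.
        cp -p "$file" "$file$suffix" && cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}

# Example call (commented out so the sketch has no side effects):
# edit_in_place 's/^$/\&nbsp;<br \/>/' "$HOME/Desktop/URLlist.txt" ' copy'
```

With the suffix ' copy', the backup would be named "URLlist.txt copy", which matches what sed -i ' copy' produces on macOS.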

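DOCTYPE is a required preamble. It is required for legacy reasons: when it is omitted, browsers tend to use a different rendering mode that is incompatible with some specifications. Including the DOCTYPE in a document ensures that the browser makes a best-effort attempt at following the relevant specifications.

The HTML lang attribute can be used to declare the language of a web page or of a portion of a web page. This is meant to assist search engines and browsers. According to the W3C recommendation, you should declare the primary language of each web page with the lang attribute inside the <html> tag.

The <meta> tag provides metadata about the HTML document. <meta> tags always go inside the <head> element. The http-equiv attribute provides an HTTP header for the information/value of the content attribute. content: the value associated with the http-equiv or name attribute. charset: to display an HTML page correctly, the browser must know which character set to use.

In this script I use "utf-8" as the encoding; change it to match the encoding of your original file.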

set inFile to "/Users/myname/Desktop/URLlist.html" -- text file with a ".html" extension
set nL to linefeed
set prepandHTML to "<!DOCTYPE html>\\" & nL & "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en-US\" lang=\"en-US\">\\" & nL & tab & "<head><meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\" />\\" & nL & "</head>\\" & nL

do shell script "sed -i '' 's:^$:\\&nbsp;<br />:; s:\\(http[^ ]*\\):<a href=\"\\1\">\\1</a>:g; 1s~^~" & prepandHTML & "~' " & quoted form of inFile
do shell script "echo '</html>' >> " & quoted form of inFile -- append the last HTML tag to the file
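
For comparison, the same wrapping step can be sketched as a plain shell filter. The function name wrap_html is an assumption for illustration; the header lines are copied from the prepandHTML string above, and the encoding is still assumed to be UTF-8.

```shell
#!/bin/sh
# wrap_html is a hypothetical helper: it copies stdin to stdout between the
# header lines built in the AppleScript above and a closing </html> tag.
wrap_html() {
    printf '%s\n' \
        '<!DOCTYPE html>' \
        '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">'
    printf '\t<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />\n'
    printf '%s\n' '</head>'
    cat                      # the already converted body, unchanged
    printf '%s\n' '</html>'
}

# Example call (commented out so the sketch has no side effects):
# wrap_html < URLlist.txt > URLlist.html
```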

Answer 2: (score: 1)

I can't make sense of sed commands (they make my brain hurt), so here is an AppleScript approach to this task. Hope it helps.

set f to (path to desktop as text) & "URLlist.txt"

set emptyLine to "&nbsp;<br />"
set htmlLine1 to "<a href=\""
set htmlLine2 to "\">"
set htmlLine3 to "</a>"

-- read the file into a list
set fileList to paragraphs of (read file f)

-- modify the file as required into a new list
set newList to {}
repeat with i from 1 to count of fileList
    set thisItem to item i of fileList
    if thisItem is "" then
        set end of newList to emptyLine
    else if thisItem starts with "http" then
        set end of newList to htmlLine1 & thisItem & htmlLine2 & thisItem & htmlLine3
    else
        set end of newList to thisItem
    end if
end repeat

-- make the new list into a string
set text item delimiters to return
set newFile to newList as text
set text item delimiters to ""

-- write the new string back to the file overwriting its contents
set openFile to open for access file f with write permission
write newFile to openFile starting at 0 as text
close access openFile

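EDIT: If you have problems with encoding, these two handlers will handle reading and writing properly. Just insert them in your code and adjust the read and write lines to use the handlers. Good luck.

NOTE: When opening the file with TextEdit, use the File menu's Open command and explicitly open the file as UTF-8.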

on writeTo_UTF8(targetFile, theText, appendText)
    try
        set targetFile to targetFile as text
        set openFile to open for access file targetFile with write permission
        if appendText is false then
            set eof of openFile to 0
            write «data rdatEFBBBF» to openFile starting at eof -- UTF-8 BOM
        else
            tell application "Finder" to set fileExists to exists file targetFile
            if fileExists is false then
                set eof of openFile to 0
                write «data rdatEFBBBF» to openFile starting at eof -- UTF-8 BOM
            end if
        end if
        write theText as «class utf8» to openFile starting at eof
        close access openFile
        return true
    on error theError
        try
            close access file targetFile
        end try
        return theError
    end try
end writeTo_UTF8

on readFrom_UTF8(targetFile)
    try
        set targetFile to targetFile as text
        targetFile as alias -- if file doesn't exist then you get an error
        set openFile to open for access file targetFile
        set theText to read openFile as «class utf8»
        close access openFile
        return theText
    on error
        try
            close access file targetFile
        end try
        return false
    end try
end readFrom_UTF8
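
The writeTo_UTF8 handler starts a fresh file with the three-byte UTF-8 byte order mark (EF BB BF). To check from the shell whether a given file begins with that mark, something like the following sketch could be used. The function name has_utf8_bom is hypothetical, and head -c is assumed to be available (it is on macOS and on GNU systems).

```shell
#!/bin/sh
# has_utf8_bom is a hypothetical helper: succeeds if the file named by $1
# starts with the bytes EF BB BF.
has_utf8_bom() {
    # Dump the first three bytes as hex and strip all whitespace from od's output.
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d '[:space:]')" = "efbbbf" ]
}

# Example call (commented out so the sketch has no side effects):
# has_utf8_bom "$HOME/Desktop/URLlist.txt" && echo "file starts with a UTF-8 BOM"
```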