Question

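So I built an AppleScript that enters information into a website. What I'm trying to figure out now is how to extract the "redirected URL" from the page content and store it in a string in a Python shell script [Automator, OS X].

Basically, if I know the URL, I know how to scan the HTML in Python to find text in the body. In this case I don't know the URL in advance, but the URL appears on the web page.

I've thought of two different approaches:

1) Is there a way to extract text from a browser document opened by AppleScript? If it were Python, I would use a regular expression to search for what I need, but I don't know how to do that in AppleScript.

If not, then

2) Is there a way to get the URL of the open browser document from Python? If so, I could use urllib to fetch the information I need.

I want to extract the URL that follows this text:

"Once the calculation has finished, you can access the results here:"

***Note that the URL in the browser is the same as this URL, but only after the data has been processed. Each analysis takes a different amount of time, which is why I don't want to grab the URL directly from the toolbar area. This link, however, appears immediately.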

The address of the web page is:

http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi?P

Updated part of the question

3) If using Safari.app, is there a way to click the "Proceed" submit button with AppleScript?

Answer 1

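Use Safari.

If the link always sits at the same index once it has been generated, for example index 4 in document.links (the list is zero-based), you can try: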

tell application "Safari"
    set thelink to do JavaScript "document.links[4].href " in document 1
end tell

This will return the link URL.

---------- UPDATE

A second way is to return the link that contains "RNAfold/":

tell application "Safari" to set thelinkCount to do JavaScript "document.links.length " in document 1
set theUrl to ""
repeat with i from 0 to (thelinkCount - 1) -- document.links is zero-based
    tell application "Safari" to set this_link to (do JavaScript "document.links[" & i & "].href" in document 1) as string
    if this_link contains "RNAfold/" then
        set theUrl to this_link
        exit repeat
    end if
end repeat

log theUrl
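
For comparison, the same filter can be sketched in Python, which the question names as the preferred tool for text matching. This is only a sketch under assumptions: the page HTML is already available as a string, the sample markup below is made up for illustration, and `LinkCollector` and `first_link_containing` are hypothetical names, not part of any library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def first_link_containing(html, needle):
    """Return the first href that contains `needle`, or "" if none does."""
    parser = LinkCollector()
    parser.feed(html)
    for href in parser.hrefs:
        if needle in href:
            return href
    return ""


# Made-up sample markup, only to illustrate the call.
sample = (
    '<a href="http://example.org/help">help</a>'
    '<b><a href="http://example.org/RNAfold/abc123/results.html">results</a></b>'
)
print(first_link_containing(sample, "RNAfold/"))
```

Returning an empty string when nothing matches mirrors the AppleScript above, where theUrl stays "" if no link contains "RNAfold/".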

Update 2

This goes straight to the innerHTML of the link without iterating and returns the URL string:

tell application "Safari"
    tell document 1 to set theUrl to (do JavaScript "document.getElementsByTagName('BODY')[0].getElementsByTagName('b')[0].getElementsByTagName('a').item(0).innerHTML; ")
end tell

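Update 3

Added after the new question was asked.

To click the "Proceed" submit button, you get its class name and use a little more JavaScript to click it: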

do JavaScript "document.getElementsByClassName('proceed')[0].click()" in document 1

Full example

set theUrl to ""

tell application "Safari"

    tell document 1

        do JavaScript "document.getElementsByClassName('proceed')[0].click()"
        delay 1
        set timeoutCounter to 0
        repeat until (do JavaScript "document.readyState") is "complete"
            set timeoutCounter to timeoutCounter + 1

            delay 0.5
            if timeoutCounter is greater than 50 then
                exit repeat
            end if
        end repeat
        set theUrl to (do JavaScript "document.getElementsByTagName('BODY')[0].getElementsByTagName('b')[0].getElementsByTagName('a').item(0).innerHTML; ")

    end tell
end tell
log theUrl
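
The wait loop in the full example is a poll-with-timeout pattern: check a condition, sleep, and give up after a fixed budget. If the surrounding workflow is driven from Python, the same idea might look like the sketch below. `wait_until` is a hypothetical helper name, and the default of 25 seconds only approximates the 50 half-second retries above; the predicate here is a stand-in for a real "is the page ready" check.

```python
import time


def wait_until(predicate, timeout=25.0, interval=0.5):
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a real check such as "document.readyState is complete".
states = iter(["loading", "interactive", "complete"])
print(wait_until(lambda: next(states) == "complete", timeout=2.0, interval=0.01))
```

Unlike the AppleScript loop, which simply falls through when the counter runs out, this version returns False on timeout so the caller can tell a loaded page from an abandoned wait.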

Answer 2

There's no error checking here, but you could try something like this with Safari:

tell application "Safari" to set s to source of document 1

set o1 to offset of "results here: <a href" in s
set o2 to offset of "</a></b><br><br>" in s

text (o1 + 23) thru (o2 - 1) of s
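
Since the question wants the result in a Python shell script inside Automator, the same extraction could be sketched in Python with a little error checking. The assumptions are labeled in the code: the marker text "results here:" is taken from the script above, the href is assumed to be double-quoted, and `safari_source` and `extract_results_url` are hypothetical helper names. `safari_source` shells out to `osascript`, so it only works on macOS with Safari open; the sample markup is made up.

```python
import re
import subprocess


def safari_source():
    """Return the HTML source of Safari's front document (macOS only)."""
    script = 'tell application "Safari" to return source of document 1'
    result = subprocess.run(
        ["osascript", "-e", script],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def extract_results_url(html):
    """Return the href that follows the "results here:" marker, or None.

    Assumes the attribute value is double-quoted.
    """
    match = re.search(r'results here:\s*<a\s+href="([^"]+)"', html)
    return match.group(1) if match else None


# Made-up sample markup, only to illustrate the call.
sample = (
    '<b>You can access the results here: '
    '<a href="http://example.org/RNAfold/abc123/results.html">link</a></b><br><br>'
)
print(extract_results_url(sample))
```

Against the live page, the call would be `extract_results_url(safari_source())`; a return value of None signals that the marker was not found instead of raising an error on a bad offset.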

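I looked at the URL, went to the site, submitted a sample RNA sequence, ran the CGI, got to the page, and ran this script, which extracted the URL. But (as I'm sure you know) the page automatically redirects to another page within a few seconds.

[edit:] Alternatively, get the refresh meta tag from the top of the page: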

tell application "Safari" to set s to source of document 1

set topRefreshMetaTagPar to paragraph 6 of s

text 45 thru -3 of topRefreshMetaTagPar
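
The fixed paragraph number and character offsets above depend on the exact layout of the page source. A less position-dependent way to read the same tag, sketched in Python, is to look for the first meta tag with http-equiv="refresh" and take the part of its content attribute after "url=". `MetaRefreshFinder` and `meta_refresh_url` are hypothetical names, and the sample markup is made up; the real page's tag may be written differently.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Records the target URL of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.url = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.url is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "refresh":
            return
        content = attributes.get("content") or ""
        # Typical form: "5; url=http://host/path"
        _, _, target = content.partition(";")
        target = target.strip()
        if target.lower().startswith("url="):
            self.url = target[4:].strip().strip("'\"")


def meta_refresh_url(html):
    """Return the refresh target URL, or None if the page has none."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.url


# Made-up sample markup, only to illustrate the call.
sample = (
    '<html><head><meta http-equiv="refresh" '
    'content="10; URL=http://example.org/RNAfold/abc123/results.html">'
    "</head><body></body></html>"
)
print(meta_refresh_url(sample))
```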

Extracting text from a web page using Python or AppleScript