AppleScript or shell script: get the "src" attribute by class name and download the image

Date: 2018-05-04 13:10:38

Tags: javascript shell web-scraping applescript

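Struggling with:

  1. Looping over the variable AllUrls, which holds the URLs as strings
  2. Using the class "image_stack__image js-default-img"
  3. Getting all src attributes from the images
  4. Downloading all images to a folder, using the URL of the source page as the image name

This is all I have so far, but I can't find a working solution (other than an Action in Automator).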

        tell application "Finder"
            set myPath to container of (path to me) as text -- SET MAIN PATH
        end tell
    
    
        set AllUrls to {"https://teespring.com/shop/CLASSIC-DODGE-CHARGER-MOP?aid=marketplace&tsmac=marketplace&tsmic=search#pid=212&cid=5819&sid=front", "https://teespring.com/shop/greaser-mechanics-t-shirt?aid=marketplace&tsmac=marketplace&tsmic=campaign#pid=2&cid=2397&sid=front"}
    
        --set ImageSrc to (script to get the src attribute from the class "image_stack__image js-default-img"
    
        --set IMGname to the Page URL where the image is
    
        set dFolder to myPath & "thumbnails"
    
        set fName to IMGname & ".jpg" as string
    
    
        do shell script ("mkdir -p " & dFolder & "; curl -A/--user-agent " & AllUrls & " >> " & (dFolder & fName))
    

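Any help is much appreciated. Thanks.

Update:

  1. Managed to get the src/URL I need from the image with that class.
  2. Managed to download it to the desired folder.
  3. Still need to name the saved image after the source URL it came from.
  4. Need to do all of this in a loop, since I will have several URLs and not just the one in the example.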

      set home_path to (((path to me as text) & "::") as alias) as string
      
      tell application "Safari"
          open location "https://teespring.com/shop/CLASSIC-DODGE-CHARGER-MOP?aid=marketplace&tsmac=marketplace&tsmic=search#pid=212&cid=5819&sid=front"
          set campaign_thumbnail to do JavaScript "document.querySelector('.image_stack__image').src" in document 1
      end tell
      
      
      do shell script "curl -f " & quoted form of campaign_thumbnail & " -o " & quoted form of (POSIX path of home_path) & "thumbnails/test.jpg"
      

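Update 2:

Following up on CJK's code:

  1. "cd ~/thumbnails;" does save to the desktop folder "thumbnails", but I need a path relative to the script's folder, in case the user moves the folder. I haven't found a curl solution, but a tell application "Finder" block does work (line 1).
  2. The downloaded files all have the same ending after the last / (560.jpg). I tried set My_Name to do shell script "uuidgen" and adding that to sh, but I would rather name the files 1.jpg, 2.jpg and so on.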

        tell application "Finder" -- get filepath to file container/folder
            set myPath to container of (path to me) as text -- SET MAIN PATH
        end tell
        
        set allURLs to {"https://teespring.com/shop/CLASSIC-DODGE-CHARGER-MOP?aid=marketplace&tsmac=marketplace&tsmic=search#pid=212&cid=5819&sid=front", "https://teespring.com/shop/dodge-mopar-m?aid=marketplace&tsmac=marketplace&tsmic=search#pid=2&cid=2397&sid=front"}
        
        set JS to "document.querySelector('.image_stack__image').src"
        set sh to {"cd ~/desktop/thumbnails;", "curl --remote-name-all ", {}} -- need to set the location to the home folder of the script and the filename to 1.jpg , 2.jpg ..
        
        set the text item delimiters to space
        
        tell application "Safari" to repeat with www in allURLs
            set D to (make new document with properties {URL:www})

            # Wait until webpage has loaded
            tell D to repeat until not (exists)
                delay 0.5
            end repeat

            set the last item of sh to do JavaScript JS in the front document

            close the front document

            do shell script (sh as text)
        end repeat

1 answer:

Answer 0 (score: 1)

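To get all image URLs from elements of class image_stack__image (assuming the elements of this class are <img> elements, and to cater for multiple images sharing this class name), this one line of JavaScript returns an array of src attribute values:

    Array.from(document.querySelectorAll('.image_stack__image'), e=>e.src)

When you run this with the do JavaScript command in Safari, AppleScript automatically converts the returned array to a list.

To cURL all the URLs into a directory "thumbnails" in your home folder, saving each image under the same name as the remote file, cd into the directory, then use curl with the --remote-name-all option:

    cd ~/thumbnails; curl --remote-name-all %url1% %url2% ...

Warning: images with unusual URLs may fail to download, such as those generated dynamically through a CGI request, or those whose src attribute contains base64-encoded data. In fact, the presence of these in the curl request may break the whole request.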
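One way to guard against that last case is to filter the URL list before it reaches curl. This is a minimal sketch and not part of the steps above: filter_http_urls is a hypothetical helper name, and it assumes that anything not starting with http:// or https:// (a data: URI, for instance) should simply be skipped.

```shell
#!/bin/sh
# Hypothetical helper: print only the arguments that look like plain
# http(s) URLs, one per line. Everything else is reported on stderr
# and left out of the output.
filter_http_urls() {
    for url in "$@"; do
        case "$url" in
            http://*|https://*)
                printf '%s\n' "$url"
                ;;
            *)
                # Show only the first 40 characters, since data: URIs can be huge.
                printf 'skipping: %.40s\n' "$url" >&2
                ;;
        esac
    done
}
```

The URLs that survive can then be passed to curl --remote-name-all as before. Dynamically generated URLs that do start with http(s) are not caught by this check.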

To concatenate the list of URLs returned by the JavaScript so that you can send it straight to cURL, simply coerce the AppleScript list to text using space as the delimiter:

    set JS to "Array.from(document.querySelectorAll('.image_stack__image'), e=>e.src);"
    set sh to {"cd ~/thumbnails;", "curl --remote-name-all"}

    set the text item delimiters to space

    tell application "Safari" to tell ¬
        the front document to set ¬
        the end of sh to ¬
        do JavaScript JS

    do shell script (sh as text)

Then wrap the appropriate lines of code in a repeat loop to run exactly the same process for each web page URL:

    set allURLs to {%your list of URLs%}
    set JS to "Array.from(document.querySelectorAll('.image_stack__image'),e=>e.src);"
    set sh to {"cd ~/thumbnails;", "curl --remote-name-all", {}}

    set the text item delimiters to space

    tell application "Safari" to repeat with www in allURLs
        set D to (make new document with properties {URL:www})

        # Wait until webpage has loaded
        tell D to repeat until not (exists)
            delay 0.5
        end repeat

        set the last item of sh to do JavaScript JS in the front document

        close the front document

        do shell script (sh as text)
    end repeat

These are the bare bones. You will still need to deal with error handling, for cases such as a malformed URL or a web page that fails to load, but you now have all the tools to carry out the steps you asked for.

Also, I suggest you read the man page for curl (type man curl in Terminal), read about the --remote-name-all option, and discover the many other options you might find useful.
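One of those other options is -o, which sets the output file name explicitly and so offers a way to get the 1.jpg, 2.jpg naming asked about in Update 2. The following is only a sketch under assumptions: download_numbered and the DRY_RUN switch are hypothetical names, every image is assumed to be a .jpg, and the target directory is passed in as the first argument. The curl options used (-f, --create-dirs, -o) are standard ones.

```shell
#!/bin/sh
# Hypothetical helper: download each URL to DIR/1.jpg, DIR/2.jpg, ...
# usage: download_numbered DIR URL...
# With DRY_RUN set to a non-empty value, the curl commands are printed
# instead of being run.
download_numbered() {
    dir=$1
    shift
    n=1
    for url in "$@"; do
        # -f: fail on HTTP errors; --create-dirs: create DIR if it is missing;
        # -o: write to the given file instead of using the remote name.
        ${DRY_RUN:+echo} curl -f --create-dirs -o "$dir/$n.jpg" "$url"
        n=$((n + 1))
    done
}
```

From AppleScript, the directory argument could be the script's own folder (for example the quoted form of the POSIX path of myPath), which would also keep the download location relative to the script.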

I'll do my best to help with any minor bumps in the road you hit, or with any queries about what I have written.