Question

I'm fairly confident with bash scripting, but this one seems to be a bit over my head.

What I am trying to do is take a string, e.g.:

page_content=<div class="contact_info_wrap"><img src="http://example.com/UserMedia/gafgallery/icons/email_icon.png" style="border-width: 0px; border-style: solid;" width="40" /><img alt="" src="example.com/UserMedia/gafgallery/icons/loc_icon.png" style="border-width: 0px; border-style: solid;" width="40" />

which I detect using this check:

 pageCheck="example.com"
 if test "${page_content#*$pageCheck}" != "$page_content"

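I am then trying to grab every URL in $page_content, but only those that contain http://example.com, and add each one to an array. Honestly, I don't even know where to start! I would like to end up with something like: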

This[0]='http://example.com/the/first/url/containing/example.com'
This[1]='http://example.com/the/second/url/containing/example.com'
This[2]='etc ... '
This[3]='etc ... '

Is there a simple and efficient way to do this?

Answer 1

Try something like this:

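#!/bin/bash

# Run the query and print its rows on stdout.
# $client_section_id is expected to be set before this is called.
sql_request()
{
  mysql --login-path=myhostalias -Dywpadmin_current_content -e"SELECT page_id, page_content FROM client_content WHERE client_section_id = '$client_section_id'"
}

# Read markup on stdin and print each unique quoted href/src value containing $1.
filter_urls()
{
  grep -E -o "(href|src)=\"[^\"]*$1[^\"]*" | cut -d'"' -f2 | sort -u
}

declare -a array=()
# The first field goes to page_id, the rest of the line to page_content.
while read page_id page_content
do
  while read url
  do
    array+=("$url")
  done < <(filter_urls "example.com" <<<"$page_content")
done < <(sql_request)

printf "%s\n" "${array[@]-}" # Just to show array content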

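I am not a mysql expert; I just copied and pasted your command, assuming it works. I also assume you want a single array containing the URLs from all pages, but the solution is easy to adapt if you are looking for something else.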
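To try the filtering step without a database, here is a minimal sketch that feeds a sample string to the same filter_urls function. The markup is adapted from the question and is hypothetical test data (both image URLs are given the http:// prefix, and the other.org link is added only to show that non-matching URLs are skipped). The array is named This to match the question.

```bash
#!/bin/bash
# Same filter as in the answer: print quoted href/src values containing $1.
filter_urls()
{
  grep -E -o "(href|src)=\"[^\"]*$1[^\"]*" | cut -d'"' -f2 | sort -u
}

# Hypothetical sample markup, adapted from the question.
page_content='<div class="contact_info_wrap"><img src="http://example.com/UserMedia/gafgallery/icons/email_icon.png" width="40" /><img alt="" src="http://example.com/UserMedia/gafgallery/icons/loc_icon.png" width="40" /><a href="http://other.org/page">x</a></div>'

declare -a This=()
while read -r url
do
  This+=("$url")
done < <(filter_urls "example.com" <<<"$page_content")

printf '%s\n' "${This[@]}"
```

This should print the two example.com image URLs, one per line, and leave out the other.org link.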

Also, I assume that read parses your data correctly without changing IFS or using the usual -r option, but you may need to do both.
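If the default read behaviour turns out to be a problem, one possible adjustment is sketched below. It assumes each row arrives as "page_id<TAB>page_content"; the fake_rows function is a hypothetical stand-in for the mysql output, so check the real format before relying on this.

```bash
#!/bin/bash
# Hypothetical stand-in for the query output: "page_id<TAB>page_content" per line.
fake_rows()
{
  printf '%s\t%s\n' 1 '<img src="http://example.com/a b.png" />'
  printf '%s\t%s\n' 2 '<a href="http://example.com/c\d">x</a>'
}

declare -a ids=() contents=()
# IFS is set to a tab for this read only, so fields are split on tabs, not spaces.
# -r keeps backslashes literal instead of treating them as escape characters.
while IFS=$'\t' read -r page_id page_content
do
  ids+=("$page_id")
  contents+=("$page_content")
done < <(fake_rows)

printf '%s => %s\n' "${ids[1]}" "${contents[1]}"
```

Without -r, the backslash in the second row would be dropped by read, which is the kind of silent change the option is meant to prevent.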

A few points of interest:

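Note the use of process substitution, < <(), which lets read consume the output of the inner command, somewhat like a pipe. The big difference is that it keeps the loop body in the main shell context, so variables assigned inside the loop keep their values after the loop exits.
I accept URLs that follow src or href, but I assume they are always quoted. If that assumption does not hold, the regular expression will need to be reworked.
The script uses sort -u to make the URLs unique on a per-page basis, which is a bit lazy (if you need them to be unique, they probably need to be unique across the whole array). Not knowing what you really need, I did not want to add code without being sure it would help rather than hinder.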
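The first and last points can be illustrated with a small sketch. The page_urls function and its URLs are hypothetical, and the associative array requires bash 4 or later. It assumes the default shell settings (lastpipe not enabled).

```bash
#!/bin/bash
# Hypothetical per-page URL lists; page 2 repeats one URL from page 1.
page_urls()
{
  case $1 in
    1) printf '%s\n' "http://example.com/a" "http://example.com/b" ;;
    2) printf '%s\n' "http://example.com/b" "http://example.com/c" ;;
  esac
}

# Pipe: the loop runs in a subshell, so its assignments are lost afterwards.
piped=()
page_urls 1 | while read -r url; do piped+=("$url"); done
echo "after pipe: ${#piped[@]} element(s)"

# Process substitution: the loop runs in the main shell and the array survives.
# The associative array "seen" makes URLs unique across all pages.
declare -A seen=()
declare -a array=()
for page in 1 2
do
  while read -r url
  do
    [[ -n ${seen[$url]+x} ]] && continue   # already collected, skip it
    seen[$url]=1
    array+=("$url")
  done < <(page_urls "$page")
done
echo "after process substitution: ${#array[@]} element(s)"
printf '%s\n' "${array[@]}"
```

Under those assumptions the pipe version reports 0 elements, while the process substitution version reports 3, with the repeated URL stored only once.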

bash - Isolating specific URLs from a string into an array
