Question

I have a text file named 2.txt that contains the following links:

www.link.php/user=1pass=3
www.link.php/user=1pass=3
www.link.php/user=1pass=3
www.link.php/user=1pass=3
www.link.php/user=1pass=3

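I want to write a curl command that visits each link, line by line, and outputs only the part of the response I need. This is the source returned when visiting one of the links: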

 online - Checked user : test cpu cooling rate: 0.50<html>
<head>
</head>
<body>
    <form action="tasks.php" method="get">
        <input type="text" name="account" placeholder="username:password" style="text-align: center" /> <br />
        <input class="btn btn-success" type="submit" value="Check Account" />
      </form>
</body>

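I want it to fetch the source and strip all of the HTML code, keeping only the text that comes before the <html> tag.

So I would end up with a text file like this: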

online - Checked user : test cpu cooling rate: 0.50
online - Checked user : test cpu cooling rate: 0.520
online - Checked user : test cpu cooling rate: 0.1150
online - Checked user : test cpu cooling rate: 6.50

Can someone help me do this?

Answer 1

This script will do what you want:

#!/bin/sh

output_file='3.txt'

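# Read 2.txt one URL per line; IFS= and -r keep each line intact.
while IFS= read -r line ; do
  # -s silences curl's progress meter. The brace group joins the page into
  # one line and ends it with a newline so each URL yields one output line.
  { curl -s "$line" | tr -d '\n' ; echo ; } |
    sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' >> "$output_file"
done < '2.txt'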

exit 0

Thanks to Blackbit for the regular expression.
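
The tag-stripping step can be tried without any network access. What follows is a minimal sketch, not the answerer's exact code: strip_tags is a hypothetical helper name, and the sample input is a shortened copy of the page source from the question. It joins the page into one line, removes every <...> tag, and trims the surrounding whitespace.

```shell
#!/bin/sh

# strip_tags (hypothetical helper): read HTML on stdin, print its text on one line.
strip_tags() {
  # Join all input lines, then supply the single trailing newline for sed.
  { tr -d '\n'; echo; } |
    sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Offline check against a shortened copy of the page source from the question.
printf '%s\n' ' online - Checked user : test cpu cooling rate: 0.50<html>' \
  '<head>' '</head>' '<body>' '</body>' | strip_tags
```

Note that this approach keeps any visible text that sits between tags, so it only yields the status line alone when the rest of the page is pure markup, as it is in the question.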

Answer 2

Is the text before <html> always on the same line as the tag? If so, you can do this:

#!/bin/bash

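cat url_list | while IFS= read -r url; do
  # -s silences curl's progress meter; keep only the line holding <html>
  # and cut everything from the tag onward.
  curl -s "$url" | grep "<html>" | sed 's/<html>.*//'
done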
done

Replace cat url_list with your preferred solution from the other question.
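
If the text before <html> might span more than one line, the grep approach would miss the earlier lines. A possible alternative, sketched here with awk rather than grep and sed, prints everything up to the first <html> and then stops. The function name before_html is hypothetical and not part of the original answer.

```shell
#!/bin/sh

# before_html (hypothetical helper): print all input that precedes the first <html>.
before_html() {
  awk '
    {
      i = index($0, "<html>")
      if (i > 0) {
        # Tag found: print what precedes it on this line, then stop reading.
        print substr($0, 1, i - 1)
        exit
      }
      # No tag yet: the whole line belongs to the leading text.
      print
    }'
}

# Offline check with the first lines of the page source from the question.
printf '%s\n' ' online - Checked user : test cpu cooling rate: 0.50<html>' \
  '<head>' | before_html
```

Inside the loop it would be used as curl -s "$url" | before_html. The leading space from the page is preserved, just as it is with the grep and sed version.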

Bash curl array from text file
