Question

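I'm very new to coding or doing anything like this. I have a list of several thousand URLs in Excel. Each URL is associated with one of about 300 numbers. I have it set up so that one column holds the URL and the next column holds the number associated with that URL. For example, I have five URLs associated with the number 1, four URLs associated with the number 2, and so on. I'm trying to download the files found at the URLs while keeping the organization given by the associated numbers. So I want to put all the files from the URLs associated with 1 into one folder, all the files from the URLs associated with 2 into a separate folder, and so on.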

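I believe a bash script with wget is the way to do this, but I'm struggling to work out the right series of commands. I'd appreciate any help people can give me.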

I don't expect anyone to do this for me, but I'd be grateful for any helpful tips, useful resources, or guides people can point me to. Thanks!

I believe saving my Excel sheet as a CSV is part of the right way forward, but beyond that I don't really know what I'm doing.

Answer 1

Normally, people should post what they have tried so far. However, since you're brand new here, let's see if we can at least get you started.

#!/bin/bash

# Example input file urls.csv
# http://foo.com,2
# http://bar.com,7
# Reference for the "wget" command I used - https://www.guyrutenberg.com/2014/05/02/make-offline-mirror-of-a-site-using-wget/

#
# Split the file on the comma and loop through the url / ID pairs
#
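# tr strips the carriage returns that Excel's CSV export puts at the end of each line
tr -d '\r' < urls.csv | while IFS=, read -r url id || [ -n "$url" ]
do
   # Skip blank or incomplete lines
   [ -n "$url" ] && [ -n "$id" ] || continue
   echo "Getting url $url ID $id"
   #
   # Make the directory if it doesn't exist
   #
   mkdir -p "$id"
   #
   # Execute the wget inside the directory. The subshell keeps the cd local,
   # so there is no need to change back up to the parent afterwards
   #
   (
      cd "$id" || exit
      wget --mirror --convert-links --adjust-extension --page-requisites --no-parent "$url"
   )
done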
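
If each URL points straight at a single file rather than a page you want mirrored, a simpler variant may be enough. The following is only a sketch under that assumption: it drops `--mirror` and uses wget's `--directory-prefix` option to save each file into the folder named after its ID. The function name `download_by_id` and the `DRY_RUN` switch are made up for this example (they are not part of wget); setting `DRY_RUN` just prints the commands so you can check the folder layout before downloading anything.

```shell
#!/bin/bash
# Sketch: download each URL from a "url,id" CSV into a folder named after its ID.
# download_by_id and DRY_RUN are hypothetical names used only in this example.

download_by_id() {
   # $1 is the CSV file; tr removes the carriage returns Excel adds to each line
   tr -d '\r' < "$1" | while IFS=, read -r url id || [ -n "$url" ]
   do
      # Skip blank or incomplete lines
      [ -n "$url" ] && [ -n "$id" ] || continue
      # Create the folder for this ID if it is not there yet
      mkdir -p "$id"
      if [ -n "$DRY_RUN" ]; then
         # Dry run: only show what would be executed
         echo "wget --directory-prefix=$id $url"
      else
         # Save the downloaded file into the folder for this ID
         wget --directory-prefix="$id" "$url"
      fi
   done
}

# Run only when a CSV file name is given on the command line
if [ "$#" -gt 0 ]; then
   download_by_id "$1"
fi
```

Saved as, say, `fetch.sh` (also a made-up name), you could try `DRY_RUN=1 bash fetch.sh urls.csv` first and then run it again without `DRY_RUN` once the printed commands look right.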

How can I use wget to download a list of URLs and sort them by a second data field?
