I have a list of URLs in a text file, as shown below:
File URL.txt
https://url2.html
https://url3.html
...
https://urln.html
I want to write the content of each of these URLs to a text file, one line per URL, as shown below:
Expected file Content.txt:
Content of web from url2.html
Content of web from url3.html
...
Content of web from urln.html
Please help me find a solution. I can use either Python or Java for this.
Thank you for your consideration!
Answer 0 (score: 1)
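Your question is not entirely clear, but for now I will assume that you want to read a line from a text file located at a given URL. If that is not what you wanted to know, tell me and I will try to help further. In any case, here is a simple way to do it in plain Java using java.io.InputStreamReader and java.net.URL#openStream():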
/**
 * Reads a text file from url and returns the first line as string.
 * @param url web location of the text file to read
 * @return {@code null} if an error occurred
 */
static String downloadStringLine(java.net.URL url) {
    // try-with-resources closes the reader (and the underlying stream) automatically
    try (java.io.BufferedReader reader = new java.io.BufferedReader(
            new java.io.InputStreamReader(url.openStream()))) {
        return reader.readLine();
    }
    catch (java.io.IOException e) {
        System.out.printf("Unable to download string from %s%n", url);
        return null;
    }
}
Edit: Since you want a way to read all of the text content from a URL, this version iterates over the lines of a BufferedReader and saves them to a local text file using a PrintWriter:
public class Main {
    /**
     * Reads text-based content from the given url and writes it to a file.
     * @param url web location of the content to store
     * @param file local file that receives the content
     */
    private static void storeURLContent(java.net.URL url, java.io.File file) {
        // try-with-resources closes the reader and the writer, even if an error occurs
        try (java.io.BufferedReader reader = new java.io.BufferedReader(
                     new java.io.InputStreamReader(url.openStream()));
             java.io.PrintWriter writer = new java.io.PrintWriter(file)) {
            System.out.println("Reading contents of " + url);
            // Copy the content line by line
            String line;
            while ((line = reader.readLine()) != null) {
                writer.println(line);
            }
            System.out.println("Done, contents have been saved to " + file.getPath());
        }
        catch (java.io.IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        try {
            java.net.URL url = new java.net.URL("https://www.w3.org/TR/PNG/iso_8859-1.txt");
            java.io.File file = new java.io.File("contents.txt");
            storeURLContent(url, file);
        }
        catch (java.net.MalformedURLException e) {
            e.printStackTrace();
        }
    }
}
Answer 1 (score: 1)
You can try the following Python script.
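import requests

filepath = 'url.txt'
cnt = 0
# Open the output file and the URL list; both are closed automatically
with open("content.txt", "w", encoding="utf-8") as f, open(filepath) as fp:
    for line in fp:
        file_url = line.strip()  # drop the trailing newline
        if not file_url:
            continue  # skip blank lines
        cnt = cnt + 1
        f.write("Content of web from url%d.html\n" % cnt)
        r = requests.get(file_url)
        f.write(r.text + "\n")  # r.text is the decoded body; r.content is bytes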
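The expected Content.txt in the question holds one line per URL, whereas a raw page body usually spans many lines. Below is a minimal sketch of one way to get there using only the standard library. It assumes that collapsing whitespace is acceptable and that a failed download should be recorded as an empty line rather than abort the run. The names `collapse_to_line`, `fetch_text`, `write_contents`, and the `fetch` parameter are hypothetical and introduced here for illustration only.

```python
import urllib.request


def collapse_to_line(text):
    """Collapse every run of whitespace (including newlines) into one space."""
    return " ".join(text.split())


def fetch_text(url):
    """Download a page and return its body decoded as text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        # Assumption: fall back to UTF-8 when the server declares no charset
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def write_contents(urls, out_path, fetch=fetch_text):
    """Write exactly one line per URL to out_path, in the same order."""
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            try:
                body = fetch(url)
            except (OSError, ValueError) as exc:
                # Assumption: keep going and leave an empty line for this URL
                print(f"Could not fetch {url}: {exc}")
                body = ""
            out.write(collapse_to_line(body) + "\n")
```

Passing the download function as a parameter keeps the line-per-URL logic separate from the network call, so it can be tried with a stub before pointing it at real URLs.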
Answer 2 (score: 1)
Thank you all for your help. I got an answer from a friend that does exactly what I wanted.
I appreciate your support. Best regards.
import requests
import bs4

def get_content(link):
    # Download the page and join the text of all <p> elements
    page = requests.get(link)
    soup = bs4.BeautifulSoup(page.content, 'html.parser')
    all_p = soup.find_all('p')
    content = ''
    for p in all_p:
        content += p.get_text().strip('\n')
    return content

in_path = "link.txt"
out_path = "outputData.txt"
with open(in_path, 'r') as fin:
    links = fin.read().splitlines()
with open(out_path, 'w', encoding='utf-8') as fout:
    for link in links:
        # One line of output per URL
        fout.write(get_content(link) + '\n')
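To see what the paragraph extraction does without installing bs4 or touching the network, here is a rough standard-library variant built on `html.parser`. It is a swapped-in technique, not the BeautifulSoup code above: it assumes every `<p>` is closed explicitly, and unlike `get_content` it inserts a space between paragraphs and collapses internal whitespace. `ParagraphText` and `paragraphs_to_line` are hypothetical names.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text that appears inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.in_p = False   # True while a <p> element is open
        self.chunks = []    # text pieces collected so far

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False
            self.chunks.append(" ")  # separate consecutive paragraphs

    def handle_data(self, data):
        if self.in_p:
            self.chunks.append(data)


def paragraphs_to_line(html):
    """Return the text of all <p> elements as a single line."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    # Collapse newlines and repeated spaces so the result fits on one line
    return " ".join("".join(parser.chunks).split())
```

For example, `paragraphs_to_line('<h1>T</h1><p>a</p><p>b</p>')` returns `'a b'`: the heading is ignored and the two paragraphs end up on one line.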