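I am storing data retrieved with JSoup and submitting it to my own HTTP API.
Question: how can I iterate over my HashMap with multiple threads without every thread processing the same values of the HashMap, as happens right now?
Currently: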
Thread1: a
Thread2: a
Thread3: a
Thread4: a
Thread1: b
Thread2: b
Thread3: b
Thread4: b
I would like something like this:
Thread1 : a
Thread2 : b
Thread3 : c
Thread4 : d
package ygg.org;

import java.io.IOException;
import java.net.URLEncoder;
import java.util.concurrent.ConcurrentHashMap;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Filmstreaming1 {

    final static int NB_PAGE = 2;
    final static int THREADS = 1;

    // href -> movie title, shared by all worker threads
    static ConcurrentHashMap<String, String> movies_list = new ConcurrentHashMap<>();
    static int count = 0;

    static void Initialize() {
        System.out.println("----------------------------------");
        System.out.println("Homer is starting...");
        System.out.println("------------------------------");

        // Scrape the listing pages and collect every movie link once
        for (int i = 1 ; i <= NB_PAGE ; i++) {
            try {
                Document page = Jsoup.connect("http://xxxxxxx.com/page/" + i + "/")
                        .userAgent("Mozilla")
                        .timeout(3000)
                        .post();
                Elements movies = page.getElementsByClass("margin-b40").get(0).getElementsByClass("short-link").select("a");
                for (Element movie : movies) {
                    String href = movie.attr("href");
                    String movie_title = movie.text().replaceAll("\\(.*\\)", "");
                    boolean isMovieExists = movies_list.containsKey(href);
                    if (isMovieExists == false) {
                        movies_list.put(href, movie_title);
                        System.out.println("Ajout du film " + movie_title);
                    }
                }
                System.out.println("Total récupérés " + movies_list.size() + " page : " + i);
            } catch(IOException ioe) {
                System.out.println("Exception: " + ioe);
            }
        }

        // Start the worker threads; each one iterates over the whole map
        try {
            for (int i = 0; i <= THREADS; i++) {
                Thread api = new ThreadApi();
                api.start();
            }
        } catch(Exception e) {
            System.out.println("Exception: " + e.getMessage());
        }
    }
}

class ThreadApi extends Thread {
    public void run() {
        while(true) {
            Filmstreaming1.movies_list.forEach((key, value) -> {
                try {
                    String code = key.substring(key.indexOf("com/") + 4, key.indexOf("-"));
                    Document page = Jsoup.connect("http://xxxxxxx.com/" + code + "--.html")
                            .userAgent("Mozilla")
                            .timeout(3000)
                            .post();
                    String director = page.getElementsByClass("finfo-text").get(5).text().toString();
                    Document page1 = Jsoup.connect("http://xxxxxxx.com/play.php?newsid=" + code + "&vt=ol&sr=3")
                            .referrer("http://xxxxxxx.com/" + code + "--.html")
                            .userAgent("Mozilla")
                            .timeout(3000)
                            .post();
                    String link = page1.getElementsByTag("iframe").first().attr("src").toString();
                    String encoded_title = URLEncoder.encode((String) value, "UTF-8");
                    String encoded_director = URLEncoder.encode((String) director, "UTF-8");
                    String url = "http://xxxxxxx.com/api/movie?movie=" + encoded_title + "&director=" + encoded_director;
                    // Print the URL
                    System.out.println(url);
                    Document api = Jsoup.connect(url)
                            .userAgent("Mozilla")
                            .timeout(3000)
                            .get();
                    String response = api.text();
                    System.out.println(response);
                    // Compare string contents with equals(); == only compares references
                    if ("-1".equals(response)) {
                        System.out.println("Erreur");
                    } else {
                        url = "http://xxxxxxx.com/api/video?link=" + link + "&ref=" + response + "&version=vf";
                        Document submit = Jsoup.connect(url)
                                .userAgent("Mozilla")
                                .timeout(3000)
                                .get();
                        response = submit.text();
                        Filmstreaming1.movies_list.remove(key);
                        System.out.println(response);
                    }
                } catch(Exception e) {
                    System.out.println("Exception " + e.getMessage());
                }
            });
        }
    }
}
Answer 0 (score: 0)
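Since your Map is already a ConcurrentHashMap, you can use the ConcurrentHashMap.forEach overload that takes a parallelismThreshold. Once the estimated size of the map exceeds that threshold, the calls to your action can be executed in parallel automatically.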
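The documentation says the following about the effect of the threshold parameter:
These bulk operations accept a parallelismThreshold argument. Methods proceed sequentially if the current map size is estimated to be less than the given threshold. Using a value of Long.MAX_VALUE suppresses all parallelism. Using a value of 1 results in maximal parallelism by partitioning into enough subtasks to fully utilize the ForkJoinPool.commonPool() that is used for all parallel computations. Normally, you would initially choose one of these extreme values, and then measure performance of using in-between values that trade off overhead versus throughput.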
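To see what the two extreme threshold values do, here is a minimal sketch that records which threads handled at least one entry. The class name ThresholdSketch, the helper threadsUsed and the dummy href/title entries are made up for illustration; only ConcurrentHashMap.forEach(long, BiConsumer) and ConcurrentHashMap.newKeySet() come from the JDK.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ThresholdSketch {

    // Hypothetical helper: runs one bulk forEach over the map and returns the
    // names of all threads that handled at least one entry.
    static Set<String> threadsUsed(ConcurrentHashMap<String, String> map, long parallelismThreshold) {
        Set<String> threads = ConcurrentHashMap.newKeySet(); // thread-safe set
        map.forEach(parallelismThreshold,
                (key, value) -> threads.add(Thread.currentThread().getName()));
        return threads;
    }

    public static void main(String[] args) {
        // Dummy data standing in for the scraped href -> title entries
        ConcurrentHashMap<String, String> movies = new ConcurrentHashMap<>();
        for (int i = 0; i < 1000; i++) {
            movies.put("href-" + i, "title-" + i);
        }

        // Long.MAX_VALUE suppresses parallelism: only the calling thread is used
        System.out.println("sequential: " + threadsUsed(movies, Long.MAX_VALUE));

        // 1 asks for maximal parallelism: common-pool workers may join in
        System.out.println("parallel:   " + threadsUsed(movies, 1));
    }
}
```

How many worker threads show up in the second line depends on the machine and on the size of the common pool, so it is worth measuring with your real workload.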
So there is no need to create your own Thread or even a Runnable implementation; any method reference or lambda that acts as a BiConsumer<? super K,? super V> will do.
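Applied to your case, it could look roughly like the sketch below: a single bulk forEach call replaces the hand-written threads, and each entry is passed to the action once per call instead of once per thread. MovieSubmitSketch, submit, submitAll, handledBy and calls are hypothetical names, and submit only records which thread got the entry; the real body would be the JSoup and API calls from your run() method.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class MovieSubmitSketch {

    // href -> name of the thread that handled it (for illustration only)
    static final Map<String, String> handledBy = new ConcurrentHashMap<>();
    // Total number of times the action was invoked
    static final AtomicInteger calls = new AtomicInteger();

    // Hypothetical stand-in for the per-movie work (scraping + API calls).
    // Its (String, String) signature lets it serve as a BiConsumer<String, String>.
    static void submit(String href, String title) {
        calls.incrementAndGet();
        handledBy.put(href, Thread.currentThread().getName());
    }

    // One bulk traversal; with threshold 1 the entries may be split
    // across the threads of ForkJoinPool.commonPool().
    static void submitAll(ConcurrentHashMap<String, String> movies) {
        movies.forEach(1, MovieSubmitSketch::submit);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, String> movies = new ConcurrentHashMap<>();
        for (char c = 'a'; c <= 'h'; c++) {
            movies.put("href-" + c, String.valueOf(c));
        }
        submitAll(movies);
        handledBy.forEach((href, thread) -> System.out.println(thread + " : " + href));
    }
}
```

Keep in mind that the common pool is sized for CPU-bound work, so for blocking HTTP calls like yours the speed-up may be smaller than expected.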