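It looks like this:
Thread 1:
A method that reads data from the same file that thread 2 writes to. It should execute, with no other thread running, until it completes: READFILE();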
package parser;
import java.util.ArrayList;
import java.util.List;
import java.io.StringReader;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import utils.ReadMachine;
import utils.TextProcessor;
import utils.WriteMachine;
import edu.stanford.nlp.process.Tokenizer;
import edu.stanford.nlp.process.TokenizerFactory;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
public class Parser implements Runnable {
    private String parserModel;
    private LexicalizedParser lp;
    private ReadMachine rwm;
    private WriteMachine wm;
    private TextProcessor tp;
    private String from;
    private int nbr;
    private static final Logger logger = LoggerFactory.getLogger(Parser.class);

    public Parser(String from, String to, int nbr) {
        rwm = new ReadMachine();
        this.nbr = nbr;
        this.from = from;
        this.parserModel = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";
        this.tp = new TextProcessor();
        this.wm = new WriteMachine(to);
    }

    public String parser(String toBeParsed) {
        Tree parse;
        if (toBeParsed == null) {
            toBeParsed = "This is a sentence.";
        }
        TokenizerFactory<CoreLabel> tokenizerFactory = PTBTokenizer.factory(
                new CoreLabelTokenFactory(), "");
        Tokenizer<CoreLabel> tok = tokenizerFactory
                .getTokenizer(new StringReader(toBeParsed));
        List<CoreLabel> raw = tok.tokenize();
        parse = lp.apply(raw);
        // PennTreebankLanguagePack for English
        TreebankLanguagePack tlp = lp.treebankLanguagePack();
        GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
        GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
        List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
        return tdl.toString();
    }

    private void parseFromTo() {
        ArrayList<String> proc = null;
        proc = rwm.readFile(from);
        if (proc != null && proc.size() > 0) {
            proc = tp.sentenceDivider(proc);
            for (String line : proc) {
                // ... (loop body not shown in the post)
            }
        }
    }

    public void run() {
        while (true) {
            // WANT this method to execute in peace
            try {
                Thread.sleep(3 * 1000);
            } catch (InterruptedException e) {
                logger.info("Sleep interrupted");
            }
        }
    }
}
package crawler;
import java.io.OutputStream;
import java.util.Set;
import java.util.regex.Pattern;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import edu.uci.ics.crawler4j.url.WebURL;
public class Crawler extends WebCrawler {
    private static final Logger logger = LoggerFactory.getLogger(Crawler.class);
    public static String newline = System.getProperty("line.separator");
    private int index;
    private static int indx = 0;
    private final static Pattern FILTERS = Pattern
            + "|png|tiff?|mid|mp2|mp3|mp4"
            + "|wav|avi|mov|mpeg|ram|m4v|pdf"
            + "|rm|smil|wmv|swf|wma|zip|rar|gz|txt))$");
    private String[] patterns = { "[Mm][Ii][Gg]", "[Mm][Aa][Gg]",
            "[Gg][Mm][Aa][Ww]", "[Ww][Ee][Ll][Dd][Ii][Nn][Gg]" };
    private String path = "/Users/aloefqvi/Dropbox/1337_Haxor/LTH/courses/EDAN70/parse_files/from/text/"
            + "from" + Integer.toString(this.index = indx++) + ".txt";
    private Downloader dl = new Downloader(patterns, path);

    protected void setOs(OutputStream os) {
        // ... (body not shown in the post)
    }

    /**
     * You should implement this function to specify whether the given url
     * should be crawled or not (based on your crawling logic).
     */
    public boolean shouldVisit(WebURL url) {
        String href = url.getURL().toLowerCase();
        return !FILTERS.matcher(href).matches();
    }

    public void visit(Page page) {
        if (page.getParseData() instanceof HtmlParseData) {
            HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
            String text = htmlParseData.getText();
            Set<WebURL> links = htmlParseData.getOutgoingUrls();
            String urlData = "";
            for (WebURL item : links) {
                urlData = urlData + newline + item;
            }
            // WANT this thread to wait here until parseToFile finishes
            dl.download(text.trim(), false);
        }
    }
}
Answer 0 (score: 0)
I think file locking would do this for you. However, there are better ways to communicate between threads within a process. See Queues.
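Answer 1 (score: 0)
One way to do this is with a Lock (specifically, usually a ReentrantLock). Essentially, you create one Lock object to govern access to the file. Then, before reading from or writing to the file, call lock(); if the lock (and hence the file) is already in use by another thread, the call waits. Once a thread finishes its read or write, it calls unlock() to release the lock, which a waiting thread will then acquire.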
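A minimal sketch of the lock()/unlock() pattern described here. The LockedFile class, its method names, and the temp file are assumptions made for illustration; they are not part of the question's ReadMachine/WriteMachine code. The key point is that every thread touching the file goes through the same Lock instance.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: one instance is shared by all threads that use the file.
final class LockedFile {
    private final Lock lock = new ReentrantLock();
    private final Path path;

    LockedFile(Path path) {
        this.path = path;
    }

    // Appends one line; waits if another thread currently holds the lock.
    void append(String text) {
        lock.lock();
        try {
            byte[] bytes = (text + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
            Files.write(path, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            lock.unlock(); // always release, even if the write failed
        }
    }

    // Reads the whole file; no writer can run while this holds the lock.
    List<String> readAll() {
        lock.lock();
        try {
            if (!Files.exists(path)) {
                return Collections.emptyList();
            }
            return Files.readAllLines(path, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            lock.unlock();
        }
    }

    // Demo: two writer threads append 100 lines each, then the file is read back.
    static int runDemo() {
        try {
            Path tmp = Files.createTempFile("locked-file-demo", ".txt");
            LockedFile file = new LockedFile(tmp);
            Runnable writer = () -> {
                for (int i = 0; i < 100; i++) {
                    file.append("line " + i);
                }
            };
            Thread a = new Thread(writer);
            Thread b = new Thread(writer);
            a.start();
            b.start();
            a.join();
            b.join();
            int lines = file.readAll().size();
            Files.deleteIfExists(tmp);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo() + " lines written under the lock");
    }
}
```

In the question's setup, the crawler's download call and the parser's read call would each be wrapped the same way, provided both threads receive the same lock object.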
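Answer 2 (score: 0)
Some synchronization would solve your problem, but if you can, it would be better to change your approach and switch to the newer Java concurrency utilities: since Java 1.5, BlockingQueue has been available. For example: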
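The original example is cut off at this point. As a stand-in, here is a minimal sketch of the BlockingQueue idea, assuming the crawler thread hands text to the parser thread through the queue instead of through a shared file. The class name, the sample strings, and the end-of-input marker are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class QueueHandoffDemo {
    // Assumed sentinel that tells the consumer no more text will arrive.
    private static final String END_OF_INPUT = "__END__";

    static List<String> runDemo() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> parsed = new ArrayList<>();

        // Producer: plays the role of the crawler, handing over page text.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) {
                    queue.put("page text " + i);
                }
                queue.put(END_OF_INPUT);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Consumer: plays the role of the parser; take() blocks until text is available.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String text = queue.take();
                    if (text.equals(END_OF_INPUT)) {
                        break;
                    }
                    parsed.add("parsed: " + text);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join(); // join makes the consumer's writes to 'parsed' visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return parsed;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

With this design neither thread needs to sleep and poll, and no explicit lock is required: the queue does the waiting and the hand-off.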