I have a text file (a movie review database) in which every line looks like this:
product/productId: B00004CK40 review/userId: A39IIHQF18YGZA review/profileName: C. A. M. Salas review/helpfulness: 0/0 review/score: 4.0 review/time: 1175817600 review/summary: Reliable comedy review/text: Nice script, well acted comedy, and a young Nicolette Sheridan. Cusak is in top form.
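I want to parse this file to retrieve the individual fields of each review. This information will later be encapsulated in the MovieReview and Movie classes.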
public class MovieReview {
    private Movie movie;
    private String userId;
    private String profileName;
    private String helpfulness;
    private Date timestamp;
    private String summary;
    private String review;
    ...
Can anyone suggest an appropriate and efficient way to parse this file? It is a large dataset.
Thanks.
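Answer 0 (score: 3)
If it is a large dataset, you will want to avoid loading the entire list into memory at once. I would probably solve this with a handler that is called for each line: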
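import java.io.BufferedReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MovieReviewParser {
    // Compiled once and reused for every line; add the other fields in the same way.
    private static final Pattern LINE = Pattern.compile(
            "product/productId:(.*)review/userId:(.*)review/profileName:(.*)");

    public void parse(BufferedReader reader, MovieReviewHandler handler) throws IOException {
        String line;
        // Read one line at a time so the whole file is never held in memory.
        while ((line = reader.readLine()) != null) {
            Matcher matcher = LINE.matcher(line);
            if (!matcher.matches()) {
                throw new IllegalArgumentException("Unparseable line: " + line);
            }
            MovieReview review = new MovieReview();
            // Assumes MovieReview exposes these fields; use setters if they are private.
            review.productId = matcher.group(1).trim();
            review.userId = matcher.group(2).trim();
            review.profileName = matcher.group(3).trim();
            // etc
            handler.handle(review);
        }
    }
}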
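The "add other fields" step could be filled in along the following lines. This is only a sketch, not the answer's own code: the class name `ReviewLineFields`, the named groups, and the map return type are illustrative assumptions, and it assumes that every line carries all eight labels in the order shown in the question's sample.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class ReviewLineFields {
    // Group names, in the order the labels appear on each line.
    static final String[] NAMES = {
        "productId", "userId", "profileName", "helpfulness", "score", "time", "summary", "text"
    };

    // Reluctant groups stop at the next label; the final group takes the rest of the line.
    static final Pattern LINE = Pattern.compile(
          "product/productId:\\s*(?<productId>.*?)\\s+"
        + "review/userId:\\s*(?<userId>.*?)\\s+"
        + "review/profileName:\\s*(?<profileName>.*?)\\s+"
        + "review/helpfulness:\\s*(?<helpfulness>.*?)\\s+"
        + "review/score:\\s*(?<score>.*?)\\s+"
        + "review/time:\\s*(?<time>.*?)\\s+"
        + "review/summary:\\s*(?<summary>.*?)\\s+"
        + "review/text:\\s*(?<text>.*)");

    // Splits one line into its fields, keyed by the names above.
    static Map<String, String> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unparseable line: " + line);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : NAMES) {
            fields.put(name, m.group(name).trim());
        }
        return fields;
    }
}
```

The `time` value looks like a Unix timestamp in seconds. If that holds for the dataset, `new Date(Long.parseLong(fields.get("time")) * 1000L)` would give the `Date` needed for the `timestamp` field of `MovieReview`.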
You can then drive the parser with a handler that processes each review as it is read.
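A call of that kind might look like the sketch below. Everything in it is an assumption made for illustration: the answer does not show `MovieReviewHandler`, so `ReviewHandler`, `Review`, `StreamingReviewParser`, and `ScoreAverager` are hypothetical stand-ins, and only three fields are extracted to keep the example short. The handler here keeps a running average of the scores, so no review object outlives its own line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value type holding just the fields this sketch extracts.
class Review {
    final String productId;
    final String userId;
    final double score;

    Review(String productId, String userId, double score) {
        this.productId = productId;
        this.userId = userId;
        this.score = score;
    }
}

// Hypothetical callback, standing in for the answer's MovieReviewHandler.
interface ReviewHandler {
    void handle(Review review);
}

class StreamingReviewParser {
    private static final Pattern LINE = Pattern.compile(
          "product/productId:\\s*(.*?)\\s+review/userId:\\s*(.*?)\\s+review/profileName:.*?"
        + "\\s+review/score:\\s*(.*?)\\s+review/time:.*");

    // Reads one line at a time and returns the number of reviews handled.
    static int parse(BufferedReader reader, ReviewHandler handler) throws IOException {
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // tolerate blank lines
            }
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                throw new IllegalArgumentException("Unparseable line: " + line);
            }
            handler.handle(new Review(m.group(1), m.group(2), Double.parseDouble(m.group(3))));
            count++;
        }
        return count;
    }
}

// Example handler: accumulates a running average instead of storing reviews.
class ScoreAverager implements ReviewHandler {
    private double sum;
    private int n;

    @Override
    public void handle(Review review) {
        sum += review.score;
        n++;
    }

    double average() {
        return n == 0 ? 0.0 : sum / n;
    }
}

class StreamingDemo {
    public static void main(String[] args) throws IOException {
        // The sample line from the question; a real run would read from the data file.
        String sample = "product/productId: B00004CK40 review/userId: A39IIHQF18YGZA "
            + "review/profileName: C. A. M. Salas review/helpfulness: 0/0 review/score: 4.0 "
            + "review/time: 1175817600 review/summary: Reliable comedy review/text: Nice script.";
        ScoreAverager averager = new ScoreAverager();
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            int handled = StreamingReviewParser.parse(reader, averager);
            System.out.println(handled + " review(s), average score " + averager.average());
        }
    }
}
```

For the real dataset, the `StringReader` could be swapped for `Files.newBufferedReader(Paths.get("reviews.txt"), StandardCharsets.UTF_8)`, where `reviews.txt` is a placeholder path. Because `ReviewHandler` has a single method, a lambda such as `review -> System.out.println(review.userId)` would also work as the handler.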