I have 2 text files:
File1 - this file has the format user_id tweet_id tweet_text
File 1
60730027 6298443824 thank you echo park. you've changed A LOT, but as long as I'm getting paid to make you move, I'm still with it! 2009-12-03 02:54:10
60730027 6297282530 fat Albert Einstein goin in right now over here!!! 2009-12-03 01:35:22
File 2
This file has the format genome_id name ascii_name
4045417 Southwest Indent Southwest Indent
4045418 Southeast Point Southeast Point
Here is the code snippet that reads file 1:
public void readfromFile() throws FileNotFoundException {
    Scanner inputStream;
    String source=null;
    FileInputStream file = new FileInputStream("file1.txt");
    String regex = "/[a-zA-Z ]+/";
    Scanner fileScan = new Scanner(file);
    while(fileScan.hasNextLine()){
        word = fileScan.nextLine();
        word = word.replaceAll(regex, "").toLowerCase();
        PrintWriter outputStreamName = new PrintWriter(new FileOutputStream("temp.txt"));
        outputStreamName.printf("%s",word);
    }
}
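My goal is first to remove the data in the user_id, tweet_id, and genome_id fields, and then to convert uppercase to lowercase. However, at the moment, when this code processes file1, nothing is changed in the text file, and I would like to understand what is going on. When I print to the console instead, I do get output.
Expected output: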
thank you echo park youve changed a lot but as long as im getting paid to make you move im still with it
fat albert einstein goin in right now over here
Answer 0 (score: 1)
Based on the expected output, you want to remove everything except letters, dots, and the spaces between words.
[^a-zA-Z. ]+|(?<=\d)\s*(?=\d)|(?<=\D)\s*(?=\d)|(?<=\d)\s*(?=\D)
Or try it without lookaround:
[^a-zA-Z. ]+|\d\s+\d|\D\s+\d|\d\s+\D
Here \s matches any whitespace character; in Java that is [ \t\n\x0B\f\r].
Sample code:
String regex = "[^a-zA-Z. ]+|(?<=\\d)\\s*(?=\\d)|(?<=\\D)\\s*(?=\\d)|(?<=\\d)\\s*(?=\\D)";
// Strings are immutable, so assign the result back
str = str.replaceAll(regex, "");
Output:
thank you echo park. youve changed A LOT but as long as Im getting paid to make you move Im still with it
fat Albert Einstein goin in right now over here
To keep the apostrophe in the output, use [^a-zA-Z.' ]+; otherwise I'm and you've become Im and youve.
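As a small illustration of the apostrophe point, the sketch below applies both character classes to a shortened fragment of the first tweet. The class name `ApostropheDemo` and the helper `strip` are invented for this example only.

```java
public class ApostropheDemo {
    // Removes every run of characters matched by the given regex.
    static String strip(String text, String regex) {
        return text.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String text = "you've changed A LOT, but I'm still with it!";
        // The apostrophe is not in the allowed set, so it is removed.
        System.out.println(strip(text, "[^a-zA-Z. ]+"));
        // The apostrophe is allowed here, so contractions survive.
        System.out.println(strip(text, "[^a-zA-Z.' ]+"));
    }
}
```

The first call prints the text with Im and youve; the second keeps I'm and you've.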
Better still, use [a-zA-Z']+ to match all the words directly.
Sample code:
String str = "60730027 6297282530 fat Albert Einstein goin in right now over here!!! 2009-12-03 01:35:22 ";
Pattern p = Pattern.compile("[a-zA-Z']+");
Matcher m = p.matcher(str);
while (m.find()) {
System.out.print(m.group()+" ");
}
Output:
fat Albert Einstein goin in right now over here
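The loop above keeps the original case and any apostrophes. If the goal is the exact expected output from the question (lowercase, no apostrophes), one possible variation is to post-process each matched word. This is only a sketch; `WordExtractor` and `extractWords` are hypothetical names, not part of the original answer.

```java
import java.util.Locale;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordExtractor {
    // Same pattern as in the answer: runs of letters and apostrophes.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z']+");

    // Collects the words of a line, lowercased and without apostrophes,
    // separated by single spaces.
    static String extractWords(String line) {
        StringJoiner joined = new StringJoiner(" ");
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            String word = m.group().replace("'", "").toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                joined.add(word);
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        String str = "60730027 6297282530 fat Albert Einstein goin in right now over here!!! 2009-12-03 01:35:22 ";
        System.out.println(extractWords(str));
    }
}
```

Using a StringJoiner avoids the trailing space that printing each word followed by " " would leave.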
Note: you are checking for the next line (hasNextLine()), so read a whole line as well.
Change:
source = inputStream.next();
To:
source = inputStream.nextLine();
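As for why temp.txt does not change: the question's loop opens a new PrintWriter on every iteration (which truncates the file each time) and never closes or flushes it, so buffered output may never reach disk. Also, Java regexes are not wrapped in slashes, so "/[a-zA-Z ]+/" only matches text enclosed in literal / characters. A minimal sketch of one way to combine the fixes follows; the class and method names are made up, and the cleaning regex is a simplified variant chosen to reproduce the expected output, not the exact regex from this answer.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.Locale;
import java.util.Scanner;

public class TweetFileCleaner {
    // Keeps only letters and spaces, collapses repeated spaces, lowercases.
    static String cleanLine(String line) {
        return line.replaceAll("[^a-zA-Z ]+", "")
                   .trim()
                   .replaceAll(" +", " ")
                   .toLowerCase(Locale.ROOT);
    }

    // Opens the writer once; try-with-resources closes (and flushes) it.
    static void cleanFile(File input, File output) throws FileNotFoundException {
        try (Scanner in = new Scanner(input);
             PrintWriter out = new PrintWriter(output)) {
            while (in.hasNextLine()) {
                out.println(cleanLine(in.nextLine()));
            }
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        File input = new File("file1.txt"); // file name taken from the question
        if (input.exists()) {
            cleanFile(input, new File("temp.txt"));
        } else {
            System.out.println("file1.txt not found; nothing to do");
        }
    }
}
```

Note that this writes to temp.txt and leaves file1.txt itself untouched, as in the question's code.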
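Answer 1 (score: 0)
import java.io.FileInputStream;

public void readfromFile() throws Exception
{
    StringBuilder builder = new StringBuilder();
    // try-with-resources closes the stream even if reading fails
    try (FileInputStream file = new FileInputStream("file1.txt")) {
        int ch;
        // read one byte at a time until end of file (fine for ASCII input)
        while ((ch = file.read()) != -1) {
            builder.append((char) ch);
        }
    }
    // keep only letters and whitespace, then print the result
    System.out.println(builder.toString().replaceAll("[^a-zA-Z\\s]", ""));
}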
Scanner's next() skips the whitespace between tokens.
For example:
Scanner scanner = new Scanner("60730027 6298443824 thank");
while(scanner.hasNext()) // next() returns whitespace-delimited tokens, so the spaces are lost
{
    System.out.print(scanner.next());
}
Output:
607300276298443824thank
So we cannot use Scanner this way.
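The joined output comes from next(), which splits on whitespace. For comparison, nextLine() returns each line with its spaces intact, so Scanner may still be usable when reading line by line. The sketch below contrasts the two; `ScannerLineDemo`, `readTokens`, and `readLines` are hypothetical names used only here.

```java
import java.util.Scanner;

public class ScannerLineDemo {
    // Concatenates whitespace-delimited tokens, as in the example above.
    static String readTokens(String input) {
        StringBuilder sb = new StringBuilder();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNext()) {
                sb.append(sc.next());
            }
        }
        return sb.toString();
    }

    // Reads whole lines; spaces inside each line are preserved.
    static String readLines(String input) {
        StringBuilder sb = new StringBuilder();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNextLine()) {
                sb.append(sc.nextLine()).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "60730027 6298443824 thank";
        System.out.println(readTokens(input));
        System.out.print(readLines(input));
    }
}
```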
Answer 2 (score: 0)
Try this:
s = s.replaceAll("\\d+\\s+\\d+\\s+", "").replaceAll(" +\\S+ \\S+$", "");
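On its own, this expression removes the two leading IDs and the trailing date and time, but it leaves punctuation and case as they were. A possible extension that also reaches the expected output from the question is sketched below; the extra replaceAll and toLowerCase steps, and the names `PrefixSuffixStripper`, `stripIdsAndTimestamp`, and `normalize`, are additions for illustration, not part of the answer. It assumes every line starts with two numeric fields and ends with a date and a time, and that the tweet text itself contains no "number number " sequence.

```java
import java.util.Locale;

public class PrefixSuffixStripper {
    // The expression from the answer: drop "id id " at the start
    // and the last two space-separated tokens (date and time) at the end.
    static String stripIdsAndTimestamp(String s) {
        return s.replaceAll("\\d+\\s+\\d+\\s+", "").replaceAll(" +\\S+ \\S+$", "");
    }

    // Additionally removes everything except letters and spaces, then lowercases.
    static String normalize(String s) {
        return stripIdsAndTimestamp(s)
                .replaceAll("[^a-zA-Z ]", "")
                .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String s = "60730027 6297282530 fat Albert Einstein goin in right now over here!!! 2009-12-03 01:35:22";
        System.out.println(stripIdsAndTimestamp(s));
        System.out.println(normalize(s));
    }
}
```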