Extract file extension from String using regex

Date: 2018-03-09 19:14:26

Tags: java regex

I have the following String:

"data:audio/mp3;base64,ABC..."

And I'm extracting the file extension (in this case "mp3") out of it.

The String varies according to the file type. Some examples:

"..."
"..."
"data:audio/wav;base64,ABC..."
"data:audio/mp3;base64,ABC..."

Here's how I've done it:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    private static final String BASE64_HEADER_EXP = "^data:.+;base64,";

    private static final Pattern PATTERN_BASE64_HEADER = Pattern.compile(BASE64_HEADER_EXP);

    private String data;

    private String fileName;

    public String getFileName() {
        Matcher base64HeaderMatcher = PATTERN_BASE64_HEADER.matcher(data);
        return String.format("%s.%s", getFilenameWithoutExtension(), getExtension(base64HeaderMatcher));
    }

    private String getFilenameWithoutExtension() {
        return fileName.split("\\.")[0];
    }

    private String getExtension(Matcher base64HeaderMatcher) {
        if (base64HeaderMatcher.find()) {
            String base64Header = base64HeaderMatcher.group(0);
            return base64Header.split("/")[1].split(";")[0];
        }
        return fileName.split("\\.")[1];
    }

}

What I want is a way to do it without having to split and access array positions like I'm doing above. Maybe the extension could be extracted with a regular expression.

I'm able to do it on the RegExr site using this expression:

(?<=^data:.*/)(.*)(?=;)

But when I try to use the same regex in Java, I get the error "Require that the characters immediately before the position do" because, apparently, Java doesn't support unbounded repetition inside a lookbehind.
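For what it's worth, the lookbehind can be kept if the repetition is given an upper bound, since Java only requires that a lookbehind have a maximum length. The sketch below assumes the media type before the slash is 1 to 20 lowercase letters; that bound, the character class, and the class name BoundedLookbehind are illustrative assumptions, not something taken from the code above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the name and the {1,20} bound are assumptions.
final class BoundedLookbehind {

    // The lookbehind has a known maximum length, so Java can compile it.
    private static final Pattern EXTENSION =
            Pattern.compile("(?<=^data:[a-z]{1,20}/)[^;]+(?=;)");

    static String extract(String data) {
        Matcher m = EXTENSION.matcher(data);
        // The whole match is the extension; no capturing group is needed.
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("data:audio/mp3;base64,ABC...")); // mp3
    }
}
```

The trade-off is that any input whose media type does not fit the assumed bound simply fails to match, so extract returns null.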

3 Answers:

Answer 0 (score: 2)

How about using capturing groups?

private static final String BASE64_HEADER_EXP = "^data:[^/]+/([^;]+);base64,";

This way you can call base64HeaderMatcher.group(1) to get the file type.
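A minimal, self-contained sketch of that idea follows. The class name DataUriExtension and the Optional return type are assumptions made for illustration; only the pattern itself comes from this answer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and method are illustrative only.
final class DataUriExtension {

    // Group 1 captures everything between the first "/" and the next ";".
    private static final Pattern BASE64_HEADER =
            Pattern.compile("^data:[^/]+/([^;]+);base64,");

    static Optional<String> extract(String data) {
        Matcher m = BASE64_HEADER.matcher(data);
        // group(1) is only valid after a successful find().
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("data:audio/mp3;base64,ABC...")); // Optional[mp3]
        System.out.println(extract("not a data URI"));               // Optional.empty
    }
}
```

In the question's getExtension method, the split chain inside the if block would then reduce to return base64HeaderMatcher.group(1), with the existing fileName fallback left as it is.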

Answer 1 (score: 0)

This should do it for the examples you gave:

(?<=data:)(?:[A-Za-z]+)/(.*?);

Explanation:

Positive look-behind

(?<=data:)

Non-capturing group to account for image, audio, etc.

(?:[A-Za-z]+)

Match / literally, capture group for file extension, match ; literally

/(.*?);
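A short sketch of how this pattern might be used from Java. Note that the full match still includes the type, the slash and the semicolon, so the extension has to be read from group 1. The class name LookbehindCapture is a made-up name for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern comes from the answer.
final class LookbehindCapture {

    // [A-Za-z] is used because [A-z] would also match a few punctuation characters.
    private static final Pattern HEADER =
            Pattern.compile("(?<=data:)(?:[A-Za-z]+)/(.*?);");

    static String extract(String data) {
        Matcher m = HEADER.matcher(data);
        // group(0) would be e.g. "audio/wav;", group(1) is just the extension.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("data:audio/wav;base64,ABC...")); // wav
    }
}
```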

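Answer 2 (score: 0)

"Strings in Java have built-in support for regular expressions. Strings have four built-in methods for regular expressions: the matches(), split(), replaceFirst() and replaceAll() methods." - http://www.vogella.com/tutorials/JavaRegularExpressions/article.html

Using this information, we can quickly build a regex and test it against our string.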

//In a regex, each set of () represents a capture field which can later be
//referenced with $1, $2, etc.
//The regex below breaks the string into four fields

String pattern = "(^data:)(\\w+?/)(\\w+?)(;.*$)";

//First Field
//This field matches the start of a line (^) followed by "data:"

//Second Field
//This matches one or more (+) word characters (\\w) followed by a "/".
//The "?" after the + makes the match reluctant, i.e. it matches as few
//characters as possible. This field effectively captures a series of
//letters followed by a slash

//Third Field
//This is the field we want to capture and we will reference with $3
//it matches any wordCharacter(\\w), one or more reluctantly

//Fourth Field
//This captures the rest of the string including the ";"


//Now to extract the extension from this test string

String test = "...";
String testExtension = "";

//Replace the contents of testExtension with the 3rd capture field of 
//our regex pattern applied to our test string like so

testExtension = test.replaceAll(pattern, "$3");

//This invokes the String class replaceAll() method 

//And now our string testExtension should contain "jpeg"
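Here is the same approach as a runnable sketch. The sample inputs and the class name ReplaceAllDemo are assumptions added for illustration, since the test string above is elided. One thing worth knowing: when the pattern does not match, replaceAll returns the input unchanged rather than signalling a failure.

```java
// Hypothetical demo class; the sample inputs are assumptions, not from the original post.
final class ReplaceAllDemo {

    // Same four-field pattern as above; field 3 is the extension.
    static final String PATTERN = "(^data:)(\\w+?/)(\\w+?)(;.*$)";

    static String extension(String dataUri) {
        // The whole match is replaced by field 3, leaving only the extension.
        return dataUri.replaceAll(PATTERN, "$3");
    }

    public static void main(String[] args) {
        System.out.println(extension("data:image/jpeg;base64,ABC")); // jpeg
        // No match here because the word-character class does not cover "+",
        // so the input string comes back unchanged.
        System.out.println(extension("data:image/svg+xml;base64,ABC"));
    }
}
```

Because of that silent fallback, a caller may want to compare the result with the input, or use a Matcher instead, to detect strings that are not in the expected format.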