Getting the values assigned to variables in a long string

Time: 2018-10-15 04:26:15

Tags: java search

I have a string passed from a database. It is essentially the body of an email and contains something like the following:

  

Content-Type: application/pdf; name="mozilla.pdf" Content-Description: mozilla.pdf Content-Disposition: attachment; filename="mozilla.pdf"; size=92442; creation-date="Fri, 12 Oct 2018 14:14:00 GMT"; modification-date="Fri, 12 Oct 2018 14:14:00 GMT"Content-Transfer-Encoding: base64

I want to be able to get the filename, the Content-Type, and so on.

For example, in the text above the filename would be mozilla.pdf

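3 answers:

Answer 0: (score: 1)

Since the input string has no fixed pattern, you either have to write your own parser or use a different regular expression for each parameter you want to extract. For example, to get filename, you can use: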

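// Requires java.util.regex.Matcher and java.util.regex.Pattern.
// inputString is assumed to hold the email body text shown in the question.
final String regex = "filename=\"(.*?)\";";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(inputString);

if (matcher.find()) {
    // group(1) is the text captured between the quotes.
    System.out.println("Filename: " + matcher.group(1));
}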
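The "different regular expression per parameter" idea can be folded into one small helper. The following is only a sketch: the class name ParamExtractor and the method quotedParam are hypothetical and not part of the answer, and it assumes the value is always wrapped in double quotes, so unquoted parameters such as size are not found.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParamExtractor {

    // Hypothetical helper: returns the value of name="value" if it is present.
    static Optional<String> quotedParam(String input, String name) {
        // The negative lookbehind stops "name" from matching inside "filename".
        Pattern pattern = Pattern.compile(
                "(?<![\\w-])" + Pattern.quote(name) + "=\"([^\"]*)\"");
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String input = "Content-Type: application/pdf; name=\"mozilla.pdf\" "
                + "Content-Disposition: attachment; filename=\"mozilla.pdf\"; size=92442; "
                + "creation-date=\"Fri, 12 Oct 2018 14:14:00 GMT\";";
        System.out.println(quotedParam(input, "filename").orElse("(none)"));
        System.out.println(quotedParam(input, "creation-date").orElse("(none)"));
        // size is not quoted in the input, so nothing is found for it.
        System.out.println(quotedParam(input, "size").orElse("(none)"));
    }
}
```

Returning an Optional rather than printing inside the helper leaves it to the caller to decide what a missing parameter means.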

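Answer 1: (score: 1)

First, remove the " and ; characters from the string. Second, split it on all the terms you want to retrieve, such as filename, size and so on. Then loop over the resulting array and split each element on : and =. Finally, put the results in a HashMap so that you can retrieve them with map.get("filename"). See the solution below.

Edit: since you asked for an ArrayList<String> that collects all values found under the same key, I updated the code as follows.

Note: so that filename is not split at name, I used name with a leading space as the term.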

String string = "Content-Type: application/pdf; name=\"mozilla.pdf\" name=\"mozilla2.pdf\" name=\"mozilla3.pdf\" Content-Description: mozilla.pdf Content-Disposition: attachment; filename=\"mozilla.pdf\"; size=92442; creation-date=\"Fri, 12 Oct 2018 14:14:00 GMT\"; modification-date=\"Fri, 12 Oct 2018 14:14:00 GMT\"Content-Transfer-Encoding: base64";
string = string.replaceAll("[\";]", "");
String[] parts = string.split("(?=(Content-Type)|( name)|(Content-Description)|(Content-Disposition)|(filename)|(size)|(creation-date)|(modification-date)|(Content-Transfer-Encoding))");
Map<String, ArrayList<String>> map = new HashMap<String, ArrayList<String>>();
for (String part : parts) {
  String[] keyValue = part.split("[:=]");
  String key = keyValue[0].trim();
  String value = keyValue[1].trim();
  ArrayList<String> list;
  if(map.containsKey(key)){
    list = map.get(key);
    list.add(value);
  } else {
    list = new ArrayList<String>();
    list.add(value);
    map.put(key, list);
  }
}
System.out.println(map.get("name"));
System.out.println(map.get("Content-Type"));
System.out.println(map.get("filename"));
System.out.println(map.get("creation-date"));
System.out.println(map.get("size"));

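Output (the creation-date value is cut off at the first colon of the time, because each part is split on every : and =):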

[mozilla.pdf, mozilla2.pdf, mozilla3.pdf]
[application/pdf]
[mozilla.pdf]
[Fri, 12 Oct 2018 14]
[92442]
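The creation-date entry above stops at "14" because part.split("[:=]") also splits on the colons inside the time. One possible fix, sketched below, is to pass a limit of 2 to String.split so that only the first separator is used. The class name HeaderFields and the method parse are hypothetical, the list of key terms is the one from the answer, and the shortened input string is only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HeaderFields {

    // Same key terms as in the answer; " name" keeps its leading space
    // so that "filename" is not split in the middle.
    private static final String KEYS =
            "(?=Content-Type| name|Content-Description|Content-Disposition"
            + "|filename|size|creation-date|modification-date|Content-Transfer-Encoding)";

    static Map<String, List<String>> parse(String raw) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        String cleaned = raw.replaceAll("[\";]", "");
        for (String part : cleaned.split(KEYS)) {
            // Limit 2: split only at the first ':' or '=', so colons in the value survive.
            String[] keyValue = part.split("[:=]", 2);
            if (keyValue.length < 2) {
                continue; // fragment without a separator, skip it
            }
            fields.computeIfAbsent(keyValue[0].trim(), k -> new ArrayList<>())
                  .add(keyValue[1].trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String raw = "Content-Disposition: attachment; filename=\"mozilla.pdf\"; size=92442; "
                + "creation-date=\"Fri, 12 Oct 2018 14:14:00 GMT\";";
        Map<String, List<String>> fields = parse(raw);
        System.out.println(fields.get("creation-date")); // prints [Fri, 12 Oct 2018 14:14:00 GMT]
        System.out.println(fields.get("filename"));      // prints [mozilla.pdf]
    }
}
```

The approach still depends on knowing every key in advance and on none of the key terms appearing inside a value.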

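Answer 2: (score: 1)

If you already know the basic format and content style of the main string, you can use a custom substring retrieval method to get the data you need. The method I provide below lets you retrieve a substring that sits between two other substrings, for example:

If you want to retrieve the file name that follows the substring "filename=" (which is of course "mozilla.pdf"), you can supply the method with a left string of "filename=\"" and a right string of "\"".

The method returns a one-dimensional String array holding every substring found between the supplied left and right substrings, so for the example above we would call the method like this: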

String inputString = "Content-Type: application/pdf; name=\"mozilla.pdf\" "
                   + "Content-Description: mozilla.pdf Content-Disposition: attachment; "
                   + "filename=\"mozilla.pdf\"; size=92442; creation-date=\""
                   + "Fri, 12 Oct 2018 14:14:00 GMT\"; modification-date=\""
                   + "Fri, 12 Oct 2018 14:14:00 GMT\"Content-Transfer-Encoding: base64";

String[] fileNames = getSubstring(inputString,"filename=\"", "\"");

for (int i = 0; i < fileNames.length; i++) {
    System.out.println("File Name " + (i+1) + ":\t" + fileNames[i]);
}

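This prints every file name found in the main input string to the console window. If you only need the first occurrence of the file name, you can index the method call directly (keeping in mind that the method returns null when nothing is found, so this throws a NullPointerException if there is no match), for example: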

String fileName = getSubstring(inputString,"filename=\"", "\"")[0];
System.out.println("File Name:\t" + fileName);

This prints File Name: mozilla.pdf to the console window.

Here is the method:

/**
 * Retrieves any string data located between the supplied string leftString
 * parameter and the supplied string rightString parameter.<br><br>
 * 
 * It can also retrieve a substring located at the beginning or the end of 
 * the main input string (see: leftString and rightString parameter information).
 * 
 * <p>
 * This method will return all instances of a substring located between the
 * supplied Left String and the supplied Right String which may be found
 * within the supplied Input String.<br>
 *
 * @param inputString (String) The string to look for substring(s) in.
 *
 * @param leftString  (String) What may be to the Left side of the substring
 *                    we want within the main input string. Sometimes the 
 *                    substring you want may be contained at the very beginning
 *                    of a string and therefore there is no Left-String available. 
 *                    In this case you would simply pass an empty string ("") to
 *                    this parameter which basically informs the method of this
 *                    fact. Null can not be supplied and will ultimately generate
 *                    a NullPointerException. If an empty string ("") is supplied
 *                    then the rightString parameter <b>must</b> contain a String.
 *
 * @param rightString (String) What may be to the Right side of the
 *                    substring we want within the main input string. 
 *                    Sometimes the substring you want may be contained
 *                    at the very end of a string and therefore there is
 *                    no Right-String available. In this case you would 
 *                    simply pass an empty string ("") to this parameter
 *                    which basically informs the method of this fact.
 *                    Null can not be supplied and will ultimately generate
 *                    a NullPointerException. If an empty string ("") is supplied
 *                    then the leftString parameter <b>must</b> contain a String.
 * 
 * @param options     (Optional - Boolean - 2 Parameters):<pre>
 *
 *      ignoreLetterCase    - Default is false. This option works against the
 *                            string supplied within the leftString parameter
 *                            and the string supplied within the rightString
 *                            parameter. If set to true then letter case is
 *                            ignored when searching for strings supplied in
 *                            these two parameters. If left at default false
 *                            then letter case is not ignored. 
 *
 *      trimFound           - Default is true. By default this method will trim
 *                            off leading and trailing white-spaces from found
 *                            sub-string items. General sentences which obviously
 *                            contain spaces will almost always give you a white-
 *                            space within an extracted sub-string. By setting
 *                            this parameter to false, leading and trailing white-
 *                            spaces are not trimmed off before they are placed
 *                            into the returned Array.</pre>
 *
 * @return (1D String Array) Returns a Single Dimensional String Array
 *         containing all the sub-strings found within the supplied Input
 *         String which are between the supplied Left String and supplied
 *         Right String. Returns Null if nothing is found.
 * 
 *         You can shorten this method up a little by returning a List&lt;String&gt; 
 *         ArrayList and removing the 'List to 1D Array' conversion code at 
 *         the end of this method. This method initially stores its findings 
 *         within a List Interface object anyways.
 */
public static String[] getSubstring(String inputString, String leftString, String rightString, boolean... options) {
    // Return nothing if nothing was supplied.
    if (inputString.equals("") || (leftString.equals("") && rightString.equals(""))) {
        return null;
    }

    // Prepare optional parameters if any supplied.
    // If none supplied then use Defaults...
    boolean ignoreCase = false; // Default.
    boolean trimFound = true;   // Default.
    if (options.length > 0) {
        if (options.length >= 1) {
            ignoreCase = options[0];
        }
        if (options.length >= 2) {
            trimFound = options[1];
        }
    }

    // Remove any ASCII control characters from the
    // supplied string (if they exist).
    String modString = inputString.replaceAll("\\p{Cntrl}", "");

    // Establish a List String Array Object to hold
    // our found substrings between the supplied Left
    // String and supplied Right String.
    List<String> list = new ArrayList<>();

    // Use Pattern Matching to locate our possible
    // substrings within the supplied Input String.
    String regEx = Pattern.quote(leftString) + 
                   (!rightString.equals("") ? "(.*?)" : "(.*)?") + 
                   Pattern.quote(rightString);
    if (ignoreCase) {
        regEx = "(?i)" + regEx;
    }
    Pattern pattern = Pattern.compile(regEx);
    Matcher matcher = pattern.matcher(modString);
    while (matcher.find()) {
        // Add the found substrings into the List.
        String found = matcher.group(1);
        if (trimFound) {
            found = found.trim();
        }
        list.add(found);
    }

    String[] res;
    // Convert the ArrayList to a 1D String Array.
    // If the List contains something then convert
    if (list.size() > 0) {
        res = new String[list.size()];
        res = list.toArray(res);
    } // Otherwise return Null.
    else {
        res = null;
    }
    // Return the String Array.
    return res;
}
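As the Javadoc above mentions, the method can be shortened by returning a List<String>. The following is a rough sketch of that variant, not the answer's own code: the names SubstringFinder and findBetween are hypothetical, the two optional flags are dropped (matching is case-sensitive and results are always trimmed), control characters are not stripped, and an empty list is returned instead of null so that callers can check isEmpty() before indexing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SubstringFinder {

    // Hypothetical helper: every substring found between left and right,
    // or an empty list when nothing matches.
    static List<String> findBetween(String input, String left, String right) {
        List<String> found = new ArrayList<>();
        if (input.isEmpty() || (left.isEmpty() && right.isEmpty())) {
            return found;
        }
        // Reluctant match up to the right delimiter; greedy to the end
        // of the input when no right delimiter is given.
        String body = right.isEmpty() ? "(.*)" : "(.*?)";
        Pattern pattern = Pattern.compile(
                Pattern.quote(left) + body + Pattern.quote(right));
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group(1).trim());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "Content-Disposition: attachment; filename=\"mozilla.pdf\"; size=92442;";
        System.out.println(findBetween(input, "filename=\"", "\"")); // prints [mozilla.pdf]
        System.out.println(findBetween(input, "charset=", ";"));     // prints []
    }
}
```

With this shape, a missing field shows up as an empty list rather than as a NullPointerException at the call site.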

To retrieve the data contained in the supplied string:

System.out.println("Content-Type:\t\t\t" + getSubstring(inputString,"Content-Type:", ";")[0]);
System.out.println("Name:\t\t\t\t" + getSubstring(inputString,"name=\"", "\"")[0]);
System.out.println("Content-Description:\t\t" + getSubstring(inputString,"Content-Description:", "Content-Disposition:")[0]);
System.out.println("Content-Disposition:\t\t" + getSubstring(inputString,"Content-Disposition:", ";")[0]);
System.out.println("File Name:\t\t\t" + getSubstring(inputString,"filename=\"", "\"")[0]);
System.out.println("File Size:\t\t\t" + getSubstring(inputString,"size=", ";")[0]);
System.out.println("Creation Date:\t\t\t" + getSubstring(inputString,"creation-date=\"", "\";")[0]);
System.out.println("Modification Date:\t\t" + getSubstring(inputString,"modification-date=\"", "\"")[0]);
System.out.println("Content Transfer Encoding\t" + getSubstring(inputString,"Content-Transfer-Encoding:", "")[0]);

The console output is:

Content-Type:               application/pdf
Name:                       mozilla.pdf
Content-Description:        mozilla.pdf
Content-Disposition:        attachment
File Name:                  mozilla.pdf
File Size:                  92442
Creation Date:              Fri, 12 Oct 2018 14:14:00 GMT
Modification Date:          Fri, 12 Oct 2018 14:14:00 GMT
Content Transfer Encoding   base64