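I have a service that returns data in the format below. I have shortened it to make it easier to follow, but in general the response is very large. The format is always the same.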
process=true
version=2
DataCenter=dc2
Total:2
prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2}
obvious:{0=6, 1=7, 2=8, 3=5, 4=6}
mapping:{3=machineA.dc2.com, 2=machineB.dc2.com}
Machine:[machineA.dc2.com, machineB.dc2.com]
DataCenter=dc1
Total:2
prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2, 6=3}
obvious:{0=6, 1=7, 2=8, 3=5, 4=6, 5=7}
mapping:{3=machineP.dc1.com, 2=machineQ.dc1.com}
Machine:[machineP.dc1.com, machineQ.dc1.com]
DataCenter=dc3
Total:2
prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2}
obvious:{0=6, 1=7, 2=8, 3=5, 4=6}
mapping:{3=machineO.dc3.com, 2=machineR.dc3.com}
Machine:[machineO.dc3.com, machineR.dc3.com]
I am trying to parse the data above and store it in three different maps.
Map<String, Map<Integer, Integer>> prime = new HashMap<String, Map<Integer, Integer>>();
Map<String, Map<Integer, Integer>> obvious = new HashMap<String, Map<Integer, Integer>>();
Map<String, Map<Integer, String>> mapping = new HashMap<String, Map<Integer, String>>();
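Here is the description:
In the prime map, the key is dc2 and the value is {0=1, 1=2, 2=3, 3=4, 4=1, 5=2}.
In the obvious map, the key is dc2 and the value is {0=6, 1=7, 2=8, 3=5, 4=6}.
In the mapping map, the key is dc2 and the value is {3=machineA.dc2.com, 2=machineB.dc2.com}.
The same applies to the other data centers.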
What is the best way to parse the string response above? Should I use a regular expression here, or simple string parsing?
public class DataParser {
public static void main(String[] args) {
String response = getDataFromURL();
// here response will contain above string
parseResponse(response);
}
private static void parseResponse(final String response) {
// what is the best way to parse the response?
}
}
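Any example would be a great help.
Answer 0 (score: 1)
You can do as ShellFish recommends: split on '\n' and then process each line.
One regex approach is shown below (it is incomplete, but enough to get you started):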
public static void main(String[] args) throws Exception {
String response = "process=true\n" +
"version=2\n" +
"DataCenter=dc2\n" +
" Total:2\n" +
" prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2}\n" +
" obvious:{0=6, 1=7, 2=8, 3=5, 4=6}\n" +
" mapping:{3=machineA.dc2.com, 2=machineB.dc2.com}\n" +
" Machine:[machineA.dc2.com, machineB.dc2.com]\n" +
"DataCenter=dc1\n" +
" Total:2\n" +
" prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2, 6=3}\n" +
" obvious:{0=6, 1=7, 2=8, 3=5, 4=6, 5=7}\n" +
" mapping:{3=machineP.dc1.com, 2=machineQ.dc1.com}\n" +
" Machine:[machineP.dc1.com, machineQ.dc1.com]\n" +
"DataCenter=dc3\n" +
" Total:2\n" +
" prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2}\n" +
" obvious:{0=6, 1=7, 2=8, 3=5, 4=6}\n" +
" mapping:{3=machineO.dc3.com, 2=machineR.dc3.com}\n" +
" Machine:[machineO.dc3.com, machineR.dc3.com]";
Map<String, Map<Integer, Integer>> prime = new HashMap<>();
Map<String, Map<Integer, Integer>> obvious = new HashMap<>();
Map<String, Map<Integer, String>> mapping = new HashMap<>();
String outerMapKey = "";
int findCount = 0;
Matcher matcher = Pattern.compile("(?<=DataCenter=)(.*)|(?<=prime:)(.*)|(?<=obvious:)(.*)|(?<=mapping:)(.*)").matcher(response);
while(matcher.find()) {
switch (findCount) {
case 0:
outerMapKey = matcher.group();
break;
case 1:
prime.put(outerMapKey, new HashMap<>());
String group = matcher.group().replaceAll("[\\{\\}]", "").replaceAll(", ", ",");
String[] groupPieces = group.split(",");
for (String groupPiece : groupPieces) {
String[] keyValue = groupPiece.split("=");
// keyValue[0] is the inner key, keyValue[1] is the inner value
prime.get(outerMapKey).put(Integer.parseInt(keyValue[0]), Integer.parseInt(keyValue[1]));
}
break;
// Add additional cases for obvious and mapping
}
findCount++;
if (findCount == 4) {
findCount = 0;
}
}
System.out.println("Primes:");
prime.keySet().stream().forEach(k -> System.out.printf("Key: %s Value: %s\n", k, prime.get(k)));
// Add additional outputs for obvious and mapping
}
Result:
Primes:
Key: dc2 Value: {0=1, 1=2, 2=3, 3=4, 4=1, 5=2}
Key: dc1 Value: {0=1, 1=2, 2=3, 3=4, 4=1, 5=2, 6=3}
Key: dc3 Value: {0=1, 1=2, 2=3, 3=4, 4=1, 5=2}
For an explanation of the regex pattern syntax, see: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
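The cases left open in this answer (obvious and mapping) could be filled in along the following lines. This is a sketch of a variation, not the answerer's code: instead of counting matches, it captures the label of each line and switches on it, so a missing line does not shift the later ones. The class name RegexResponseParser and the helpers toIntMap and toStringMap are made up for illustration, and the code assumes every braced value is non-empty and well formed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexResponseParser {
    // Optional indentation, one of the four labels, '=' or ':', then the rest of the line.
    private static final Pattern LINE = Pattern.compile(
            "^\\s*(DataCenter|prime|obvious|mapping)[=:](.*)$", Pattern.MULTILINE);

    final Map<String, Map<Integer, Integer>> prime = new HashMap<>();
    final Map<String, Map<Integer, Integer>> obvious = new HashMap<>();
    final Map<String, Map<Integer, String>> mapping = new HashMap<>();

    static RegexResponseParser parse(String response) {
        RegexResponseParser result = new RegexResponseParser();
        String dataCenter = null; // outer key: the most recent DataCenter value
        Matcher matcher = LINE.matcher(response);
        while (matcher.find()) {
            String value = matcher.group(2).trim();
            switch (matcher.group(1)) {
                case "DataCenter": dataCenter = value; break;
                case "prime": result.prime.put(dataCenter, toIntMap(value)); break;
                case "obvious": result.obvious.put(dataCenter, toIntMap(value)); break;
                case "mapping": result.mapping.put(dataCenter, toStringMap(value)); break;
                default: break;
            }
        }
        return result;
    }

    // Turns "{3=a, 2=b}" into a map; assumes the surrounding braces are present.
    private static Map<Integer, String> toStringMap(String braced) {
        Map<Integer, String> map = new HashMap<>();
        String body = braced.substring(1, braced.length() - 1);
        for (String pair : body.split(",\\s*")) {
            String[] kv = pair.split("=", 2);
            map.put(Integer.parseInt(kv[0].trim()), kv[1].trim());
        }
        return map;
    }

    // Same as toStringMap, but with the values converted to Integer.
    private static Map<Integer, Integer> toIntMap(String braced) {
        Map<Integer, Integer> map = new HashMap<>();
        toStringMap(braced).forEach((k, v) -> map.put(k, Integer.parseInt(v)));
        return map;
    }

    public static void main(String[] args) {
        String sample = "DataCenter=dc2\n prime:{0=1, 1=2}\n obvious:{0=6}\n"
                + " mapping:{3=machineA.dc2.com}";
        RegexResponseParser parsed = parse(sample);
        System.out.println(parsed.prime);
        System.out.println(parsed.mapping);
    }
}
```

Lines that carry none of the four labels (process, version, Total, Machine) simply never match the pattern and are ignored.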
Answer 1 (score: 1)
The answer depends on how sure you can be about the format. A very simple approach reads the response line by line with a Scanner and does only minimal string comparison to identify the keys.
Even more explicit nextLine() calls can avoid the test for "DataCenter" altogether.
Here are a couple of almost identical methods for splitting the braced sets and creating a map:
private static Map<Integer,Integer> str2map( String str ){
Map<Integer,Integer> map = new HashMap<>();
// strip the surrounding braces
str = str.substring( 1, str.length()-1 );
String[] pairs = str.split( ", " );
for( String pair: pairs ){
String[] kv = pair.split( "=" );
map.put( Integer.parseInt(kv[0]),Integer.parseInt(kv[1]) );
}
return map;
}
private static Map<Integer,String> str2mapis( String str ){
Map<Integer,String> map = new HashMap<>();
//... same substring, split and for-loop header as in str2map; only the value stays a String
map.put( Integer.parseInt(kv[0]),kv[1] );
}
return map;
}
If there is a chance that the white space might vary, you can play it safe by using
private static final String PRIME = "prime:";
// ...
prime.put( dc, str2map(scanner.nextLine().trim().substring( PRIME_LEN )) );
If line order or completeness isn't guaranteed, a test might be necessary:
line = scanner.nextLine().trim();
if( line.startsWith( PRIME ) ){
// use the line that was just read rather than reading another one
prime.put( dc, str2map(line.substring( PRIME_LEN )) );
}
With even less stability or trust in the format, regex parsing might be indicated.
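The Scanner-based approach this answer describes could be put together roughly as follows. This is a sketch under assumptions, not the answerer's original code: the class name ScannerResponseParser, the label constants other than PRIME, and the choice to trim every line and silently skip unrecognized ones are all illustrative. str2map and str2mapis follow the helper methods from this answer and assume non-empty braced values separated by ", ".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

final class ScannerResponseParser {
    private static final String DATA_CENTER = "DataCenter=";
    private static final String PRIME = "prime:";
    private static final String OBVIOUS = "obvious:";
    private static final String MAPPING = "mapping:";

    final Map<String, Map<Integer, Integer>> prime = new HashMap<>();
    final Map<String, Map<Integer, Integer>> obvious = new HashMap<>();
    final Map<String, Map<Integer, String>> mapping = new HashMap<>();

    static ScannerResponseParser parse(String response) {
        ScannerResponseParser out = new ScannerResponseParser();
        String dc = null; // data center that the following lines belong to
        try (Scanner scanner = new Scanner(response)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine().trim(); // tolerate varying indentation
                if (line.startsWith(DATA_CENTER)) {
                    dc = line.substring(DATA_CENTER.length());
                } else if (line.startsWith(PRIME)) {
                    out.prime.put(dc, str2map(line.substring(PRIME.length())));
                } else if (line.startsWith(OBVIOUS)) {
                    out.obvious.put(dc, str2map(line.substring(OBVIOUS.length())));
                } else if (line.startsWith(MAPPING)) {
                    out.mapping.put(dc, str2mapis(line.substring(MAPPING.length())));
                }
                // Everything else (process, version, Total, Machine) is skipped.
            }
        }
        return out;
    }

    // "{0=1, 1=2}" -> {0=1, 1=2} with Integer values.
    static Map<Integer, Integer> str2map(String str) {
        Map<Integer, Integer> map = new HashMap<>();
        for (Map.Entry<Integer, String> e : str2mapis(str).entrySet()) {
            map.put(e.getKey(), Integer.parseInt(e.getValue()));
        }
        return map;
    }

    // "{3=a, 2=b}" -> {3=a, 2=b} with String values.
    static Map<Integer, String> str2mapis(String str) {
        Map<Integer, String> map = new HashMap<>();
        str = str.substring(1, str.length() - 1); // strip '{' and '}'
        for (String pair : str.split(", ")) {
            String[] kv = pair.split("=");
            map.put(Integer.parseInt(kv[0]), kv[1]);
        }
        return map;
    }
}
```

Because every line is tested with startsWith, this version tolerates missing or reordered lines within a data center block, at the cost of a few extra string comparisons per line.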
Answer 2 (score: 0)
In this case I would do simple string parsing, applying a regex to each line. In pseudocode, something like this:
for line in response
if line matches /^DataCenter/
key = datacenter name
else if line matches / *prime/
prime.put(key, prime value)
else if line matches / *obvious/
obvious.put(key, obvious value)
else if line matches / *mapping/
mapping.put(key, mapping value)
else
getline
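You can optimize this by checking the first character of the line first. If it is anything other than a space or D, you can move straight on to the next line. If the format is always the same, you can even hard-code which lines to parse. For the example you provided, you could do the following: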
skip 2 lines
repeat
extract datacenter name
skip 1 line
extract prime
extract obvious
extract mapping
add above stuff to the maps
skip 1 line
until EOF
This will be much faster, but it will fail if the format changes.
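The hard-coded variant above might look like this in Java. It is a minimal sketch that assumes exactly two header lines followed by repeating six-line blocks in the order DataCenter, Total, prime, obvious, mapping, Machine, as in the sample from the question. The class name FixedLayoutParser and its helpers are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class FixedLayoutParser {
    private static final int HEADER_LINES = 2; // process=..., version=...
    private static final int BLOCK_LINES = 6;  // DataCenter, Total, prime, obvious, mapping, Machine

    final Map<String, Map<Integer, Integer>> prime = new HashMap<>();
    final Map<String, Map<Integer, Integer>> obvious = new HashMap<>();
    final Map<String, Map<Integer, String>> mapping = new HashMap<>();

    static FixedLayoutParser parse(String response) {
        FixedLayoutParser out = new FixedLayoutParser();
        String[] lines = response.split("\\R"); // any line terminator (Java 8+)
        for (int i = HEADER_LINES; i + BLOCK_LINES <= lines.length; i += BLOCK_LINES) {
            String dc = after(lines[i], '=');
            // lines[i + 1] is the Total line, skipped
            out.prime.put(dc, toIntMap(after(lines[i + 2], ':')));
            out.obvious.put(dc, toIntMap(after(lines[i + 3], ':')));
            out.mapping.put(dc, toStringMap(after(lines[i + 4], ':')));
            // lines[i + 5] is the Machine line, skipped
        }
        return out;
    }

    // Text after the first occurrence of the separator, without surrounding whitespace.
    private static String after(String line, char separator) {
        return line.substring(line.indexOf(separator) + 1).trim();
    }

    // "{3=a, 2=b}" -> map with String values; assumes non-empty, well-formed input.
    private static Map<Integer, String> toStringMap(String braced) {
        Map<Integer, String> map = new HashMap<>();
        String body = braced.substring(1, braced.length() - 1);
        for (String pair : body.split(",\\s*")) {
            String[] kv = pair.split("=", 2);
            map.put(Integer.parseInt(kv[0].trim()), kv[1].trim());
        }
        return map;
    }

    // Same as toStringMap, but with the values converted to Integer.
    private static Map<Integer, Integer> toIntMap(String braced) {
        Map<Integer, Integer> map = new HashMap<>();
        toStringMap(braced).forEach((k, v) -> map.put(k, Integer.parseInt(v)));
        return map;
    }
}
```

No line is inspected to decide what it is, which is where the speed comes from; it is also why a single extra or missing line would make every later block parse incorrectly or throw.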
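Answer 3 (score: 0)
You could use a parser generator such as ANTLR, or you could write the parser code by hand. Depending on how much output you need to process and how often, you may find that going to that trouble is not worth it, and that simply iterating over each line and parsing it manually (for example, with a regex or indexOf) is good enough.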
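As an illustration of the manual indexOf route, the braced part of a single line could be taken apart without any regex roughly like this. The class name BracedMapParser is hypothetical, and the sketch keeps keys and values as strings so that it works for the prime, obvious and mapping lines alike; converting to Integer is left to the caller.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BracedMapParser {
    // Parses the "{k1=v1, k2=v2}" part of a line such as "prime:{0=1, 1=2}".
    // Returns the pairs in the order they appear in the line.
    static Map<String, String> parse(String line) {
        Map<String, String> map = new LinkedHashMap<>();
        int open = line.indexOf('{');
        int close = line.lastIndexOf('}');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("No braced map in: " + line);
        }
        int pos = open + 1;
        while (pos < close) {
            int comma = line.indexOf(',', pos);
            int end = (comma < 0 || comma > close) ? close : comma;
            String pair = line.substring(pos, end);
            int eq = pair.indexOf('=');
            if (eq > 0) { // ignore chunks without a key
                map.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
            pos = end + 1;
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(parse("  mapping:{3=machineA.dc2.com, 2=machineB.dc2.com}"));
    }
}
```

Anything before the opening brace is ignored, so the same method works whether or not the label and indentation are still attached to the line.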