I have a huge JSON file named something.json. The file is 20 MB. I am reading it with GSON, on a standard Android Nexus 5X.
Example of the JSON:
[
{"country":"UA","name":"Hurzuf","_id":707860,"coord":{"lon":34.283333,"lat":44.549999}},
{"country":"UA","name":"Il’ichëvka","_id":707716,"coord":{"lon":34.383331,"lat":44.666668}},
{"country":"BG","name":"Rastnik","_id":727762,"coord":{"lon":25.283331,"lat":41.400002}}
...
]
Code:
@Override
protected ArrayList<City> doInBackground(File... files) {
ArrayList<City> cities = new ArrayList<>();
try {
InputStream is = new FileInputStream(files[0]);
JsonReader reader = new JsonReader(new InputStreamReader(is, "UTF-8"));
reader.beginArray();
while (reader.hasNext()) {
City city = new Gson().fromJson(reader, City.class);
cities.add(city);
}
reader.endArray();
reader.close();
} catch (Exception e) {
mResult.onFinish(cities, e.getMessage());
}
Collections.sort(cities, (o1, o2) -> o1.getName().compareTo(o2.getName()));
mResult.onFinish(cities, CityService.SUCCESS);
return cities;
}
Library used:
com.google.code.gson:gson:2.8.0
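It has to work from Android API 16 up to the latest version.
I need to read this content into mCities and sort it alphabetically by city name. Right now this takes 3 minutes, and it has to finish in about 10 seconds. My approach is to cut the JSON file into 10 smaller chunks, read them, concatenate them and sort them.
So my question is: how do I split the file into smaller chunks, and is this the right way to solve the problem?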
Answer 0 (score: 1)
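I never do Android coding myself, but I have some notes, and probably some ideas for you, since this is pure Java.
Your reader does a lot of excessive work while reading each element.
First of all, you don't need to create a Gson instance every time you need one: a Gson instance is thread-safe and can be reused. Creating Gson instances also hits the heap, which takes more time to execute and then to garbage-collect.
Next, there is a difference between deserialization and JSON stream reading in Gson: the former may use a heavy composition of type adapters under the hood, while the latter simply parses the JSON document token by token.
That said, you can get better performance by reading the JSON stream directly: your JSON file has a very strict structure, so a high-level parser can be implemented much more simply.
Suppose a simple test suite with different implementations for your problem: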
final class City {
@SerializedName("_id")
final int id;
@SerializedName("country")
final String country;
@SerializedName("name")
final String name;
@SerializedName("coord")
final Coordinates coordinates;
private City(final int id, final String country, final String name, final Coordinates coordinates) {
this.id = id;
this.country = country;
this.name = name;
this.coordinates = coordinates;
}
static City of(final int id, final String country, final String name, final Coordinates coordinates) {
return new City(id, country, name, coordinates);
}
@Override
public boolean equals(final Object o) {
if ( this == o ) {
return true;
}
if ( o == null || getClass() != o.getClass() ) {
return false;
}
final City that = (City) o;
return id == that.id;
}
@Override
public int hashCode() {
return id;
}
@SuppressWarnings("ConstantConditions")
public static int compareByName(final City city1, final City city2) {
return city1.name.compareTo(city2.name);
}
}
final class Coordinates {
@SerializedName("lat")
final double latitude;
@SerializedName("lon")
final double longitude;
private Coordinates(final double latitude, final double longitude) {
this.latitude = latitude;
this.longitude = longitude;
}
static Coordinates of(final double latitude, final double longitude) {
return new Coordinates(latitude, longitude);
}
@Override
public boolean equals(final Object o) {
if ( this == o ) {
return true;
}
if ( o == null || getClass() != o.getClass() ) {
return false;
}
final Coordinates that = (Coordinates) o;
return Double.compare(that.latitude, latitude) == 0
&& Double.compare(that.longitude, longitude) == 0;
}
@Override
public int hashCode() {
final long latitudeBits = Double.doubleToLongBits(latitude);
final long longitudeBits = Double.doubleToLongBits(longitude);
final int latitudeHash = (int) (latitudeBits ^ latitudeBits >>> 32);
final int longitudeHash = (int) (longitudeBits ^ longitudeBits >>> 32);
return 31 * latitudeHash + longitudeHash;
}
}
interface ITest {
@Nonnull
default String getName() {
return getClass().getSimpleName();
}
@Nonnull
Collection<City> test(@Nonnull JsonReader jsonReader)
throws IOException;
}
public static void main(final String... args)
throws IOException {
final Iterable<ITest> tests = ImmutableList.of(
FirstTest.get(),
ReadAsWholeListTest.get(),
ReadAsWholeTreeSetTest.get(),
ReadJsonStreamIntoListTest.get(),
ReadJsonStreamIntoTreeSetTest.get(),
ReadJsonStreamIntoListChunksTest.get()
);
for ( int i = 0; i < 3; i++ ) {
for ( final ITest test : tests ) {
try ( final ZipInputStream zipInputStream = new ZipInputStream(Resources.getPackageResourceInputStream(Q49273660.class, "cities.json.zip")) ) {
for ( ZipEntry zipEntry = zipInputStream.getNextEntry(); zipEntry != null; zipEntry = zipInputStream.getNextEntry() ) {
if ( zipEntry.getName().equals("cities.json") ) {
final JsonReader jsonReader = new JsonReader(new InputStreamReader(zipInputStream, StandardCharsets.UTF_8)); // do not close; decode explicitly as UTF-8 instead of the platform default charset
System.out.printf("%1$35s : ", test.getName());
final Stopwatch stopwatch = Stopwatch.createStarted();
final Collection<City> cities = test.test(jsonReader);
System.out.printf("in %d ms with %d elements\n", stopwatch.elapsed(TimeUnit.MILLISECONDS), cities.size());
assertSorted(cities, City::compareByName);
}
}
}
}
System.out.println("--------------------");
}
}
private static <E> void assertSorted(final Iterable<? extends E> iterable, final Comparator<? super E> comparator) {
final Iterator<? extends E> iterator = iterable.iterator();
if ( !iterator.hasNext() ) {
return;
}
E a = iterator.next();
if ( !iterator.hasNext() ) {
return;
}
do {
final E b = iterator.next();
if ( comparator.compare(a, b) > 0 ) {
throw new AssertionError(a + " " + b);
}
a = b;
} while ( iterator.hasNext() );
}
FirstTest is the slowest one. It is just the code from your question adapted to the tests.
final class FirstTest
implements ITest {
private static final ITest instance = new FirstTest();
private FirstTest() {
}
static ITest get() {
return instance;
}
@Nonnull
@Override
public List<City> test(@Nonnull final JsonReader jsonReader)
throws IOException {
jsonReader.beginArray();
final List<City> cities = new ArrayList<>();
while ( jsonReader.hasNext() ) {
final City city = new Gson().fromJson(jsonReader, City.class);
cities.add(city);
}
jsonReader.endArray();
cities.sort(City::compareByName);
return cities;
}
}
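ReadAsWholeListTest is most likely how you would implement it. It is not the winner, but it is the simplest one and it uses the default sorting.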
final class ReadAsWholeListTest
implements ITest {
private static final ITest instance = new ReadAsWholeListTest();
private ReadAsWholeListTest() {
}
static ITest get() {
return instance;
}
private static final Gson gson = new Gson();
private static final Type citiesListType = new TypeToken<List<City>>() {
}.getType();
@Nonnull
@Override
public List<City> test(@Nonnull final JsonReader jsonReader) {
final List<City> cities = gson.fromJson(jsonReader, citiesListType);
cities.sort(City::compareByName);
return cities;
}
}
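If you are not bound to lists, another idea is to use an already-sorted collection such as TreeSet.
Since I don't know of a way to tell Gson which comparator a new TreeSet should use, ReadAsWholeTreeSetTest has to use a custom type adapter factory (this is not necessary if City is already Comparable by name, but that is not flexible).
AbstractJsonStreamTest, listed after ReadAsWholeTreeSetTest, is a special reader test that uses a simplified strategy for reading your cities JSON.
It is probably the simplest possible one (in terms of JSON structure analysis), and it requires the JSON document to be very strict.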
final class ReadAsWholeTreeSetTest
implements ITest {
private static final ITest instance = new ReadAsWholeTreeSetTest();
private ReadAsWholeTreeSetTest() {
}
static ITest get() {
return instance;
}
@SuppressWarnings({ "rawtypes", "unchecked" })
private static final TypeToken<TreeSet<?>> rawTreeSetType = (TypeToken) TypeToken.get(TreeSet.class);
private static final Map<Type, Comparator<?>> comparatorsRegistry = ImmutableMap.of(
City.class, (Comparator<City>) City::compareByName
);
private static final Gson gson = new GsonBuilder()
.registerTypeAdapterFactory(new TypeAdapterFactory() {
@Override
public <T> TypeAdapter<T> create(final Gson gson, final TypeToken<T> typeToken) {
if ( !TreeSet.class.isAssignableFrom(typeToken.getRawType()) ) {
return null;
}
final Type elementType = ((ParameterizedType) typeToken.getType()).getActualTypeArguments()[0];
@SuppressWarnings({ "rawtypes", "unchecked" })
final Comparator<Object> comparator = (Comparator) comparatorsRegistry.get(elementType);
if ( comparator == null ) {
return null;
}
final TypeAdapter<TreeSet<?>> originalTreeSetTypeAdapter = gson.getDelegateAdapter(this, rawTreeSetType);
final TypeAdapter<?> originalElementTypeAdapter = gson.getDelegateAdapter(this, TypeToken.get(elementType));
final TypeAdapter<TreeSet<Object>> treeSetTypeAdapter = new TypeAdapter<TreeSet<Object>>() {
@Override
public void write(final JsonWriter jsonWriter, final TreeSet<Object> treeSet)
throws IOException {
originalTreeSetTypeAdapter.write(jsonWriter, treeSet);
}
@Override
public TreeSet<Object> read(final JsonReader jsonReader)
throws IOException {
jsonReader.beginArray();
final TreeSet<Object> elements = new TreeSet<>(comparator);
while ( jsonReader.hasNext() ) {
final Object element = originalElementTypeAdapter.read(jsonReader);
elements.add(element);
}
jsonReader.endArray(); // consume the closing ']' that matches beginArray() above
return elements;
}
}.nullSafe();
@SuppressWarnings({ "rawtypes", "unchecked" })
final TypeAdapter<T> castTreeSetTypeAdapter = (TypeAdapter<T>) treeSetTypeAdapter;
return castTreeSetTypeAdapter;
}
})
.create();
private static final Type citiesSetType = new TypeToken<TreeSet<City>>() {
}.getType();
@Nonnull
@Override
public Set<City> test(@Nonnull final JsonReader jsonReader) {
return gson.fromJson(jsonReader, citiesSetType);
}
}
abstract class AbstractJsonStreamTest
implements ITest {
protected static void read(final JsonReader jsonReader, final Consumer<? super City> cityConsumer)
throws IOException {
jsonReader.beginArray();
while ( jsonReader.hasNext() ) {
jsonReader.beginObject();
require(jsonReader, "country");
final String country = jsonReader.nextString();
require(jsonReader, "name");
final String name = jsonReader.nextString();
require(jsonReader, "_id");
final int id = jsonReader.nextInt();
require(jsonReader, "coord");
jsonReader.beginObject();
require(jsonReader, "lon");
final double longitude = jsonReader.nextDouble();
require(jsonReader, "lat");
final double latitude = jsonReader.nextDouble();
jsonReader.endObject();
jsonReader.endObject();
final City city = City.of(id, country, name, Coordinates.of(latitude, longitude));
cityConsumer.accept(city);
}
jsonReader.endArray();
}
private static void require(final JsonReader jsonReader, final String expectedName)
throws IOException {
final String actualName = jsonReader.nextName();
if ( !actualName.equals(expectedName) ) {
throw new JsonParseException("Expected " + expectedName + " but was " + actualName);
}
}
}
ReadJsonStreamIntoListTest is very similar to ReadAsWholeListTest, but it uses the simplified deserialization mechanism.
final class ReadJsonStreamIntoListTest
extends AbstractJsonStreamTest {
private static final ITest instance = new ReadJsonStreamIntoListTest();
private ReadJsonStreamIntoListTest() {
}
static ITest get() {
return instance;
}
@Nonnull
@Override
public Collection<City> test(@Nonnull final JsonReader jsonReader)
throws IOException {
final List<City> cities = new ArrayList<>();
read(jsonReader, cities::add);
cities.sort(City::compareByName);
return cities;
}
}
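ReadJsonStreamIntoTreeSetTest, listed next, is like the previous one: another implementation of a more expensive test (ReadAsWholeTreeSetTest), but it does not require a custom type adapter.
ReadJsonStreamIntoListChunksTest, listed after it, is based on your original idea, but it does not sort the chunks in parallel (I'm not sure it would help, but you can try it). I still think the previous two are simpler, probably easier to maintain, and give a bigger performance gain.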
final class ReadJsonStreamIntoTreeSetTest
extends AbstractJsonStreamTest {
private static final ITest instance = new ReadJsonStreamIntoTreeSetTest();
private ReadJsonStreamIntoTreeSetTest() {
}
static ITest get() {
return instance;
}
@Nonnull
@Override
public Collection<City> test(@Nonnull final JsonReader jsonReader)
throws IOException {
final Collection<City> cities = new TreeSet<>(City::compareByName);
read(jsonReader, cities::add);
return cities;
}
}
On my desktop JRE, I got the test results listed after ReadJsonStreamIntoListChunksTest, the last test class:
final class ReadJsonStreamIntoListChunksTest
extends AbstractJsonStreamTest {
private static final ITest instance = new ReadJsonStreamIntoListChunksTest();
private ReadJsonStreamIntoListChunksTest() {
}
static ITest get() {
return instance;
}
@Nonnull
@Override
public List<City> test(@Nonnull final JsonReader jsonReader)
throws IOException {
final Collection<List<City>> cityChunks = new ArrayList<>();
final AtomicReference<List<City>> cityChunkRef = new AtomicReference<>(new ArrayList<>());
read(jsonReader, city -> {
final List<City> cityChunk = cityChunkRef.get();
cityChunk.add(city);
if ( cityChunk.size() >= 10000 ) {
cityChunks.add(cityChunk);
cityChunkRef.set(new ArrayList<>());
}
});
if ( !cityChunkRef.get().isEmpty() ) {
cityChunks.add(cityChunkRef.get());
}
for ( final List<City> cities : cityChunks ) {
Collections.sort(cities, City::compareByName);
}
return merge(cityChunks, City::compareByName);
}
/**
* <p>Adapted from:</p>
* <ul>
* <li>Original question: https://stackoverflow.com/questions/1774256/java-code-review-merge-sorted-lists-into-a-single-sorted-list</li>
* <li>Accepted answer: https://stackoverflow.com/questions/1774256/java-code-review-merge-sorted-lists-into-a-single-sorted-list/1775748#1775748</li>
* </ul>
*/
@SuppressWarnings("MethodCallInLoopCondition")
private static <E> List<E> merge(final Iterable<? extends List<E>> lists, final Comparator<? super E> comparator) {
int totalSize = 0;
for ( final List<E> l : lists ) {
totalSize += l.size();
}
final List<E> result = new ArrayList<>(totalSize);
while ( result.size() < totalSize ) { // while we still have something to add
List<E> lowest = null;
for ( final List<E> l : lists ) {
if ( !l.isEmpty() ) {
if ( lowest == null || comparator.compare(l.get(0), lowest.get(0)) <= 0 ) {
lowest = l;
}
}
}
assert lowest != null;
result.add(lowest.get(0));
lowest.remove(0);
}
return result;
}
}
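The merge() above rescans every chunk head for each output element and removes from the front of an ArrayList, and the chunks are sorted one after another. As a rough sketch only, the chunks could instead be sorted on a small thread pool and then merged through a PriorityQueue holding one cursor per chunk. The class ChunkSorter, its method names and the thread count are hypothetical and not part of the suite above, and nothing here has been measured on Android, so whether it helps is an open question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration; not part of the test suite above.
final class ChunkSorter {

    private ChunkSorter() {
    }

    // Sorts each chunk in place on a worker thread, then merges the sorted chunks.
    static <E> List<E> sortChunksAndMerge(final List<List<E>> chunks,
            final Comparator<? super E> comparator, final int threads) {
        final ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            final List<Future<?>> futures = new ArrayList<>();
            for ( final List<E> chunk : chunks ) {
                futures.add(executor.submit(() -> Collections.sort(chunk, comparator)));
            }
            for ( final Future<?> future : futures ) {
                future.get(); // wait for the chunk and surface any sorting failure
            }
        } catch ( final InterruptedException e ) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while sorting chunks", e);
        } catch ( final ExecutionException e ) {
            throw new IllegalStateException("Sorting a chunk failed", e.getCause());
        } finally {
            executor.shutdown();
        }
        return merge(chunks, comparator);
    }

    // One cursor per non-empty chunk: its current head plus an iterator over the rest.
    private static final class Cursor<E> {
        E head;
        final Iterator<E> rest;

        Cursor(final Iterator<E> rest) {
            this.rest = rest;
            this.head = rest.next();
        }
    }

    // k-way merge of already sorted chunks; the chunks themselves are not modified.
    static <E> List<E> merge(final List<List<E>> sortedChunks, final Comparator<? super E> comparator) {
        final Comparator<Cursor<E>> byHead = (a, b) -> comparator.compare(a.head, b.head);
        final PriorityQueue<Cursor<E>> heap = new PriorityQueue<>(Math.max(1, sortedChunks.size()), byHead);
        int totalSize = 0;
        for ( final List<E> chunk : sortedChunks ) {
            totalSize += chunk.size();
            if ( !chunk.isEmpty() ) {
                heap.add(new Cursor<>(chunk.iterator()));
            }
        }
        final List<E> result = new ArrayList<>(totalSize);
        while ( !heap.isEmpty() ) {
            final Cursor<E> lowest = heap.poll(); // cursor with the smallest head
            result.add(lowest.head);
            if ( lowest.rest.hasNext() ) {
                lowest.head = lowest.rest.next();
                heap.add(lowest); // re-insert with its next element as the new head
            }
        }
        return result;
    }
}
```

In the chunks test this would be called as ChunkSorter.sortChunksAndMerge(cityChunks, City::compareByName, 4) after changing cityChunks to a List<List<City>>. Note that elements comparing equal may come out in a different relative order than with a single stable sort.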
FirstTest : in 5797 ms with 209557 elements
ReadAsWholeListTest : in 796 ms with 209557 elements
ReadAsWholeTreeSetTest : in 733 ms with 162006 elements
ReadJsonStreamIntoListTest : in 461 ms with 209557 elements
ReadJsonStreamIntoTreeSetTest : in 452 ms with 162006 elements
ReadJsonStreamIntoListChunksTest : in 607 ms with 209557 elements
--------------------
FirstTest : in 3396 ms with 209557 elements
ReadAsWholeListTest : in 493 ms with 209557 elements
ReadAsWholeTreeSetTest : in 520 ms with 162006 elements
ReadJsonStreamIntoListTest : in 385 ms with 209557 elements
ReadJsonStreamIntoTreeSetTest : in 377 ms with 162006 elements
ReadJsonStreamIntoListChunksTest : in 540 ms with 209557 elements
--------------------
FirstTest : in 3448 ms with 209557 elements
ReadAsWholeListTest : in 429 ms with 209557 elements
ReadAsWholeTreeSetTest : in 421 ms with 162006 elements
ReadJsonStreamIntoListTest : in 400 ms with 209557 elements
ReadJsonStreamIntoTreeSetTest : in 385 ms with 162006 elements
ReadJsonStreamIntoListChunksTest : in 480 ms with 209557 elements
--------------------
As you can see, creating excessive Gson instances is definitely a bad idea.
The more optimized tests achieve better performance.
However, splitting the large list into sorted chunks to be merged later (without parallelism) did not give much of a performance gain in my environment.
For simplicity, and probably as the best choice, I would go with ReadJsonStreamInto_Collection_Test, depending on the required collection.
I'm really not sure how well this works in a real Android environment, but you can probably do the JSON deserialization yourself a bit better than Gson does with its internals.
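By the way:
The TreeSet-based tests return 162006 elements instead of 209557, because a TreeSet ordered by name keeps only one city per name (whereas _id is the identity).
What about ... a sorted version of ...? Also, if my assumption above is correct, you may want to filter out duplicates.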
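The results above show 162006 elements for the TreeSet-based tests against 209557 for the list-based ones: a TreeSet treats elements that compare as equal as the same element, so a name-only comparator drops distinct cities that share a name. A minimal sketch of one way around this, assuming _id identifies a city and that true duplicates are identical records: break ties on the id, so same-named cities are kept and only exact repeats are removed. SimpleCity and CityOrdering are hypothetical stand-ins, not the City class from the suite above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, reduced stand-in for the City class used in the tests above.
final class SimpleCity {
    final int id;
    final String name;

    SimpleCity(final int id, final String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical helper for illustration.
final class CityOrdering {

    private CityOrdering() {
    }

    // Orders by name first; cities with the same name are ordered by id.
    static final Comparator<SimpleCity> BY_NAME_THEN_ID = (a, b) -> {
        final int byName = a.name.compareTo(b.name);
        if ( byName != 0 ) {
            return byName;
        }
        // explicit comparison instead of subtraction, which could overflow
        return a.id < b.id ? -1 : (a.id == b.id ? 0 : 1);
    };

    // Returns the cities sorted by name, keeping one entry per (name, id) pair.
    static List<SimpleCity> sortedWithoutDuplicates(final Iterable<SimpleCity> cities) {
        final TreeSet<SimpleCity> sorted = new TreeSet<>(BY_NAME_THEN_ID);
        for ( final SimpleCity city : cities ) {
            sorted.add(city); // ignored when an equal (name, id) entry is already present
        }
        return new ArrayList<>(sorted);
    }
}
```

With this comparator two different cities named the same are both kept, while a record repeated with the same name and id appears once. Records that share an id but differ in name would not be merged by this sketch.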