I have two files containing book IDs:
- current.json [~10,000 lines] -> books saved in the system
- feed.json [~300,000 lines] -> feed file containing all books from a book store
From these two files I want to generate three files:
- not_available.json -> books that exist in current but not in feed
- to_be_updated.json -> books that exist in both current and feed
- new.json -> books that exist only in the feed
Because the files are too large to fit in memory, I read them line by line.
My code is as follows:
// export to_be_updated.json and new.json
feed <- initstream(feed.json)
while(lf <- feed.nextline())
    found <- false;
    current <- initstream(current.json)
    while(lc <- current.nextline())
        if(JSON.parse(lf).id == JSON.parse(lc).id)
            found <- true
            break
    if(found) then append(lf, to_be_updated.json)
    else append(lf, new.json)

// export not_available.json
current <- initstream(current.json)
while(lc <- current.nextline())
    found <- false;
    feed <- initstream(feed.json)
    while(lf <- feed.nextline())
        if(JSON.parse(lf).id == JSON.parse(lc).id)
            found <- true
            break
    if not(found) then append(lc, not_available.json)
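With n = 10,000 and m = 300,000, this code has a time complexity of O(nm) and a space complexity of O(1), so it takes a long time to run on a Core i5 with 500 MB of memory.
I tried to put the logic into a single nested loop, but that is not possible. I am trying to improve the complexity given that the files are unsorted.
Do you think this is the best approach? Is there a better way?
feed.json has the following format (sample):
{"id": "12340", "title": "A life journey", "price": "34.00"}
{"id": "12341", "title": "all over the world", "price": "42.00"}
{"id": "12342", "title": "good to remember", "price": "60.00"}
{"id": "12343", "title": "A night in Mars", "price": "14.00"}
...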
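For comparison, one possible way to avoid the nested loops is sketched below in Python. It rests on an assumption not stated in the question: that the IDs alone from current.json (about 10,000 short strings) fit in memory, even if the full records do not. Under that assumption both files can still be streamed line by line, and each is read a fixed number of times, giving roughly O(n + m) time. The function name `split_books` and the `out_dir` parameter are hypothetical, chosen only for this illustration.

```python
import json
import os


def split_books(current_path, feed_path, out_dir="."):
    """Stream both files and write the three result files (illustrative sketch)."""
    # Assumption: only the IDs of current.json are held in memory, not the records.
    with open(current_path, encoding="utf-8") as current:
        current_ids = {json.loads(line)["id"] for line in current if line.strip()}

    matched_ids = set()  # IDs from current.json that also appear in the feed

    # Single pass over the feed: each line goes to exactly one output file.
    with open(feed_path, encoding="utf-8") as feed, \
            open(os.path.join(out_dir, "to_be_updated.json"), "w", encoding="utf-8") as updated, \
            open(os.path.join(out_dir, "new.json"), "w", encoding="utf-8") as new:
        for line in feed:
            if not line.strip():
                continue  # skip blank lines
            book_id = json.loads(line)["id"]
            target = updated if book_id in current_ids else new
            if target is updated:
                matched_ids.add(book_id)
            target.write(line.rstrip("\n") + "\n")

    # Second pass over current.json: keep the records the feed never mentioned.
    with open(current_path, encoding="utf-8") as current, \
            open(os.path.join(out_dir, "not_available.json"), "w", encoding="utf-8") as missing:
        for line in current:
            if not line.strip():
                continue
            if json.loads(line)["id"] not in matched_ids:
                missing.write(line.rstrip("\n") + "\n")
```

Calling `split_books("current.json", "feed.json")` would write the three files into the working directory. Set membership tests are constant time on average, which is what replaces the inner loop. If even the IDs were too large for memory, sorting both files by ID externally and merging them would be the usual fallback.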