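I take an Elasticsearch snapshot every hour. After about 370 snapshots (roughly 15 days), my backup repository has grown past 15G, yet the total index size is only about 500M. Elasticsearch snapshots are supposed to be incremental, so a gap of 15G vs 500M seems huge. Is it normal for the backup repository to be this much larger than the indices?
Is this caused by how often I back up (hourly)? I take an hourly snapshot on cluster 1 and run an hourly restore on cluster 2 to keep the data in the two ES clusters close to real-time identical.
I am using the Elasticsearch snapshot API for the backups.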
Set up the repository:
curl -XPUT "http://IP:9200/_snapshot/backup" -d "{\"type\":\"fs\",\"settings\":{\"compress\":true,\"location\":\"backup\"}}"
Backup:
CURTIME=$(date +"%Y-%m-%d %H:%M:%S")
BKTIME=${CURTIME//[-: ]/}
curl -XPUT "http://IP:9200/_snapshot/backup/snapshot${BKTIME}?wait_for_completion=true"
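The naming step can also be done without the intermediate string substitution, by asking `date` for a compact timestamp directly. A minimal sketch, assuming a POSIX shell; `ES_URL`, `REPO`, `snapshot_name`, and `snapshot_url` are illustrative names that do not appear in the original script:

```shell
#!/bin/sh
# Illustrative defaults (assumptions); adjust to your own cluster and repository.
ES_URL="${ES_URL:-http://IP:9200}"
REPO="${REPO:-backup}"

# Build a lowercase, space-free snapshot name: "snapshot" + YYYYmmddHHMMSS.
snapshot_name() {
  printf 'snapshot%s\n' "$(date +%Y%m%d%H%M%S)"
}

# Build the snapshot-creation URL for a given snapshot name.
snapshot_url() {
  printf '%s/_snapshot/%s/%s?wait_for_completion=true\n' "$ES_URL" "$REPO" "$1"
}

# Usage (not executed here):
#   curl -XPUT "$(snapshot_url "$(snapshot_name)")"
```

Keeping the name free of spaces and punctuation avoids quoting problems in the URL.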
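My Elasticsearch setup: 2 nodes, 12 shards per node, 2 indices, and an fs-type repository that stores the snapshots on a NAS.
In the Elasticsearch data directory, the index sizes are:
node1 index size: [root@esnode1 indices]$ du -sh
307M .
node2 index size: [root@esnode2 indices]$ du -sh
238M .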
[root@esnode1 indices]$ du -lh
8.0K ./index1/10/translog
8.0K ./index1/10/_state
2.9M ./index1/10/index
2.9M ./index1/10
12K ./index1/5/translog
8.0K ./index1/5/_state
1.5M ./index1/5/index
1.5M ./index1/5
8.0K ./index1/4/translog
8.0K ./index1/4/_state
2.9M ./index1/4/index
2.9M ./index1/4
8.0K ./index1/_state
8.0K ./index1/7/translog
8.0K ./index1/7/_state
2.9M ./index1/7/index
2.9M ./index1/7
8.0K ./index1/1/translog
8.0K ./index1/1/_state
2.9M ./index1/1/index
2.9M ./index1/1
8.0K ./index1/2/translog
8.0K ./index1/2/_state
2.9M ./index1/2/index
2.9M ./index1/2
8.0K ./index1/6/translog
8.0K ./index1/6/_state
3.0M ./index1/6/index
3.0M ./index1/6
8.0K ./index1/0/translog
8.0K ./index1/0/_state
1.5M ./index1/0/index
1.5M ./index1/0
8.0K ./index1/8/translog
8.0K ./index1/8/_state
1.5M ./index1/8/index
1.5M ./index1/8
8.0K ./index1/11/translog
8.0K ./index1/11/_state
2.9M ./index1/11/index
2.9M ./index1/11
12K ./index1/9/translog
8.0K ./index1/9/_state
3.0M ./index1/9/index
3.0M ./index1/9
8.0K ./index1/3/translog
8.0K ./index1/3/_state
3.0M ./index1/3/index
3.0M ./index1/3
31M ./index1
16K ./index2/10/translog
8.0K ./index2/10/_state
16M ./index2/10/index
16M ./index2/10
36K ./index2/5/translog
8.0K ./index2/5/_state
43M ./index2/5/index
43M ./index2/5
20K ./index2/4/translog
8.0K ./index2/4/_state
17M ./index2/4/index
18M ./index2/4
8.0K ./index2/_state
40K ./index2/7/translog
8.0K ./index2/7/_state
32M ./index2/7/index
32M ./index2/7
68K ./index2/1/translog
8.0K ./index2/1/_state
21M ./index2/1/index
21M ./index2/1
64K ./index2/2/translog
8.0K ./index2/2/_state
19M ./index2/2/index
19M ./index2/2
116K ./index2/6/translog
8.0K ./index2/6/_state
22M ./index2/6/index
22M ./index2/6
24K ./index2/0/translog
8.0K ./index2/0/_state
17M ./index2/0/index
17M ./index2/0
128K ./index2/8/translog
8.0K ./index2/8/_state
34M ./index2/8/index
34M ./index2/8
72K ./index2/11/translog
8.0K ./index2/11/_state
20M ./index2/11/index
20M ./index2/11
88K ./index2/9/translog
8.0K ./index2/9/_state
22M ./index2/9/index
22M ./index2/9
76K ./index2/3/translog
8.0K ./index2/3/_state
16M ./index2/3/index
16M ./index2/3
277M ./index2
307M .
In the backup repository, the sizes are: [root@esnode1 backup]$ du -lh
114M ./backup/indices/index1/0
112M ./backup/indices/index1/5
114M ./backup/indices/index1/11
114M ./backup/indices/index1/10
111M ./backup/indices/index1/8
116M ./backup/indices/index1/4
120M ./backup/indices/index1/9
118M ./backup/indices/index1/3
114M ./backup/indices/index1/2
115M ./backup/indices/index1/7
115M ./backup/indices/index1/1
112M ./backup/indices/index1/6
1.4G ./backup/indices/index1
747M ./backup/indices/index2/0
1.6G ./backup/indices/index2/5
887M ./backup/indices/index2/11
743M ./backup/indices/index2/10
2.1G ./backup/indices/index2/8
801M ./backup/indices/index2/4
1.3G ./backup/indices/index2/9
878M ./backup/indices/index2/3
951M ./backup/indices/index2/2
1.2G ./backup/indices/index2/7
953M ./backup/indices/index2/1
943M ./backup/indices/index2/6
13G ./backup/indices/index2
15G ./backup/indices
15G ./backup
1.1M ./backuplogs
15G .
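======
https://www.elastic.co/blog/introducing-snapshot-restore
"Both backup and restore operations are incremental, meaning that only files that have changed since the last snapshot will be copied to the repository or restored into an index. Incremental snapshots allow users to perform snapshot operations as frequently as needed without too much disk space overhead. Users can now easily create a snapshot before an upgrade or a risky change in the cluster and quickly roll back to the previous index state if things go wrong. The snapshot/restore mechanism can also be used to synchronize data between a "hot" cluster and a remote "cold" backup cluster in a different geographic region for rapid disaster recovery."
Judging from the above, my case does look like a problem. Can anyone help me? Thanks in advance!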
Answer 0: (score: 1)
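Confirmed on the official Elasticsearch forum:
1) It is normal for the index size and the backup repository size to differ (500M vs 15G).
2) Some of the redundant data in the backup repository is caused by Lucene segment merging.
From an Elasticsearch expert: if you are continuously indexing into the cluster, segment merges happen continuously in the background, and the same records end up in multiple segments over time, resulting in a repository that is considerably larger than the indices.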
https://discuss.elastic.co/t/backup-repository-size-is-much-bigger-than-indices-size/47469/7
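A back-of-the-envelope check with the figures from the question (about 370 snapshots, a 15G repository, roughly 500M of index data) supports the view that the snapshots are incremental. The sketch below uses only shell integer arithmetic; the variable names are illustrative and the inputs are the rounded numbers quoted in the question:

```shell
#!/bin/sh
# Rounded figures taken from the question.
snapshots=370
repo_mb=$((15 * 1024))   # 15G expressed in MB
index_mb=500

# Average repository growth per snapshot (integer division).
avg_per_snapshot_mb=$((repo_mb / snapshots))

# Space that 370 full, non-incremental copies of the index would need.
full_copies_mb=$((snapshots * index_mb))

echo "average added per snapshot: ${avg_per_snapshot_mb} MB"
echo "full copies would need: $((full_copies_mb / 1024)) GB"
```

This prints about 41 MB per snapshot and about 180 GB for full copies. Roughly 41 MB per hourly snapshot is far below a full 500 MB copy, which is consistent with only new or newly merged segment files being uploaded each time.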
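Answer 1: (score: 0)
Snapshot and restore work at the segment level, which is a very important detail that is not mentioned in Elastic's "Snapshot And Restore" documentation. I reported this to Elastic and asked whether they could update the documentation.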
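Because segment files in the repository are shared between snapshots, old snapshots should be deleted through the snapshot API rather than by removing files by hand; Elasticsearch then drops only the files that no remaining snapshot references. A minimal retention sketch, assuming snapshot names sort chronologically (as the timestamped names in the question do). `ES_URL`, `REPO`, `KEEP` (with its arbitrary default of 48), and both function names are hypothetical, and collecting the list of snapshot names is left to the caller:

```shell
#!/bin/sh
# Illustrative defaults (assumptions); adjust to your own cluster and policy.
ES_URL="${ES_URL:-http://IP:9200}"
REPO="${REPO:-backup}"
KEEP="${KEEP:-48}"

# Read snapshot names on stdin (one per line), sort them,
# and print every name except the newest $1.
snapshots_to_delete() {
  keep="$1"
  sort | awk -v keep="$keep" '
    { names[NR] = $0 }
    END { for (i = 1; i <= NR - keep; i++) print names[i] }'
}

# Delete each snapshot named on stdin via the snapshot API.
delete_snapshots() {
  while IFS= read -r name; do
    curl -s -XDELETE "$ES_URL/_snapshot/$REPO/$name"
  done
}

# Usage (not executed here), given a file with one snapshot name per line:
#   snapshots_to_delete "$KEEP" < snapshot_names.txt | delete_snapshots
```

Separating the selection from the deletion makes it possible to review the list of names before anything is removed.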