I want to index a CSV file into Elasticsearch without using Logstash. I am using the elasticsearch-dsl high-level library.
Given a CSV with a header, for example:
name,address,url
adam,hills 32,http://rockbottom.com
jane,berkeley street 1,http://jane.com
What would be the best way to index all the data by its fields? Eventually I'd like each row to look like this:
{
    "name": "adam",
    "address": "hills 32",
    "url": "http://rockbottom.com"
}
Answer 0 (score: 21)
This kind of task is easier with the lower-level elasticsearch-py library:
from elasticsearch import helpers, Elasticsearch
import csv

es = Elasticsearch()

with open('/tmp/x.csv') as f:
    reader = csv.DictReader(f)
    # each row from DictReader is a dict keyed by the CSV header,
    # so it can be passed straight to the bulk helper as a document
    helpers.bulk(es, reader, index='my-index', doc_type='my-type')
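If you need to shape rows before they are indexed (pick your own document IDs, cast numeric columns, route rows to different indices), you can wrap the reader in a generator of bulk actions instead. A minimal sketch, assuming a hypothetical id column in the CSV; note that doc_type was deprecated in Elasticsearch 7, so newer clients omit that argument:

from elasticsearch import helpers, Elasticsearch
import csv

es = Elasticsearch()

def actions(path):
    # yield one bulk action per CSV row
    with open(path) as f:
        for row in csv.DictReader(f):
            yield {
                '_index': 'my-index',
                '_id': row['id'],  # hypothetical column; omit to let Elasticsearch assign IDs
                '_source': row,
            }

helpers.bulk(es, actions('/tmp/x.csv'))

helpers.bulk consumes the generator lazily, chunk by chunk, so this also works for CSV files that do not fit in memory.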
Answer 1 (score: 1)
If you want to build an Elasticsearch index from .tsv/.csv data with strict types and a document model, for better filtering, you can do something like this:
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
from elasticsearch_dsl import DocType, Text

class ElementIndex(DocType):
    # one typed attribute per CSV column (placeholder names)
    ROWNAME1 = Text()
    ROWNAME2 = Text()

    class Meta:
        index = 'index_name'

def indexing(row):
    # map one source row (a dict) onto the typed model
    obj = ElementIndex(
        ROWNAME1=str(row['NAME1']),
        ROWNAME2=str(row['NAME2']),
    )
    # obj.save(index="index_name") is unnecessary here: bulk() below does the indexing
    return obj.to_dict(include_meta=True)

def bulk_indexing(result):
    # ElementIndex.init(index="index_name")
    ElementIndex.init()  # create the index and the mapping derived from ElementIndex
    es = Elasticsearch()

    # here `result` is an iterable of dicts with the data from your source
    bulk(client=es, actions=(indexing(c) for c in result))
    es.indices.refresh()
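For completeness, a usage sketch for the above (the file path, delimiter, and column names NAME1/NAME2 are assumptions for illustration, not part of the answer):

import csv

# hypothetical .tsv source; its columns must match what indexing() reads (NAME1, NAME2)
with open('/tmp/data.tsv') as f:
    result = list(csv.DictReader(f, delimiter='\t'))

bulk_indexing(result)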