I use IFilter to index some MS Office documents. Loading from a file works fine, exactly as in all the manuals and samples:
HRESULT hr_f = LoadIFilter(filename, 0, (void **)&pFilter);
However, the BindIFilterFromStream API fails, and I can't figure out how to use it correctly.
HRESULT hr_ss = BindIFilterFromStream(spStream/*my IStream* impl*/, 0, (void **)&pFilter);
I implemented the IStream interface. The only method called during initialization (apart from the IUnknown methods) is:
HRESULT StreamFilter::Stat(STATSTG * pstatstg, DWORD grfStatFlag)
{
    //Microsoft Office Ifilter from Windows Registry
    //{f07f3920-7b8c-11cf-9be8-00aa004b9986}
    const IID CLSID_IFilter = {
        0xf07f3920,
        0x7b8c,
        0x11cf,
        { 0x9b, 0xe8, 0x00, 0xaa, 0x00, 0x4b, 0x99, 0x86 }
    };
    LARGE_INTEGER pSize;
    int fl = GetFileSizeEx(_hFile, &pSize);
    memset(pstatstg, 0, sizeof(STATSTG));
    pstatstg->clsid = CLSID_IFilter;
    pstatstg->type = STGTY_STREAM;
    pstatstg->cbSize.QuadPart = pSize.QuadPart;
    return S_OK;
}
After that, hr_ss is E_FAIL and the IFilter pointer is NULL.
There is a "Using IFilter in C#" example, and the same approach also works in C++ for *.pdf files, but not for MSO documents...
Answer 0 (score: 0)
I figured out how to initialize the IFilter correctly. Here is the code:
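// Obtain the filter for .doc files by passing only the extension.
HRESULT hr = LoadIFilter(L".doc", 0, (void **)&pFilter);
IPersistStream *stream;
HRESULT hr_qi = pFilter->QueryInterface(&stream);
// Read the whole document into memory.
std::ifstream ifs(filename, std::ios::binary);
std::string content((std::istreambuf_iterator<char>(ifs)),
                    (std::istreambuf_iterator<char>()));
// Copy the bytes into a movable global memory block and wrap it in an IStream.
IStream *comStream;
HGLOBAL hMem = ::GlobalAlloc(GMEM_MOVEABLE, content.size());
LPVOID pDoc = ::GlobalLock(hMem);
memcpy(pDoc, content.c_str(), content.size());
::GlobalUnlock(hMem);
// TRUE: the stream frees hMem on its final Release.
HRESULT hr_mem = ::CreateStreamOnHGlobal(hMem, TRUE, &comStream);
// Hand the document to the filter through IPersistStream.
HRESULT hr_stream_load = stream->Load(comStream);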
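The snippet above reads the file with std::ifstream but never checks that the file actually opened, so a wrong path would silently produce an empty buffer and a zero-sized GlobalAlloc. A minimal portable sketch of a checked read follows; the helper name ReadAllBytes is hypothetical and not part of the original code, and treating an empty file as an error is an assumption made here for illustration.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): reads a whole file into
// memory in binary mode and throws if it cannot be opened or has no content.
std::vector<char> ReadAllBytes(const std::string& path)
{
    std::ifstream ifs(path, std::ios::binary);
    if (!ifs)
        throw std::runtime_error("cannot open " + path);

    // istreambuf_iterator copies raw bytes, including embedded NULs.
    std::vector<char> bytes((std::istreambuf_iterator<char>(ifs)),
                            std::istreambuf_iterator<char>());

    // Assumption: an empty document is not worth handing to a filter.
    if (bytes.empty())
        throw std::runtime_error("file is empty: " + path);
    return bytes;
}
```

The returned bytes.data() and bytes.size() could then be passed to GlobalAlloc and memcpy in place of content.c_str() and content.size().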
Getting the text out of the document then works just like the usual MSDN sample:
if (SUCCEEDED(hr))
{
    DWORD flags = 0;
    HRESULT hr = pFilter->Init(IFILTER_INIT_INDEXING_ONLY |
                               IFILTER_INIT_APPLY_INDEX_ATTRIBUTES |
                               IFILTER_INIT_APPLY_CRAWL_ATTRIBUTES |
                               IFILTER_INIT_FILTER_OWNED_VALUE_OK |
                               IFILTER_INIT_APPLY_OTHER_ATTRIBUTES,
                               0, 0, &flags);
    if (FAILED(hr))
    {
        pFilter->Release();
        throw exception("IFilter::Init() failed");
    }
    Start();
    STAT_CHUNK stat;
    while (SUCCEEDED(hr = pFilter->GetChunk(&stat)))
    {
        if ((stat.flags & CHUNK_TEXT) != 0)
            ProcessTextChunk(pFilter, stat);
        if ((stat.flags & CHUNK_VALUE) != 0)
            ProcessValueChunk(pFilter, stat);
    }
    Finish();
    pFilter->Release();
}
else
{
    throw exception("LoadIFilter() failed");
}
You don't need to implement your own IStream; just initialize one from a buffer...
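A note on the GetChunk loop in the code above: it stops on any failure HRESULT, so the normal end of the document (FILTER_E_END_OF_CHUNKS) looks the same as a real error. The sketch below shows one way to tell them apart. It is portable and repeats the numeric values from the Windows SDK header filterr.h so it compiles without the SDK; the names ChunkLoop, Succeeded and Classify are hypothetical, and real code would use the SDK's own HRESULT type and macros.

```cpp
#include <cstdint>

// Values as defined in filterr.h, repeated here only so that this sketch
// builds without the Windows SDK. In real code, include the header instead.
constexpr std::uint32_t kFilterEEndOfChunks = 0x80041700u; // FILTER_E_END_OF_CHUNKS
constexpr std::uint32_t kFilterEAccess      = 0x80041703u; // FILTER_E_ACCESS
constexpr std::uint32_t kFilterSLastText    = 0x00041709u; // FILTER_S_LAST_TEXT

// Mirrors the SUCCEEDED() macro: an HRESULT is a success code when its
// severity (top) bit is clear.
constexpr bool Succeeded(std::uint32_t hr)
{
    return (hr & 0x80000000u) == 0;
}

// Hypothetical three-way outcome for one GetChunk call.
enum class ChunkLoop { Continue, Done, Error };

constexpr ChunkLoop Classify(std::uint32_t hr)
{
    return Succeeded(hr)             ? ChunkLoop::Continue
         : hr == kFilterEEndOfChunks ? ChunkLoop::Done
                                     : ChunkLoop::Error;
}

// Compile-time checks of the classification.
static_assert(Classify(0u) == ChunkLoop::Continue, "S_OK keeps looping");
static_assert(Classify(kFilterEEndOfChunks) == ChunkLoop::Done, "normal end");
static_assert(Classify(kFilterEAccess) == ChunkLoop::Error, "real failure");
```

With a check like this, the loop could call Finish() only when the outcome is Done and report the HRESULT otherwise.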