I am new to spaCy. I added this post as documentation, to keep things simple for newcomers like me.
use Prod_data
declare @ReportingStart datetime = dateadd(HH,-17,convert(datetime,convert(date,getdate())))
declare @ReportingEnd datetime = dateadd(HH,7,convert(datetime,convert(date,getdate())))
-- Daily Production time
declare @Production float = (select sum(dDurationSeconds/60)
from OEEQStateData
where tstart >= @ReportingStart and tstart < @ReportingEnd
and sStateDescription = 'Production' and sWorkcellDescription = 'Hoisting')
-- Daily Idle time
declare @Idle float = (select isnull(sum(dDurationSeconds/60),0)
from OEEQStateData
where tstart >= @ReportingStart and tstart < @ReportingEnd
and sStateDescription = 'Idle Time' and sWorkcellDescription = 'Hoisting')
-- Daily Unplanned time
declare @Unplanned float = (select sum(dDurationSeconds/60)
from OEEQStateData
where tstart >= @ReportingStart and tstart < @ReportingEnd
and sStateDescription like 'Unplanned%' and sWorkcellDescription = 'Hoisting')
--Daily Maintenance time
declare @Planned float = (select sum(dDurationSeconds/60)
from OEEQStateData
where tstart >= @ReportingStart and tstart < @ReportingEnd
and sStateDescription like 'Planned%' and sWorkcellDescription = 'Hoisting')
--Util
declare @Util float = @Production/(1440-@Planned-@Unplanned)
--Avail
declare @Avail float = ((@Production+@Idle)/1440)
--Hoist Schedule
declare @HoistSched int = (select round(DS_Prod+NS_Prod,-2)
from Schedule
where date = convert(date,@ReportingStart))
--Hoist Schedule for tomorrow
declare @HoistSchedTom int = (select round(DS_Prod+NS_Prod,-2)
from Schedule
where date = convert(date,@ReportingEnd))
--PM for tomorrow
declare @PM int = (select (DS_DT+NS_DT)
from Schedule
where date = convert(date,dateadd(dd,1,getdate())))
--Hoist Daily Production
declare @Tonnes int = (select top 1
case
when coalesce(lead(value) over(partition by tagname order by datetime),0) - value < 0 then 0
else coalesce(lead(value) over(partition by tagname order by datetime),0) - value
end
from Linked_Database
where datetime between @ReportingStart and @ReportingEnd
and wwResolution = (1440 * 60000)
and tagname = 'SALV_CV005_WX1_PROD_DATA.Actual_Input'
)
--MPS 24HR
declare @MPS_today float = (select sum(value)
from Linked_Database
where datetime = @ReportingEnd
and tagname like 'MPS_FI7940%.Actual_Input')
declare @MPS_yest float = ( select sum(value)
from Linked_Database
where datetime = @ReportingStart
and tagname like 'MPS_FI7940%.Actual_Input')
declare @MPS_total float = (@MPS_today-@MPS_yest)
--IPDW 24HR (claypit + IPDW)
declare @IPDW_today float = (select isnull(sum(value),0)
from Linked_Database
where datetime = @ReportingEnd
and tagname like '%FI792%.Actual_Input')
declare @Clay_today float = (select isnull(sum(value),0)
from Linked_Database
where datetime = @ReportingEnd
and tagname like '%FI764%_TOTAL.PVAI')
declare @IPDW_yest float = (select isnull(sum(value),0)
from Linked_Database
where datetime = @ReportingStart
and tagname like '%FI792%.Actual_Input')
declare @Clay_yest float = (select isnull(sum(value),0)
from Linked_Database
where datetime = @ReportingStart
and tagname like '%FI764%_TOTAL.PVAI')
declare @IPDW_total float = (@IPDW_today+@Clay_today-@IPDW_yest-@Clay_yest)
--Average airflow across both vent fan
declare @VF_Avg float = (select avg(value)
from Linked_Database
where datetime between @ReportingStart and @ReportingEnd
and tagname = 'vfans_totalairflow.pv_at')
--BAC wet bulb
declare @BAC_Wet float = (select avg(value)
from Linked_Database
where datetime between @ReportingStart and @ReportingEnd
and tagname = 'gb_bac_tt787125a._analog_PV')
declare @BAC_Dry float = (select avg(value)
from Linked_Database
where datetime between @ReportingStart and @ReportingEnd
and tagname = 'gb_bac_tt787125b._analog_PV')
--Final Select Statement
select @HoistSched as Hoist_Sched_today, @HoistSchedTom as Hoist_Sched_Tom, @PM as PM_Tom, @Tonnes as Hoist_Act, @Util as Hoist_Util, @Avail as Hoist_Avail, @MPS_total as MPS_Dewatering_Total, @IPDW_total as IPDW_Dewatering_Total, @VF_Avg as VFan_AVG, @BAC_Dry as BAC_Dry_AVG, @BAC_Wet as BAC_Wet_AVG
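I would like to understand what orth, lemma, tag and pos mean. This code prints out their values:
import spacy
# 'en' is the shortcut name used by older spaCy releases; on spaCy v3+ load a named pipeline such as 'en_core_web_sm'
nlp = spacy.load('en')
doc = nlp(u'KEEP CALM because TOGETHER We Rock !')
for word in doc:
    print(word.text, word.lemma, word.lemma_, word.tag, word.tag_, word.pos, word.pos_)
    print(word.orth_)
Also, what is the difference between print(word) and print(word.orth_)?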
Answer 0 (score: 13)
What is the meaning of orth, lemma, tag and pos?
See https://spacy.io/docs/usage/pos-tagging#pos-schemes
What is the difference between print(word) and print(word.orth_)?
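Very briefly:
word.orth_ and word.text are the same. In spaCy, an attribute whose name ends with an underscore returns the human-readable string, while the same attribute without the underscore returns the corresponding integer ID.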
In short:
When you access the word.orth_ property (https://github.com/explosion/spaCy/blob/develop/spacy/tokens/token.pyx#L537), it looks up the string store where all the strings of the vocabulary are kept:
property orth_:
    def __get__(self):
        return self.vocab.strings[self.c.lex.orth]
(For details on self.c.lex.orth, see the "In long" explanation below.)
And word.text returns the string representation of the word, which simply wraps the orth_ property; see https://github.com/explosion/spaCy/blob/develop/spacy/tokens/token.pyx#L128
property text:
    def __get__(self):
        return self.orth_
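To make the relationship between orth, orth_ and text concrete, here is a minimal pure-Python sketch. ToyStringStore and ToyToken are hypothetical stand-ins invented for illustration, not spaCy's classes, and handing out sequential integer IDs is an assumption made for simplicity. Only the shape of the lookup mirrors the two properties quoted above.

```python
class ToyStringStore:
    """Hypothetical stand-in for vocab.strings: maps integer IDs to strings."""

    def __init__(self):
        self._id_to_string = []
        self._string_to_id = {}

    def add(self, string):
        # Assumption: IDs are handed out sequentially, purely for illustration.
        if string not in self._string_to_id:
            self._string_to_id[string] = len(self._id_to_string)
            self._id_to_string.append(string)
        return self._string_to_id[string]

    def __getitem__(self, string_id):
        return self._id_to_string[string_id]


class ToyToken:
    """Hypothetical token that stores only an integer ID, like self.c.lex.orth."""

    def __init__(self, strings, word):
        self.strings = strings
        self.orth = strings.add(word)  # integer ID, not the text itself

    @property
    def orth_(self):
        # Mirrors: return self.vocab.strings[self.c.lex.orth]
        return self.strings[self.orth]

    @property
    def text(self):
        # Mirrors: return self.orth_
        return self.orth_


strings = ToyStringStore()
token = ToyToken(strings, "Rock")
print(token.orth, token.orth_, token.text)
```

In this sketch the token never stores its own text; both orth_ and text resolve the stored integer through the shared string table.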
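And when you call print(word), Python calls the __str__ dunder method (__repr__ delegates to it as well). It returns word.__unicode__() or word.__bytes__(), both of which lead back to word.text; see https://github.com/explosion/spaCy/blob/develop/spacy/tokens/token.pyx#L55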
cdef class Token:
    """
    An individual token --- i.e. a word, punctuation symbol, whitespace, etc.
    """
    def __cinit__(self, Vocab vocab, Doc doc, int offset):
        self.vocab = vocab
        self.doc = doc
        self.c = &self.doc.c[offset]
        self.i = offset

    def __hash__(self):
        return hash((self.doc, self.i))

    def __len__(self):
        """
        Number of unicode characters in token.text.
        """
        return self.c.lex.length

    def __unicode__(self):
        return self.text

    def __bytes__(self):
        return self.text.encode('utf8')

    def __str__(self):
        if is_config(python3=True):
            return self.__unicode__()
        return self.__bytes__()

    def __repr__(self):
        return self.__str__()
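The dispatch chain in the class above can be reproduced with a plain Python class. This is a minimal sketch for Python 3 only. ToyWord is a hypothetical name, and the Python 2 bytes branch is left out.

```python
class ToyWord:
    """Hypothetical class that mimics how Token routes printing to .text."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        # print() and str() end up here.
        return self.text

    def __repr__(self):
        # Delegates to __str__, as Token.__repr__ does in the quoted source.
        return self.__str__()


word = ToyWord("Rock")
print(word)    # goes through __str__
print([word])  # containers use __repr__ for their items
```

Because __repr__ delegates to __str__, the word shows up as its plain text whether it is printed directly or inside a list.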
In long:
Let's walk through this step by step:
>>> import spacy
>>> nlp = spacy.load('en')
>>> doc = nlp(u'This is a foo bar sentence.')
>>> type(doc)
<type 'spacy.tokens.doc.Doc'>
After passing the sentence to the nlp() function, we get back a spacy.tokens.doc.Doc object. From the docs:
cdef class Doc:
    """
    A sequence of `Token` objects. Access sentences and named entities,
    export annotations to numpy arrays, losslessly serialize to compressed
    binary strings.

    Aside: Internals
        The `Doc` object holds an array of `TokenC` structs.
        The Python-level `Token` and `Span` objects are views of this
        array, i.e. they don't own the data themselves.

    Code: Construction 1
        doc = nlp.tokenizer(u'Some text')

    Code: Construction 2
        doc = Doc(nlp.vocab, orths_and_spaces=[(u'Some', True), (u'text', True)])
    """
So the spacy.tokens.doc.Doc object is a sequence of spacy.tokens.token.Token objects. Within the Token class we see a series of Cython properties defined, e.g. at https://github.com/explosion/spaCy/blob/develop/spacy/tokens/token.pyx#L162
property orth:
    def __get__(self):
        return self.c.lex.orth
Tracing it back, we see self.c = &self.doc.c[offset]:
cdef class Token:
    """
    An individual token --- i.e. a word, punctuation symbol, whitespace, etc.
    """
    def __cinit__(self, Vocab vocab, Doc doc, int offset):
        self.vocab = vocab
        self.doc = doc
        self.c = &self.doc.c[offset]
        self.i = offset
Without thorough documentation we cannot be sure what self.c means, but from the looks of it, it points at one of the tokens inside the Doc doc object that was passed to the __cinit__ function. So most likely it is a shortcut for accessing the token's data.
Looking at Doc.c:
cdef class Doc:
    def __init__(self, Vocab vocab, words=None, spaces=None, orths_and_spaces=None):
        self.vocab = vocab
        size = 20
        self.mem = Pool()
        # Guarantee self.lex[i-x], for any i >= 0 and x < padding is in bounds
        # However, we need to remember the true starting places, so that we can
        # realloc.
        data_start = <TokenC*>self.mem.alloc(size + (PADDING*2), sizeof(TokenC))
        cdef int i
        for i in range(size + (PADDING*2)):
            data_start[i].lex = &EMPTY_LEXEME
            data_start[i].l_edge = i
            data_start[i].r_edge = i
        self.c = data_start + PADDING
Now we see that Doc.c refers to a Cython pointer array, data_start, which allocates the memory that stores the TokenC structs of the spacy.tokens.doc.Doc object (please correct me if I have the explanation of <TokenC*> wrong).
Going back to self.c = &self.doc.c[offset], it is basically taking the address of the memory where the array is stored, and more specifically of the "offset-th" item in the array.
That is what a spacy.tokens.token.Token is.
Going back to the property:
property orth:
    def __get__(self):
        return self.c.lex.orth
We see that self.c.lex is accessing data_start[i].lex from the spacy.tokens.doc.Doc, and self.c.lex.orth is simply an integer that identifies the word in the internal vocabulary held by the spacy.tokens.doc.Doc.
Thus we see that property orth_ takes that integer from self.c.lex.orth (https://github.com/explosion/spaCy/blob/develop/spacy/tokens/token.pyx#L162) and uses it to look up the string in self.vocab.strings.
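The idea that a Token is only a view onto one slot of the array owned by the Doc can be sketched in plain Python. ToyDoc and ToyTokenView are hypothetical names, a Python list of dicts stands in for the TokenC array, and the integer IDs are simply list positions. None of this is spaCy's actual implementation.

```python
class ToyDoc:
    """Hypothetical Doc: owns the per-token records and the shared strings."""

    def __init__(self, words):
        self.strings = []  # stand-in for vocab.strings
        self.c = []        # stand-in for the TokenC array
        for word in words:
            if word not in self.strings:
                self.strings.append(word)
            self.c.append({"orth": self.strings.index(word)})

    def __getitem__(self, offset):
        return ToyTokenView(self, offset)

    def __len__(self):
        return len(self.c)


class ToyTokenView:
    """Hypothetical Token: owns no data, only a reference to one record."""

    def __init__(self, doc, offset):
        self.doc = doc
        self.i = offset
        self.c = doc.c[offset]  # plays the role of self.c = &self.doc.c[offset]

    @property
    def orth(self):
        return self.c["orth"]

    @property
    def orth_(self):
        return self.doc.strings[self.orth]


doc = ToyDoc(["This", "is", "a", "foo", "bar", "sentence", "."])
token = doc[3]
print(token.i, token.orth, token.orth_)

# Because the token is a view, a change made through the Doc is visible in it.
doc.c[3]["orth"] = doc.strings.index("bar")
print(token.orth_)
```

The second print shows the updated word even though the token object was created before the change, which is what "they don't own the data themselves" in the Doc docstring implies.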
Answer 1 (score: 1)