I have a text file in which each line is a labeled tuple of the form ('neg', 'statements ......'), as shown below:
('neg', u'Without objection , approved .')
('posi', u'In intermeeting period , markets generally calm , investors perceived events generally positive outlook .')
('neg', u'Despite Committee \u2019 tightening policy August , yields fell across coupon curve , spreads narrowed , equities rose , asset markets continued exhibit unusually low volatility .')
('uncer', u'Despite Committee \u2019 tightening policy August , yields fell across coupon curve , spreads narrowed , equities rose , asset markets continued exhibit unusually low volatility .')
('neg', u'Despite Committee \u2019 tightening policy August , yields fell across coupon curve , spreads narrowed , equities rose , asset markets continued exhibit unusually low volatility .')
('uncer', u'Despite Committee \u2019 tightening policy August , yields fell across coupon curve , spreads narrowed , equities rose , asset markets continued exhibit unusually low volatility .')
('uncer', u'But forward rates falling gently time , apparently softer data more-tepid corporate outlooks caused market participants revise expected path monetary tightening .')
('uncer', u'But forward rates falling gently time , apparently softer data more-tepid corporate outlooks caused market participants revise expected path monetary tightening .')
('neg', u'But forward rates falling gently time , apparently softer data more-tepid corporate outlooks caused market participants revise expected path monetary tightening .')
('neg', u'The coupon curve also declined .')
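I am trying to count the labels, that is, how many times 'posi', 'neg', or 'uncer' occurs in the list of tuples.
Running the following code gives the output below: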
negative = [(i, 'neg') for i in file_path if i not in i.split('neg')[:n_instances]]
print(len(negative))
positive = [(i, 'posi') for i in file_path if i not in i.split('posi')[:n_instances]]
print(len(positive))
uncertain = [(i, 'uncer') for i in file_path if i not in i.split('uncer')[:n_instances]]
print(len(negative), len(positive), len(uncertain))
**Output:**
576
0
0
(576, 0, 0)
**Expected:**
576
333
599
(576,333,599)
I'm not sure why I get zero lengths when I run this code. After some debugging, I found a solution:
SOLUTION:
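import os

# open() does not expand '~', so expand it explicitly.
file_path = open(os.path.expanduser('~/20021106.csv'), 'r')
# 20040921.20090318.csv  # 20021106.csv
positive = []
negative = []
uncertain = []
# Read the file in a single pass; a file object is consumed as it is iterated.
for i in file_path:
    # i.split(label) returns [i] only when the label is absent,
    # so each test is true exactly when the line contains the label.
    if i not in i.split('posi'):
        positive.append((i, 'posi'))
    if i not in i.split('neg'):
        negative.append((i, 'neg'))
    if i not in i.split('uncer'):
        uncertain.append((i, 'uncer'))
file_path.close()
print(len(negative), len(positive), len(uncertain))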
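
A likely explanation for the zero counts in the first attempt, assuming `file_path` was an open file object there as it is in the solution: a file object is an iterator, so the first list comprehension consumes every line and the later comprehensions have nothing left to read. The single loop works because it reads each line only once. The sketch below reproduces the effect with an in-memory file; `io.StringIO` and the three sample lines are stand-ins for the real CSV, not the original data.

```python
import io

# Hypothetical stand-in for the real file: three labeled lines.
sample = (
    "('neg', u'Without objection , approved .')\n"
    "('posi', u'Markets generally calm .')\n"
    "('uncer', u'Forward rates falling gently .')\n"
)

fh = io.StringIO(sample)

# The first pass consumes the iterator.
first_pass = [line for line in fh if 'neg' in line]
# The second pass sees nothing: the file position is already at the end.
second_pass = [line for line in fh if 'posi' in line]

# Reading the lines into a list once lets you scan them as often as needed.
fh.seek(0)
lines = fh.readlines()
negative = [line for line in lines if "'neg'" in line]
positive = [line for line in lines if "'posi'" in line]
uncertain = [line for line in lines if "'uncer'" in line]

print(len(first_pass), len(second_pass))
print(len(negative), len(positive), len(uncertain))
```

With this sample, the first pass finds one line and the second finds none, while the three comprehensions over the list each find one. Calling `seek(0)` between comprehensions would also work, but reading into a list avoids touching the file more than once.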
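
Note that a substring test on the whole line also matches when the label text happens to appear inside the statement itself (for example, 'neg' inside "negative"). If every line really is a Python tuple literal like the sample rows in the question, one alternative is to parse each line and count only its first element. This is a sketch under that assumption: `count_labels` is a hypothetical helper, and `ast.literal_eval` and `collections.Counter` are standard-library tools that the original post does not use.

```python
import ast
from collections import Counter


def count_labels(lines):
    """Count the first element of each tuple literal in `lines`."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Safely evaluate the literal; this does not run arbitrary code.
        label, _statement = ast.literal_eval(line)
        counts[label] += 1
    return counts


# Hypothetical sample; in practice `lines` could be an open file object.
sample = [
    "('neg', u'Without objection , approved .')",
    "('posi', u'Markets generally calm , no negative surprises .')",
    "('uncer', u'Forward rates falling gently .')",
    "('neg', u'The coupon curve also declined .')",
]

counts = count_labels(sample)
print(counts['neg'], counts['posi'], counts['uncer'])
```

Here the second sample line is labeled 'posi' but contains the word "negative"; a substring test for 'neg' would count it as negative, whereas parsing the tuple counts it once, under 'posi'.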