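I used urllib3 to fetch the file at https://www.clres.com/db/parses/oec/abaft.parse. Its fields are separated by tabs, and each line ends with `\r\n`. In Python 2.7 I was using StringIO, but that is not available in Python 3.7.
Since StringIO has been phased out, I tried using io instead.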
```python
import re
import urllib3

def get_sentences(memory_file):
    sentence = []
    for line in memory_file:
        if not re.match(r'\s*\r?\n', line):
            ...  # (rest of the loop body not shown)

url = 'https://www.clres.com/db/parses/oec/abaft.parse'
http = urllib3.PoolManager(timeout=10.0)
r = http.urlopen('GET', url, preload_content=False)
remote_file = r.data
memory_file = remote_file.decode('utf-8')
prep_sents = get_sentences(memory_file)
```
I expect each iteration to give me a whole line, such as the following, but I only get the first token of the line:
```
1\tWith\twith\t_\tIN\t_\t0\tROOT\t_\t_\t_\t_\t_\t_\r\n
```
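The symptom matches what iterating over a `str` does in Python 3: a `for` loop over a string yields one character at a time, not one line. For a row whose first field is the single character `1`, the first "line" therefore looks like the first token. A minimal demonstration using the sample line above as a literal string (no network access):

```python
# A str is an iterable of characters, so looping over the decoded
# response text visits single characters, not lines.
text = "1\tWith\twith\t_\tIN\t_\t0\tROOT\t_\t_\t_\t_\t_\t_\r\n"

chars = [c for c in text]
print(repr(chars[0]))  # the character '1', which looks like "the first token"
print(repr(chars[1]))  # the tab that follows it

# Splitting into lines first gives whole rows instead.
lines = text.splitlines()
print(lines[0].split("\t")[:2])
```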
Answer 0 (score: 1)
`StringIO` is available in Python 3.7; it now lives in the `io` module:
```python
from io import StringIO
```
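With `io.StringIO`, the Python 2 habit of treating the downloaded text as a file still works, because iterating over a `StringIO` object yields lines. A minimal sketch, using a short literal string as a stand-in for `remote_file.decode('utf-8')` (the two rows are taken from the abaft.parse data; `read_rows` is a hypothetical helper name). With the default `newline='\n'`, each line keeps its `\r\n` ending, so the sketch strips it explicitly:

```python
from io import StringIO

# Stand-in for remote_file.decode('utf-8'): two tab-separated rows,
# each ending in \r\n, followed by a blank separator line.
memory_text = (
    "1\tWith\twith\t_\tIN\t_\t0\tROOT\t_\t_\t_\t_\t_\t_\r\n"
    "2\tthis\tthis\t_\tDT\t_\t3\tdet\t_\t_\t_\t_\t_\t_\r\n"
    "\r\n"
)

def read_rows(text):
    """Return the tab-separated fields of every non-blank line."""
    rows = []
    # Iterating a StringIO yields lines, just like iterating a file object.
    for line in StringIO(text):
        line = line.rstrip("\r\n")   # drop the \r\n line ending
        if not line.strip():         # skip blank separator lines
            continue
        rows.append(line.split("\t"))
    return rows

for row in read_rows(memory_text):
    print(row[:3])
```

Because each line read this way still carries its ending, the question's `re.match(r'\s*\r?\n', line)` blank-line test would also behave as intended on these lines.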
`memory_file` is a string, and iterating over a string yields single characters, so to get each line you need to split it first:
```python
for line in memory_file.split('\n'):
    print(line)
```
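One caveat with `split('\n')` on this data: the file uses `\r\n` line endings, so every piece keeps a trailing `\r`, and text that ends with a newline also produces a final empty string. `str.splitlines()` treats `\r\n` as a single line boundary and drops it. A quick comparison on a sample made of two shortened rows:

```python
# Two shortened rows with Windows-style \r\n endings.
sample = "1\tWith\twith\r\n2\tthis\tthis\r\n"

# split('\n') leaves the '\r' behind and adds a trailing empty string.
print(sample.split("\n"))

# splitlines() removes the whole \r\n boundary.
print(sample.splitlines())
```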
Answer 1 (score: 1)
`memory_file` already holds the data loaded from the server. To split it into lines, and each line into fields, use `splitlines()` and `split()`:
```python
import urllib3

def get_sentences(memory_file):
    sentences = []
    # splitlines() handles the \r\n line endings
    for line in memory_file.splitlines():
        if not line:  # skip blank lines
            continue
        # split() on whitespace: one list of fields per row
        sentences.append(line.split())
    return sentences

url = 'https://www.clres.com/db/parses/oec/abaft.parse'
http = urllib3.PoolManager(timeout=10.0)
r = http.urlopen('GET', url, preload_content=False)
remote_file = r.data
memory_file = remote_file.decode('utf-8')
prep_sents = get_sentences(memory_file)

for line in prep_sents:
    # centre each field in a 13-character column
    print(''.join('{: ^13}'.format(w) for w in line))
```
This prints:
1 With with _ IN _ 0 ROOT _ _ _ _ _ _
2 this this _ DT _ 3 det _ _ _ _ _ _
3 security security _ NN _ 1 pcomp _ _ _ _ _ _
4 he he _ PRP _ 5 subj _ _ _ _ _ _
5 had have _ VBD _ 3 rcmod _ _ _ _ _ _
6 established establish _ VBN _ 5 vch _ _ _ _ _ _
7 as as _ IN _ 6 prep _ _ _ _ _ _
8 his his _ PRP$ _ 9 poss _ _ _ _ _ _
9 right right _ NN _ 7 pcomp _ _ _ _ _ _
10 a a _ DT _ 11 det _ _ _ _ _ _
11 caboose caboose _ NN _ 6 dobj _ _ _ _ _ _
12 abaft abaft _ IN _ 1 prep _ _ _ _ _ _
13 the the _ DT _ 14 det _ _ _ _ _ _
14 funnel funnel _ NN _ 12 pcomp _ _ _ _ _ _
15 in in _ IN _ 14 prep _ _ _ _ _ _
16 the the _ DT _ 17 det _ _ _ _ _ _
17 midships midships _ NNS _ 15 pcomp _ _ _ _ _ _
18 Bofors bofors _ NNP _ 19 nn _ _ _ _ _ _
19 gunshield gunshield _ NN _ 14 appos _ _ _ _ _ _
20 where where _ WRB _ 19 relmod _ _ _ _ _ _
21 the the _ DT _ 22 det _ _ _ _ _ _
22 gun gun _ NN _ 23 subj _ _ _ _ _ _
23 had have _ VBD _ 20 whcmp _ _ _ _ _ _
24 been be _ VBN _ 23 vch _ _ _ _ _ _
25 removed remove _ VBN _ 24 vch _ _ _ _ _ _
26 . . _ . _ 1 punct _ _ _ _ _ _
1 Dropping drop _ VBG _ 14 advcl _ _ _ _ _ _
2 down down _ RP _ 1 prt _ _ _ _ _ _
3 abaft abaft _ IN _ 1 prep _ _ _ _ _ _
4 the the _ DT _ 5 det _ _ _ _ _ _
5 bridge bridge _ NN _ 3 pcomp _ _ _ _ _ _
6 , , _ , _ 14 punct _ _ _ _ _ _
7 the the _ DT _ 9 det _ _ _ _ _ _
8 first first _ JJ _ 9 amod _ _ _ _ _ _
9 thing thing _ NN _ 14 subj _ _ _ _ _ _
10 to to _ TO _ 11 infmark _ _ _ _ _ _
11 come come _ VB _ 9 infmod _ _ _ _ _ _
12 into into _ IN _ 11 prep _ _ _ _ _ _
13 view view _ NN _ 12 pcomp _ _ _ _ _ _
14 was be _ VBD _ 0 ROOT _ _ _ _ _ _
15 the the _ DT _ 16 det _ _ _ _ _ _
16 funnel funnel _ NN _ 14 arg1 _ _ _ _ _ _
17 . . _ . _ 14 punct _ _ _ _ _ _
1 When when _ WRB _ 21 whadv _ _ _ _ _ _
2 a a _ DT _ 3 det _ _ _ _ _ _
3 mainsail mainsail _ NN _ 4 subj _ _ _ _ _ _
4 was be _ VBD _ 1 whcmp _ _ _ _ _ _
5 set set _ VBN _ 4 vch _ _ _ _ _ _
6 up up _ RP _ 5 prt _ _ _ _ _ _
7 in in _ IN _ 5 prep _ _ _ _ _ _
8 the the _ DT _ 10 det _ _ _ _ _ _
9 correct correct _ JJ _ 10 amod _ _ _ _ _ _
10 place place _ NN _ 7 pcomp _ _ _ _ _ _
11 abaft abaft _ IN _ 5 prep _ _ _ _ _ _
12 the the _ DT _ 13 det _ _ _ _ _ _
13 genoa genoa _ NN _ 11 pcomp _ _ _ _ _ _
14 , , _ , _ 21 punct _ _ _ _ _ _
15 the the _ DT _ 16 det _ _ _ _ _ _
16 strain strain _ NN _ 21 subj _ _ _ _ _ _
17 on on _ IN _ 16 prep _ _ _ _ _ _
18 the the _ DT _ 20 det _ _ _ _ _ _
19 headsail headsail _ NN _ 20 nn _ _ _ _ _ _
20 sheet sheet _ NN _ 17 pcomp _ _ _ _ _ _
21 was be _ VBD _ 0 ROOT _ _ _ _ _ _
22 observed observe _ VBN _ 21 vch _ _ _ _ _ _
23 to to _ TO _ 24 infmark _ _ _ _ _ _
24 rise rise _ VB _ 22 xcomp _ _ _ _ _ _
25 considerably considerably _ RB _ 24 advmod _ _ _ _ _ _
26 . . _ . _ 21 punct _ _ _ _ _ _
1 The the _ DT _ 2 det _ _ _ _ _ _
2 carpenter carpenter _ NN _ 3 subj _ _ _ _ _ _
3 had have _ VBD _ 0 ROOT _ _ _ _ _ _
4 turned turn _ VBN _ 3 vch _ _ _ _ _ _
5 the the _ DT _ 6 det _ _ _ _ _ _
6 capstan capstan _ NN _ 4 dobj _ _ _ _ _ _
7 just just _ RB _ 8 advmod _ _ _ _ _ _
8 abaft abaft _ IN _ 4 prep _ _ _ _ _ _
9 the the _ DT _ 10 det _ _ _ _ _ _
10 mainmast mainmast _ NN _ 8 pcomp _ _ _ _ _ _
11 into into _ IN _ 10 prep _ _ _ _ _ _
12 a a _ DT _ 15 det _ _ _ _ _ _
13 perfectly perfectly _ RB _ 14 advmod _ _ _ _ _ _
14 acceptable acceptable _ JJ _ 15 amod _ _ _ _ _ _
15 desk desk _ NN _ 11 pcomp _ _ _ _ _ _
16 . . _ . _ 3 punct _ _ _ _ _ _
1 The the _ DT _ 2 det _ _ _ _ _ _
2 first first _ JJ _ 11 subj _ _ _ _ _ _
3 of of _ IN _ 2 prep _ _ _ _ _ _
4 two two _ CD _ 5 num _ _ _ _ _ _
5 hatches hatch _ NNS _ 3 pcomp _ _ _ _ _ _
6 to to _ TO _ 5 prep _ _ _ _ _ _
7 the the _ DT _ 10 det _ _ _ _ _ _
8 control control _ NN _ 9 nn _ _ _ _ _ _
9 room room _ NN _ 10 nn _ _ _ _ _ _
10 section section _ NN _ 6 pcomp _ _ _ _ _ _
11 is be _ VBZ _ 0 ROOT _ _ _ _ _ _
12 immediately immediately _ RB _ 11 advmod _ _ _ _ _ _
13 abaft abaft _ IN _ 11 arg1 _ _ _ _ _ _
14 the the _ DT _ 15 det _ _ _ _ _ _
15 sail sail _ NN _ 13 pcomp _ _ _ _ _ _
16 , , _ , _ 11 punct _ _ _ _ _ _
17 being be _ VBG _ 11 advcl _ _ _ _ _ _
18 the the _ DT _ 20 det _ _ _ _ _ _
19 main main _ JJ _ 20 amod _ _ _ _ _ _
20 access access _ NN _ 17 arg1 _ _ _ _ _ _
21 into into _ IN _ 20 prep _ _ _ _ _ _
22 the the _ DT _ 23 det _ _ _ _ _ _
23 boat boat _ NN _ 21 pcomp _ _ _ _ _ _
24 . . _ . _ 11 punct _ _ _ _ _ _
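In the output above, the token ID restarts at 1 wherever a new sentence begins, but the sentences run together because `get_sentences` discards the blank lines instead of using them. If, as the blank-line test in the question suggests, sentences in this file are separated by blank lines (an assumption, not verified against the file here), a small variant can group the rows per sentence. `group_sentences` is a hypothetical helper name, and the sample rows are shortened to their first eight columns. It splits on `'\t'` rather than calling `split()` so that each row keeps exactly its tab-delimited columns:

```python
def group_sentences(text):
    """Group tab-separated token rows into sentences.

    Assumes (hypothetically) that a blank line ends each sentence.
    """
    sentences = []
    current = []
    for line in text.splitlines():
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split("\t"))
    if current:  # the text may not end with a blank line
        sentences.append(current)
    return sentences


sample = (
    "1\tWith\twith\t_\tIN\t_\t0\tROOT\r\n"
    "2\tthis\tthis\t_\tDT\t_\t3\tdet\r\n"
    "\r\n"
    "1\tDropping\tdrop\t_\tVBG\t_\t14\tadvcl\r\n"
    "2\tdown\tdown\t_\tRP\t_\t1\tprt\r\n"
)

for sent in group_sentences(sample):
    # Print the word form (second column) of each row in the sentence.
    print(" ".join(row[1] for row in sent))
```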