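I am doing text processing. Suppose I have one document that I compare against many other documents. I call the first document txt and the other documents pat.
This is my main program: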
    # read the document
    txt = doc_gettext()
    # read the filenames of the other documents
    filenames = doc.get_pat()
    # iteration
    d = int((len(txt) - 5 + 1) / k)
    for i in range(1, len(filenames)):
        # open the patterns one by one, by name, as the loop advances
        patname = filenames[i].replace('\n', '')
        with open(patname, 'r') as pattern:
            pattern = pattern.read().replace('\n', ' ').replace('\t', ' ')
        pattern = pattern.split()
        for j in range(k - 1):
            p = Process(target=all_position, args=(int(j * d), int((j+1) * d) + 5 - 1, pattern, txt, i, R,))
            processes.append(p)
            p.start()
        p = Process(target=all_position, args=(int(d * (k-1)), len(txt) + 5 - 1, pattern, txt, i, R,))
        processes.append(p)
        p.start()
    for pr in processes:
        pr.join()
I tried printing them here, because I want to run some algorithm on them later:
    def all_position(x, y, pat, txt, i, R):
        #print pat
        print txt
        #print R.put(pat)

    if __name__ == '__main__':
        main()
Suppose txt has token length = 20 and is stored in a list, and I want to print it inside the all_position process. The output is:
['pe[[n''sppieelnn'ss, ii'llb''a, , k''abbraa'kk, aar'r'a', l, 'a'asal'la, as's'
r', a, 'm'rbrauamtmb'b, uu'ttt''a, , n''gttaaannn'gg, aa'nnm''a, , k''ammnaa'kk,
aa'nnl''e, , m''allreeimm'aa, rr'iil''a, , n''tllaaainn'tt, aa'iis''e, , n''dss
aeelnn'dd, aa'llk''a, , k''ik'ka, ak'kiki'u', k, 'u'k'ku, uk'kupu'i', n, 't'pupi
'i, nn'ttpuue''l, , a''nppgeeill'aa, nn'ggmiii''n, , u''mmm'ii, nn'uummme''j, ,
a'''mm, ee'jjbaau''k, , u'''bb, uu'kkbuua''j, , 'ub''ab, ja'ujc'ue, l'acneal'a,
n''a, p'', lc'aespltlaianksa't', i, 'k'k'pe, lr'atksaetsir'k]t
'a, s''k]e
rtas']
['pensil', 'bakar', 'alas', 'rambut['', p'etnasnigla'n, '', b'amkaakra'[n, '''p,
ae'lnlasesim'la, 'r', ir''ab, ma'bkluaatrn''t, , a''ita'al, na'gssa'en, n''d, r
a'almm'ab, ku'atkn'a', k, 'i't'la, en'mgkaaurnki'u', ', ', 'm'lapakinantnta'ui,
''', , l''epsmeealnradina'gl, i''', l, 'a'knmatikaniiu''m, , ''', ks'uemkneudj'a
a, l''', p, 'i'bnkutakuku'i', ', ', 'p'bekalujakunu'g', i, '''c, pe'ilmnaitnnuau
''m, , ''', pp'elmlaeasjntagi'ik, ''', , b''umkkieunr'ut, ma''sb, 'a']jm
ue'j, a''c, e'lbaunkau'', , ''bapjlua's, t'icke'l, a'nkae'r, t'apsl'a]s
tik', 'kertas']
Why does this happen? It confuses me a lot. Can anyone help me solve this problem?
Answer 0: (score: 2)
If you need safe printing, you can use a Lock object.
Let's look at some code...
    from multiprocessing import Lock, Process
    import sys

    # NOT SAFE
    def not_safe_print(x):
        for i in range(10):
            # problem!
            print range(20)

    # pool of 10 workers
    processes = []
    for i in range(10):
        processes.append(Process(target=not_safe_print, args=(i,)))
    for p in processes:
        p.start()
    for p in processes:
        p.join()
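As we can see, two processes can be inside the print statement at the same time. This is not "safe".
Suppose we have two processes (numbered 1 and 2), and each one runs an instruction whenever the scheduler gives it some time to run. A process may end up writing only part of the list it intends to write to stdout before the other one gets its turn. The system then flushes the stdout buffer and displays the corrupted output.
When you run this script (you may need to run it several times), you should see the garbled text in the program's output.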
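To make the interleaving concrete, here is a toy simulation in Python 3. It does not use real processes: the helper names `chunks` and `interleave`, the fixed chunk size, and the strict round-robin order are all assumptions made for illustration, since a real scheduler switches at unpredictable points. It only shows how partial writes from two writers combine into the kind of scrambled text seen in the question.

```python
from itertools import zip_longest


def chunks(text, size):
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def interleave(outputs, size):
    """Simulate writers that each emit one chunk per turn, round-robin.

    `outputs` holds the full text each hypothetical process wants to print.
    """
    pieces = [chunks(text, size) for text in outputs]
    result = []
    # zip_longest keeps going until the longest writer has finished
    for turn in zip_longest(*pieces, fillvalue=""):
        result.extend(turn)
    return "".join(result)


first = "['pensil', 'bakar']"
second = "['pensil', 'bakar']"
# Every character of both lists is still present, but in a mixed-up order.
print(interleave([first, second], 2))
```

No characters are lost in the mix; they only arrive in the wrong order, which is why fragments of the original words are still recognizable in the corrupted output.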
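To make the script safe, we must restrict access to shared resources such as the stdout buffer (what you eventually see on the terminal, although it could also be a file). This is known as mutual exclusion. For this we can use Lock objects, which provide a way to solve the mutual exclusion problem.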
    # used to implement a SAFE print
    lock = Lock()

    def safe_print(x):
        # when a process reaches this point it acquires the lock.
        # nobody gets in without the lock - only a single process can pass
        lock.acquire()
        for i in range(10):
            print range(20)
        # when the process is done it releases the lock for other processes to grab
        # meaning another process can now use stdout (used by print...)
        lock.release()
Don't forget to change this line:
    processes.append(Process(target=safe_print, args=(i,)))
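For readers on Python 3, the same idea might be sketched as follows. This is one possible variant, not the only way to do it: the `worker_id` and `stream` parameters are assumptions added for illustration, the lock is passed to each worker explicitly rather than read from a global, and `with lock:` replaces the separate acquire/release calls so the lock is released even if the body raises.

```python
import sys
from multiprocessing import Lock, Process


def safe_print(lock, worker_id, stream=None):
    """Write ten lines while holding the lock so other workers cannot interleave."""
    # Resolve sys.stdout inside the worker, not at definition time.
    stream = stream if stream is not None else sys.stdout
    with lock:  # acquired here, released when the block exits
        for _ in range(10):
            stream.write("worker %d: %s\n" % (worker_id, list(range(20))))
        # Flush before releasing, so buffered text is not emitted later
        # while another worker already holds the lock.
        stream.flush()


def main():
    lock = Lock()
    # 10 workers, as in the example above
    processes = [Process(target=safe_print, args=(lock, i)) for i in range(10)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()


if __name__ == "__main__":
    main()
```

Passing the lock as an argument also works when child processes are spawned rather than forked, where a module-level `lock = Lock()` would be recreated separately in each child and would no longer protect anything.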