Speeding up a parsing function with multiprocessing

Time: 2018-06-23 12:34:08

Tags: python parsing request python-requests multiprocessing

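I am trying to compute the mean age of a user's friends by parsing API responses, but it takes about a minute. I tried to speed it up with multiprocessing, but got an error. How can I speed up my function? I also tried session.get, but it did not make the function any faster.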

def mean_friend_age(idi):
    mean_age_id = []
    mean_age = []
    age_number = []
    session = requests.Session()
    r_mean_id = requests.get('https://api.vk.com/method/friends.get? user_id='+ str(idi) + '&v=5.52&access_token=TOKEN')
    json_mean_id = r_mean_id.text
    string_mean_id = json_mean_id[34:-3]
    mean_ids_list = string_mean_id.split(',')
    for item in mean_ids_list:
        mean_age_id.append(item)
    for item in mean_age_id:
        req_mean = session.get('https://api.vk.com/method/users.getuser_id='+ str(item) + '&v=5.52&access_token=TOKEN&fields=counters,sex,bdate,country,hometown,lists,last_seen,verified,occupation,wall_comments,can_write_private_message, can_see_audio, can_see_all_posts, can_post')
        json_mean = req_mean.json()
        for item in json_mean['response']:
            json_dict = item
        while True:
            try:
                age = json_dict['bdate']
                break
            except KeyError:
                age = '0'
                break
        mean_age.append(age)
    for item in mean_age:
        if len(item) >= 7:
            dt_obj = datetime.datetime.strptime(item, '%d.%m.%Y')
            item_number = get_age(dt_obj)
            age_number.append(item_number)
    mean = statistics.mean(age_number)
    return mean

Then I tried multiprocessing and got this error:

if __name__ == '__main__':
    with Pool(5) as p:
        a = p.map(mean_friend_age, '181145622')



Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "<ipython-input-63-2984c547fa8d>", line 18, in mean_friend_age
age = json_dict['bdate']
UnboundLocalError: local variable 'json_dict' referenced before 
assignment

The above exception was the direct cause of the following exception:

UnboundLocalError                         Traceback (most recent call 
last)
<ipython-input-65-39dd60c6a8fb> in <module>()
  1 if __name__ == '__main__':
  2     with Pool(5) as p:
----> 3         a = p.map(mean_friend_age, '181145622')

/usr/lib/python3.6/multiprocessing/pool.py in map(self, func, 
iterable, chunksize)
264         in a list that is returned.
265         '''
--> 266         return self._map_async(func, iterable, mapstar, 
chunksize).get()
267 
268     def starmap(self, func, iterable, chunksize=None):

/usr/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
642             return self._value
643         else:
--> 644             raise self._value
645 
646     def _set(self, i, obj):

UnboundLocalError: local variable 'json_dict' referenced before 
assignment

How can I speed up the function?

1 answer:

Answer 0: (score: 0)

The problem in your code is simple.

In the for loop, you assign item to the json_dict variable:

for item in json_mean['response']:
    json_dict = item

Then you access this json_dict in a separate while loop:

while True:
    try:
        age = json_dict['bdate']

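The problem here is scoping. In most languages (var in JS is different), every variable has a scope: if you create it inside a code block (between {} in languages such as C / C++ / Java), the variable is valid only until you leave that block. Python differs on this point: indented blocks such as for and while do not open a new scope, so a name assigned inside a for loop is still visible afterwards in the same function, but only if the loop body ran at least once.

In the code you provided, json_dict is assigned only inside the for loop. When json_mean['response'] is empty, the loop body never runs, json_dict is never bound, and the interpreter raises UnboundLocalError as soon as the while loop tries to read it. In addition, you overwrite json_dict on every iteration of the for loop (I don't know whether you really want to do that).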
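
To see how Python treats a name assigned inside a for loop, here is a minimal sketch. The function name last_item is hypothetical and is not part of the code in the question; it only reproduces the same pattern of assigning inside the loop and reading afterwards.

```python
def last_item(items):
    # A for loop does not open a new scope in Python: `current` stays bound
    # after the loop, but only if the loop body executed at least once.
    for item in items:
        current = item
    return current


print(last_item([1, 2, 3]))  # prints the last element

try:
    last_item([])  # the loop body never runs, so `current` is never assigned
except UnboundLocalError as exc:
    print('UnboundLocalError:', exc)
```

With a non-empty list the name is available after the loop; with an empty list the same function fails with the exception shown in the traceback.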

The solution is to create json_dict above the loop (assigning it None) and then access it from the for and while loops.
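
A minimal sketch of that fix, assuming the response is a list of dicts as in the question. The helper name extract_bdate is hypothetical, and the '0' fallback mirrors the one used in the question's code. Note that subscripting None raises TypeError rather than KeyError, so the sketch checks for None explicitly instead of relying on the original except KeyError.

```python
def extract_bdate(response_items):
    """Return the 'bdate' of the last user record, or '0' if there is none."""
    json_dict = None  # bound before the loop, so the name always exists afterwards
    for item in response_items:
        json_dict = item  # still overwritten on each iteration, as in the question
    if json_dict is None:
        # Empty response: None['bdate'] would raise TypeError, not KeyError
        return '0'
    # dict.get covers the missing-key case that `except KeyError` handled before
    return json_dict.get('bdate', '0')


print(extract_bdate([]))
print(extract_bdate([{'id': 1, 'bdate': '1.2.1990'}]))
```

Inside mean_friend_age, the same pattern would replace the for loop and the while True block that follow req_mean.json().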