Why is uTorrent's magnet-to-torrent file fetching faster than my Python script?

Time: 2015-09-24 05:30:08

Tags: python libtorrent utorrent magnet-uri libtorrent-rasterbar

I am trying to convert torrent magnet URLs into .torrent files using a Python script. The script connects to the DHT, waits for the metadata, and then creates a torrent file from it.

e.g.

#!/usr/bin/env python
'''
Created on Apr 19, 2012
@author: dan, Faless

    GNU GENERAL PUBLIC LICENSE - Version 3

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.

    http://www.gnu.org/licenses/gpl-3.0.txt
'''

import shutil
import tempfile
import os.path as pt
import sys
import libtorrent as lt
from time import sleep


def magnet2torrent(magnet, output_name=None):
    # Reject an output path whose folder does not exist
    if output_name and \
            not pt.isdir(output_name) and \
            not pt.isdir(pt.dirname(pt.abspath(output_name))):
        print("Invalid output folder: " + pt.dirname(pt.abspath(output_name)))
        print("")
        sys.exit(0)

    tempdir = tempfile.mkdtemp()
    ses = lt.session()
    params = {
        'save_path': tempdir,
        'duplicate_is_error': True,
        'storage_mode': lt.storage_mode_t(2),
        'paused': False,
        'auto_managed': True
    }
    handle = lt.add_magnet_uri(ses, magnet, params)

    print("Downloading Metadata (this may take a while)")
    while (not handle.has_metadata()):
        try:
            sleep(1)
        except KeyboardInterrupt:
            print("Aborting...")
            ses.pause()
            print("Cleanup dir " + tempdir)
            shutil.rmtree(tempdir)
            sys.exit(0)
    ses.pause()
    print("Done")

    torinfo = handle.get_torrent_info()
    torfile = lt.create_torrent(torinfo)

    output = pt.abspath(torinfo.name() + ".torrent")

    if output_name:
        if pt.isdir(output_name):
            output = pt.abspath(pt.join(
                output_name, torinfo.name() + ".torrent"))
        elif pt.isdir(pt.dirname(pt.abspath(output_name))):
            output = pt.abspath(output_name)

    print("Saving torrent file here : " + output + " ...")
    torcontent = lt.bencode(torfile.generate())
    f = open(output, "wb")
    f.write(torcontent)  # reuse the already encoded content
    f.close()
    print("Saved! Cleaning up dir: " + tempdir)
    ses.remove_torrent(handle)
    shutil.rmtree(tempdir)

    return output


def showHelp():
    print("")
    print("USAGE: " + pt.basename(sys.argv[0]) + " MAGNET [OUTPUT]")
    print("  MAGNET\t- the magnet url")
    print("  OUTPUT\t- the output torrent file name")
    print("")


def main():
    if len(sys.argv) < 2:
        showHelp()
        sys.exit(0)

    magnet = sys.argv[1]
    output_name = None

    if len(sys.argv) >= 3:
        output_name = sys.argv[2]

    magnet2torrent(magnet, output_name)


if __name__ == "__main__":
    main()

The script above takes over a minute to fetch the metadata and create the .torrent file, while the uTorrent client takes only a few seconds. Why is that?

How can I make my script faster?

I would like to fetch metadata for around 1k+ torrents.
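For a workload of that size, the magnet URIs can be generated from the info hashes and handed to a single session in small groups. The following is a minimal sketch, assuming 40-character hex info hashes; `build_magnet_uri` and `in_batches` are hypothetical helper names and not part of libtorrent.

```python
from urllib.parse import quote


def build_magnet_uri(info_hash_hex, trackers=()):
    """Build a magnet URI from a 40-character hex info hash."""
    info_hash_hex = info_hash_hex.strip().upper()
    if len(info_hash_hex) != 40 or any(
            c not in "0123456789ABCDEF" for c in info_hash_hex):
        raise ValueError("expected a 40-character hex info hash")
    uri = "magnet:?xt=urn:btih:" + info_hash_hex
    for tracker in trackers:
        # Tracker URLs are percent-encoded inside the magnet URI.
        uri += "&tr=" + quote(tracker, safe="")
    return uri


def in_batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each batch could then be added to the same session, so the DHT only has to bootstrap once for all lookups.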

e.g. magnet link:

magnet:?xt=urn:btih:BFEFB51F4670D682E98382ADF81014638A25105A&dn=openSUSE+13.2+DVD+x86_64.iso&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.ccc.de%3A80

Update:

I have specified known DHT router URLs in my script like this:

session = lt.session()
session.listen_on(6881, 6891)

session.add_dht_router("router.utorrent.com", 6881)
session.add_dht_router("router.bittorrent.com", 6881)
session.add_dht_router("dht.transmissionbt.com", 6881)
session.add_dht_router("router.bitcomet.com", 6881)
session.add_dht_router("dht.aelitis.com", 6881)
session.start_dht()

But it is still slow, and sometimes I get errors like:

DHT error [hostname lookup] (1) Host not found (authoritative)
could not map port using UPnP: no router found

Update:

I have written this small script, which fetches hex info hashes from the DB, tries to fetch the metadata from the DHT, and then inserts the torrent file into the DB.

I have made it run indefinitely because I did not know how to save the state, so keeping it running will gather more peers and make fetching metadata faster.

#!/usr/bin/env python
# this file will run as client or daemon and fetch torrent metadata, i.e. torrent files, from magnet URIs

import libtorrent as lt # libtorrent library
import tempfile # temp dir used as save_path while fetching metadata
import sys # getting arguments from shell or exiting the script
from time import sleep # sleep
import shutil # removing directory tree from temp directory
import os.path # for getting pwd and other things
from pprint import pprint # for debugging, showing object data
import MySQLdb # DB connectivity
import os
from datetime import date, timedelta

# create lock file to make sure only a single instance is running
lock_file_name = "/daemon.lock"

if(os.path.isfile(lock_file_name)):
    sys.exit('another instance running')
#else:
    #f = open(lock_file_name, "w")
    #f.close()

session = lt.session()
session.listen_on(6881, 6891)

session.add_dht_router("router.utorrent.com", 6881)
session.add_dht_router("router.bittorrent.com", 6881)
session.add_dht_router("dht.transmissionbt.com", 6881)
session.add_dht_router("router.bitcomet.com", 6881)
session.add_dht_router("dht.aelitis.com", 6881)
session.start_dht()

alive = True
while alive:

    db_conn = MySQLdb.connect(host='localhost', user='', passwd='', db='basesite', unix_socket='') # open database connection
    #print('reconnecting')
    # get all records where enabled = 0 and uploaded since yesterday
    subset_count = 5

    yesterday = date.today() - timedelta(1)
    yesterday = yesterday.strftime('%Y-%m-%d %H:%M:%S')
    #print(yesterday)

    total_count = 0 # stays 0 if the count query fails
    total_count_query = ("SELECT COUNT(*) as total_count FROM content WHERE upload_date > '" + yesterday + "' AND enabled = '0' ")
    #print(total_count_query)
    try:
        total_count_cursor = db_conn.cursor() # prepare a cursor object
        total_count_cursor.execute(total_count_query) # execute the SQL command
        total_count_results = total_count_cursor.fetchone() # fetch the single count row
        total_count = total_count_results[0]
        print(total_count)
    except:
        print("Error: unable to select data")

    total_pages = total_count // subset_count
    #print(total_pages)

    current_page = 1
    while(current_page <= total_pages):
        from_count = (current_page * subset_count) - subset_count

        #print(current_page)
        #print(from_count)

        hashes = []

        get_mysql_data_query = ("SELECT hash FROM content WHERE upload_date > '" + yesterday + "' AND enabled = '0' ORDER BY record_num ASC LIMIT " + str(from_count) + " , " + str(subset_count) + " ")
        #print(get_mysql_data_query)
        try:
            get_mysql_data_cursor = db_conn.cursor() # prepare a cursor object
            get_mysql_data_cursor.execute(get_mysql_data_query) # execute the SQL command
            get_mysql_data_results = get_mysql_data_cursor.fetchall() # fetch all the rows
            for row in get_mysql_data_results:
                hashes.append(row[0].upper())
        except:
            print("Error: unable to select data")

        print(hashes)

        handles = []

        for hash in hashes:
            tempdir = tempfile.mkdtemp()
            add_magnet_uri_params = {
                'save_path': tempdir,
                'duplicate_is_error': True,
                'storage_mode': lt.storage_mode_t(2),
                'paused': False,
                'auto_managed': True
            }
            magnet_uri = "magnet:?xt=urn:btih:" + hash.upper() + "&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.ccc.de%3A80"
            #print(magnet_uri)
            handle = lt.add_magnet_uri(session, magnet_uri, add_magnet_uri_params)
            handles.append(handle) # push handle onto handles list

        #print("handles length is :")
        #print(len(handles))

        while(len(handles) != 0):
            for h in handles[:]: # iterate over a copy, since handles are removed below
                #print("inside handles for each loop")
                if h.has_metadata():
                    torinfo = h.get_torrent_info()
                    final_info_hash = str(torinfo.info_hash())
                    final_info_hash = final_info_hash.upper()
                    torfile = lt.create_torrent(torinfo)
                    torcontent = lt.bencode(torfile.generate())
                    tfile_size = len(torcontent)
                    try:
                        insert_cursor = db_conn.cursor() # prepare a cursor object
                        insert_cursor.execute("""INSERT INTO dht_tfiles (hash, tdata) VALUES (%s, %s)""", [final_info_hash, torcontent])
                        db_conn.commit()
                        #print("data inserted in DB")
                    except MySQLdb.Error as e:
                        try:
                            print("MySQL Error [%d]: %s" % (e.args[0], e.args[1]))
                        except IndexError:
                            print("MySQL Error: %s" % str(e))

                    shutil.rmtree(h.save_path()) # remove temp data directory
                    session.remove_torrent(h) # remove torrent handle from session
                    handles.remove(h) # remove handle from list

                else:
                    if(h.status().active_time > 600): # handle is more than 10 minutes (600 seconds) old
                        #print('remove_torrent')
                        shutil.rmtree(h.save_path()) # remove temp data directory
                        session.remove_torrent(h) # remove torrent handle from session
                        handles.remove(h) # remove handle from list
                sleep(1)
                #print('sleep1')

        print('sleep10')
        sleep(10)
        current_page = current_page + 1
    #print('sleep20')
    sleep(20)

os.remove(lock_file_name)

Now I need to implement the new things Arvid suggested.

Update:

I have managed to implement what Arvid suggested, along with some more extensions I found in the Deluge support forum: http://forum.deluge-torrent.org/viewtopic.php?f=7&t=42299&start=10

#!/usr/bin/env python

import libtorrent as lt # libtorrent library
import tempfile # temp dir used as save_path while fetching metadata
import sys # getting arguments from shell or exiting the script
from time import sleep # sleep
import shutil # removing directory tree from temp directory
import os.path # for getting pwd and other things
from pprint import pprint # for debugging, showing object data
import MySQLdb # DB connectivity
import os
from datetime import date, timedelta

def var_dump(obj):
  for attr in dir(obj):
    print("obj.%s = %s" % (attr, getattr(obj, attr)))

session = lt.session()
session.add_extension('ut_pex')
session.add_extension('ut_metadata')
session.add_extension('smart_ban')
session.add_extension('metadata_transfer')

#session = lt.session(lt.fingerprint("DE", 0, 1, 0, 0), flags=1)

session_save_filename = "/tmp/new.client.save_state"

if(os.path.isfile(session_save_filename)):

    fileread = open(session_save_filename, 'rb')
    session.load_state(lt.bdecode(fileread.read()))
    fileread.close()
    print('session loaded from file')
else:
    print('new session started')

session.add_dht_router("router.utorrent.com", 6881)
session.add_dht_router("router.bittorrent.com", 6881)
session.add_dht_router("dht.transmissionbt.com", 6881)
session.add_dht_router("router.bitcomet.com", 6881)
session.add_dht_router("dht.aelitis.com", 6881)
session.start_dht()

alerts = []

alive = True
while alive:
    a = session.pop_alert()
    if a is not None: # pop_alert() yields nothing when the queue is empty
        alerts.append(a)
    print('----------')
    for a in alerts:
        var_dump(a)
    del alerts[:] # clear after dumping instead of removing while iterating

    print('sleep10')
    sleep(10)
    filewrite = open(session_save_filename, "wb")
    filewrite.write(lt.bencode(session.save_state()))
    filewrite.close()

I kept it running for a minute and received this alert:

obj.msg = no router found

Update:

After some testing, it looks like

session.add_dht_router("router.bitcomet.com", 6881)

is causing

('%s: %s', 'alert', 'DHT error [hostname lookup] (1) Host not found (authoritative)')

Update: I added

session.start_dht()
session.start_lsd()
session.start_upnp()
session.start_natpmp()

and got an alert.
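When chasing alerts like the ones quoted above, it may help to print each alert's type name next to its message, so it is clear whether it comes from the DHT, UPnP, or NAT-PMP. A minimal sketch, assuming a libtorrent build whose Python session object exposes `pop_alerts()` and whose alerts expose `message()`; `drain_alerts` and `print_alerts` are hypothetical helper names.

```python
def drain_alerts(session):
    """Return (type_name, message) pairs for every pending alert."""
    drained = []
    # pop_alerts() hands back all queued alerts in one call, so no list
    # is modified while it is being iterated.
    for alert in session.pop_alerts():
        drained.append((type(alert).__name__, alert.message()))
    return drained


def print_alerts(session):
    """Print pending alerts, one per line, as 'type: message'."""
    for type_name, message in drain_alerts(session):
        print("%s: %s" % (type_name, message))
```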

1 answer:

Answer 0 (score: 3)

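As MatteoItalia points out, bootstrapping the DHT is not instantaneous and can sometimes take a while. There is no well-defined point in time at which the bootstrap process is complete; it is a continuum of being increasingly connected to the network.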

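The better connected you are, and the more good, stable nodes you know about, the quicker lookups will be. One way to factor out most of the bootstrapping process (to get more of an apples-to-apples comparison) would be to start timing only after you receive the dht_bootstrap_alert (and also to hold off on adding the magnet link until then).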
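Waiting for that alert before starting the clock could look roughly like this. It is a sketch under assumptions: that the Python bindings expose the alert under the class name `dht_bootstrap_alert`, that DHT notifications are enabled in the session's alert mask, and that `pop_alerts()` is available; `wait_for_dht_bootstrap` is a hypothetical helper name.

```python
import time


def wait_for_dht_bootstrap(session, timeout=120.0, poll_interval=0.5,
                           clock=time.monotonic, sleep=time.sleep):
    """Poll the alert queue until a DHT bootstrap alert shows up.

    Returns True if the alert arrived before `timeout` seconds elapsed,
    otherwise False. Other alerts popped along the way are discarded.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        for alert in session.pop_alerts():
            # Assumption: the alert's Python class is named dht_bootstrap_alert.
            if type(alert).__name__ == "dht_bootstrap_alert":
                return True
        sleep(poll_interval)
    return False
```

The magnet link would be added, and the timer started, only once this returns True.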

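Adding the DHT bootstrap nodes primarily makes it possible to bootstrap; it is still not necessarily going to be particularly quick. You typically want about 270 nodes (including replacement nodes) to be considered bootstrapped.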
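A rough way to watch that number grow is to read the node counters from the session status. This sketch assumes the status object returned by `session.status()` carries `dht_nodes` (routing table entries) and `dht_node_cache` (replacement nodes), as in libtorrent's `session_status`; whether both fields are exposed depends on the libtorrent version, so the second one is read defensively. The helper names and the default threshold of 270 (taken from the figure above) are illustrative.

```python
def known_dht_nodes(session):
    """Count routing-table nodes plus replacement (cached) nodes."""
    status = session.status()
    # Assumption: dht_node_cache may be absent in some builds; treat as 0.
    return status.dht_nodes + getattr(status, "dht_node_cache", 0)


def looks_bootstrapped(session, threshold=270):
    """Heuristic only: True once enough DHT nodes are known."""
    return known_dht_nodes(session) >= threshold
```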

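One thing you can do to speed up the bootstrap process is to make sure you save and load the session state, which includes the DHT routing table. This reloads all the nodes from the previous session into the routing table, and (assuming you have not changed IP and everything works correctly) bootstrapping should be quicker.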
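Saving and restoring that state can be wrapped in two small helpers. This is a sketch, assuming the session offers `save_state()` and `load_state()` and that `lt.bencode` and `lt.bdecode` are used as the default codecs (imported lazily, so the helpers can also be exercised with stand-in codecs); the function names are hypothetical.

```python
import os


def save_session_state(session, path, encode=None):
    """Write the session state (including the DHT routing table) to `path`."""
    if encode is None:
        import libtorrent as lt  # assumed to be installed
        encode = lt.bencode
    data = encode(session.save_state())
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
    # Rename only after a complete write, so a crash cannot leave a
    # truncated state file behind.
    os.replace(tmp_path, path)


def load_session_state(session, path, decode=None):
    """Load a previously saved state; return False if no file exists."""
    if not os.path.isfile(path):
        return False
    if decode is None:
        import libtorrent as lt  # assumed to be installed
        decode = lt.bdecode
    with open(path, "rb") as f:
        session.load_state(decode(f.read()))
    return True
```

Calling `save_session_state` periodically and once more on shutdown keeps the routing table reasonably fresh for the next start.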

Make sure you do not start the DHT in the session constructor (as the flags argument, just pass in add_default_plugins), then load the state, add the router nodes, and only then start the DHT.
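That ordering can be captured in one function. The sketch below assumes the session was already constructed with only the add_default_plugins flag, so the DHT is not yet running (for example the `lt.session(lt.fingerprint("DE", 0, 1, 0, 0), flags=1)` form that appears commented out in the question, where `1` is taken to be the value of that flag), and that `saved_state` is an already decoded state dictionary. `bring_up_dht` and `DHT_ROUTERS` are hypothetical names.

```python
# Router list taken from the question; adjust as needed.
DHT_ROUTERS = [
    ("router.utorrent.com", 6881),
    ("router.bittorrent.com", 6881),
    ("dht.transmissionbt.com", 6881),
]


def bring_up_dht(session, saved_state=None, routers=DHT_ROUTERS):
    """Load state, add routers, then start the DHT, in that order."""
    # 1. Restore the previous routing table first, if there is one.
    if saved_state is not None:
        session.load_state(saved_state)
    # 2. Add bootstrap routers as a fallback for a cold or stale table.
    for host, port in routers:
        session.add_dht_router(host, port)
    # 3. Only now start the DHT, so it begins with every known node.
    session.start_dht()
```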

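Unfortunately, there are a lot of moving parts involved in getting this to work internally; the ordering is important and there may be subtle issues.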

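Also note that keeping the DHT running continuously would be quicker, since reloading the state still goes through a bootstrap; it just has more nodes up front to ping and try to "connect" to.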
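In practice this suggests one long-lived session that resolves many magnet links, rather than a fresh session per link. A minimal sketch of such a batch step follows; `fetch_metadata_batch` is a hypothetical helper, and `add_magnet` is a caller-supplied function (with libtorrent it could wrap `lt.add_magnet_uri(session, uri, params)`) that returns a handle offering `has_metadata()`.

```python
import time


def fetch_metadata_batch(session, add_magnet, magnets, timeout=600.0,
                         poll_interval=1.0, clock=time.monotonic,
                         sleep=time.sleep):
    """Resolve several magnet links on one already running session.

    Returns (done, timed_out), both lists of (uri, handle) pairs.
    Handles that time out are removed from the session; finished
    handles are left for the caller to read and remove.
    """
    pending = [(uri, add_magnet(session, uri)) for uri in magnets]
    done = []
    deadline = clock() + timeout
    while pending and clock() < deadline:
        still_pending = []
        for uri, handle in pending:
            if handle.has_metadata():
                done.append((uri, handle))
            else:
                still_pending.append((uri, handle))
        # Rebuild the list instead of removing items while iterating.
        pending = still_pending
        if pending:
            sleep(poll_interval)
    for _uri, handle in pending:
        session.remove_torrent(handle)
    return done, pending
```

Because the session outlives each batch, later batches benefit from the routing table built up by earlier ones.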

Disabling the start_default_features flag also means that UPnP and NAT-PMP will not be started; if you use those, you will have to start them manually as well.