Calling a class using a string stored in a dictionary

Asked: 2017-08-28 19:20:38

Tags: python python-3.x scrapy

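My spider returns a dozen or so Scrapy item types, which I clean and store in SQL, each in its own table. I could write out the handling for each item type by hand, but it seems neater to manage the creation of the various lists, dataframes and tables programmatically.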

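  • Unfortunately, when I try to refer to a Scrapy item class through an entry in the dict, Python reads it as a string rather than as a type or class.
  • Likewise, when I try to refer to a list by its name, Python still sees a string and won't let me call .append() on it.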
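
Both symptoms can be reproduced in a few lines. This is a minimal sketch: `Details`, `items_dd` and the two helper functions are hypothetical stand-ins for the real Scrapy items and lists, not part of the project.

```python
class Details(dict):
    """Hypothetical stand-in for one of the Scrapy item classes."""


items_dd = []                  # temporary list meant to collect Details items
item = Details(name="example")


def check_by_name(obj, class_name):
    """Call isinstance() with a class *name* (a str) and capture the failure."""
    try:
        return isinstance(obj, class_name)
    except TypeError as exc:   # the second argument is a str, not a type
        return exc


def append_by_name(list_name, obj):
    """Call .append() on a list *name* (a str) and capture the failure."""
    try:
        list_name.append(obj)
    except AttributeError as exc:   # str objects have no .append()
        return exc


isinstance_error = check_by_name(item, "Details")
append_error = append_by_name("items_dd", item)
print(type(isinstance_error).__name__, type(append_error).__name__)
```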

Any help getting Python to read a string as a class reference or a list reference would be much appreciated.

Here is a version of my code:

from scrapy import signals
from dealinfo.items import List, Details, Rd, Status, CompletedDetails, Syndicate
from dealinfo.items import CompanyDetails, CompanyContactInfo, CompanyTeam, Compadvisors, CompanyPInvestors
from dealinfo.items import CompanyExecSum, CurrentRd, PastRd, AnnFin#, CompanyDocs

import pandas as pd
from sqlalchemy import create_engine

class SQLPipeline(object):
    engine=create_engine('mssql+pyodbc://username:password@database')

    #### matrix of table names by type ######
    prep = {'item_names': ['List', 'Details', 'Rd', 'Fin', 'Status', 'CompletedDetails', 'Syndicate', 'CompanyDetails', 'CompanyContactInfo', 'CompanyTeam', 'Compadvisors', 'CompanyPInvestors', 'CompanyExecSum', 'CompanyCurrentRd', 'CompanyPastRd', 'CompannFin', 'CompanyDocs'],
            'temp_table': ['items_dl', 'items_dd', 'items_dr', 'items_df', 'items_nds', 'items_ncd', 'items_ns', 'items_cd', 'items_cci', 'items_ct', 'items_ca', 'items_cpi', 'items_es', 'items_cr', 'items_pr', 'items_af', 'items_cdoc'],            
            'data_frame': ['dl', 'dd', 'dr', 'df', 'nds', 'ncd', 'ns', 'cd', 'cci', 'ct', 'ca', 'cpi', 'es', 'cr', 'pr', 'af', 'cdoc'],
            'sql_table': ['list', 'details', 'rd', 'fin', 'status', 'completed_details', 'syndicate', 'company_details', 'company_contact_info', 'company_team', 'company_advisors', 'company_pinvestors', 'company_execsum', 'company_current_rd', 'company_past_rd', 'company_ann_fin', 'company_docs']
            }

    #### assigning temporary lists for capturing parsed items ######
    for x in prep['temp_table']:
        globals()[x] = []

    #### create sql schema to receive final output ######
    def __init__(self):
        try: ## Check schema exists, create if not
            SQLPipeline.engine.execute("create schema dealinfo")
        except:
            pass   

    #### clean each scrapy item and add contents to temporary list (ahead of conversion to dataframe) ######
    def process_item(self, item, spider):
        for i in range(len(SQLPipeline.prep['item_names'])):
            if isinstance(item, SQLPipeline.prep['item_names'][i]):####<<---error - not able to call item using string
                for key,value in item.items(): 
                    if isinstance(item[key], list):
                        item[key] = [x.strip() for x in item[key] if x]
                        item[key] = [x for x in item[key] if x]
                        item[key] = ', '.join(item[key])
                SQLPipeline.prep['temp_table'][i].append(item.copy())####<<---error - not able to call item using string

    #### convert parsed items to pandas dataframe before sending to sql as tables ######     
    def close_spider(self, spider):
        for i in SQLPipeline.prep['item_names']:
            try:
                SQLPipeline.prep['data_frame'][i] = pd.DataFrame(SQLPipeline.prep['temp_table'][i])
                print(SQLPipeline.prep['data_frame'][i])
                SQLPipeline.prep['data_frame'][i].to_sql(SQLPipeline.prep['sql_table'][i], SQLPipeline.engine, schema='dealinfo', if_exists='replace', index=False)
            except Exception as ex:
                print(ex)
                pass

3 Answers:

Answer 0 (score: 1)

I think eval may help you here.

>>> class Myclass():
...     pass
... 
>>> myinstance = Myclass()

>>> type(myinstance)
<class '__main__.Myclass'>

>>> type('myinstance')
<class 'str'>

>>> type(eval('myinstance'))
<class '__main__.Myclass'>
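
Since eval evaluates whatever expression the string contains, an explicit dictionary from names to classes is another way to do the same lookup. A minimal sketch, assuming the set of classes is known up front; `Details`, `Status`, `CLASS_BY_NAME` and `resolve` are hypothetical names used only for illustration:

```python
class Details(dict):
    """Hypothetical stand-in for a Scrapy item class."""


class Status(dict):
    """Hypothetical stand-in for a second item class."""


# Explicit registry: the string is only ever used as a dictionary key.
CLASS_BY_NAME = {cls.__name__: cls for cls in (Details, Status)}


def resolve(name):
    """Return the class registered under `name`; raises KeyError if unknown."""
    return CLASS_BY_NAME[name]


item = Details(title="example")
matches = isinstance(item, resolve("Details"))
print(matches)
```

An unknown name fails with a KeyError rather than being evaluated as code.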

Answer 1 (score: 0)

I believe what you are looking for is https://jsfiddle.net/qrwvvtxs/

Answer 2 (score: 0)

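juanpa.arrivillaga had the right answer for my situation: it solved everything, and it is all running now. I created the lists first, then put the list objects and the item classes themselves, rather than their names as strings, into my dictionary. Nothing else was needed!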

    for x in ['items_dl','items_dd','items_dr','items_df','items_nds','items_ncd','items_ns','items_cd','items_cci','items_ct','items_ca','items_cpi','items_es','items_cr','items_pr','items_af','items_cdoc']:
        globals()[x] = []
    prep = {'temp_table': [items_dl, items_dd, items_dr, items_df, items_nds, items_ncd, items_ns, items_cd, items_cci, items_ct, items_ca, items_cpi, items_es, items_cr, items_pr, items_af, items_cdoc],
        'item_names': [List, Details, Rd, Status, CompletedDetails, Syndicate, CompanyDetails, CompanyContactInfo, CompanyTeam, Compadvisors, CompanyPInvestors, CompanyExecSum, CompanyCurrentRd, CompanyPastRd, CompannFin, CompanyDocs],
        'data_frame': ['dl', 'dd', 'dr', 'df', 'nds', 'ncd', 'ns', 'cd', 'cci', 'ct', 'ca', 'cpi', 'es', 'cr', 'pr', 'af', 'cdoc'],
        'sql_table': ['list', 'details', 'rd', 'fin', 'status', 'completed_details', 'syndicate', 'company_details', 'company_contact_info', 'company_team', 'company_advisors', 'company_pinvestors', 'company_execsum', 'company_current_rd', 'company_past_rd', 'company_ann_fin', 'company_docs']
        }
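
The same idea can also be sketched with a single dictionary keyed by item class, which keeps each class, its temporary list and its SQL table name together instead of relying on parallel lists staying in the same order. This is a simplified illustration rather than the pipeline above: `Details` and `Status` are stand-ins for the real items, and `ItemCollector` is a hypothetical name.

```python
class Details(dict):
    """Stand-in for a Scrapy item class (items behave like dicts)."""


class Status(dict):
    """Stand-in for a second item class."""


class ItemCollector:
    # One entry per item class: the SQL table its rows should end up in.
    sql_table = {Details: "details", Status: "status"}

    def __init__(self):
        # One temporary list per item class, keyed by the class object itself.
        self.rows = {cls: [] for cls in self.sql_table}

    def process_item(self, item):
        rows = self.rows.get(type(item))
        if rows is None:
            return item  # unknown item type: pass it through untouched
        for key, value in list(item.items()):
            if isinstance(value, list):
                # Strip whitespace, drop empty strings, join into one string.
                cleaned = [v.strip() for v in value if v]
                item[key] = ", ".join(v for v in cleaned if v)
        rows.append(dict(item))
        return item


collector = ItemCollector()
collector.process_item(Details(tags=[" a ", "", "b "], name="x"))
collector.process_item(Status(state="open"))
print(collector.rows[Details])
```

In a close_spider step, each list in `collector.rows` could then be turned into a pandas DataFrame and written to the table named in `sql_table`, as in the question's code.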