Question

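I have a function that is applied to every row of a pandas DataFrame. Its main job is to query a REST API (Azure's Text Analytics API) and return a list of the resulting entities.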

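import pandas as pd

# `client` is assumed to be an already-authenticated Text Analytics client created elsewhere.
def get_entity_rec(row):
    try:
        # Entity recognition uses the first 5000 characters of the text
        textcon = row._c0[0:5000]
        doc = [textcon]
        # Language detection uses only the first 1000 characters
        textconlang = row._c0[0:1000]
        doclang = [textconlang]
        # Get Language
        response = client.detect_language(documents=doclang, country_hint='us')[0]
        row['language'] = response.primary_language.name
        # Get Entities
        result = client.recognize_entities(documents=doc)[0]
        row['items'] = [[entity.text, entity.category, entity.subcategory, entity.confidence_score] for entity in result.entities]
        return row
    except Exception as err:
        print("Encountered exception. {}".format(err))
        # Return the row unchanged so that apply() does not receive None
        return row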

d = {'_c0': ['London', 'Paris'], 'col2': [3, 4]}
df = pd.DataFrame(data=d)

pd_df2 = df.apply(get_entity_rec, axis=1)

pd_df2

I originally had a for loop like this:

        a = []
        for entity in result.entities:
            b = [entity.text, entity.category, entity.subcategory, entity.confidence_score]
            a.append(b)
        row['items'] = a
        return row

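From what I have read, a list comprehension should perform better. However, after making the change I get almost the same runtime (sometimes the for loop is even slightly faster).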
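One possible explanation for the near-identical timings is that the runtime is dominated by the API round trip rather than by building the list. The sketch below illustrates this with a stand-in: `fake_api_call` and its 10 ms delay are assumptions made up for this example and are not part of the Azure client.

```python
import time
import timeit

def fake_api_call():
    # Hypothetical stand-in for one REST round trip; the 10 ms latency is assumed.
    time.sleep(0.01)
    return [("London", "Location", None, 0.99)] * 20

def with_loop():
    # Build the list of entity fields with an explicit for loop.
    items = []
    for text, category, subcategory, score in fake_api_call():
        items.append([text, category, subcategory, score])
    return items

def with_comprehension():
    # Build the same list with a list comprehension.
    return [[t, c, s, p] for t, c, s, p in fake_api_call()]

# Time 20 runs of each variant; both include the simulated network wait.
loop_time = timeit.timeit(with_loop, number=20)
comp_time = timeit.timeit(with_comprehension, number=20)
print(f"loop: {loop_time:.3f}s  comprehension: {comp_time:.3f}s")
```

If the two numbers come out close to each other, as I would expect here, the list-building style is not where the time goes.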

How can I improve the performance of the for loop in this Python function?
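One direction that might help, since `documents` already takes a list, is to send several texts per request instead of one request per row. The following is only a sketch under assumptions: `FakeClient`, `recognize_in_batches`, and the batch size of 5 are hypothetical names and values chosen for illustration, and the real service's per-request limit and response objects should be checked in its documentation.

```python
BATCH_SIZE = 5  # assumed value; the real per-request document limit may differ

class FakeClient:
    """Hypothetical stand-in for the real client; counts calls instead of using the network."""

    def __init__(self):
        self.calls = 0

    def recognize_entities(self, documents):
        self.calls += 1
        # Return one result per input document, in the same order.
        return [[[doc, "Location", None, 1.0]] for doc in documents]

def recognize_in_batches(texts, client, batch_size=BATCH_SIZE):
    # Truncate each text as in the original function, then send them in chunks.
    texts = [t[:5000] for t in texts]
    items = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        items.extend(client.recognize_entities(documents=batch))
    return items

texts = ["London", "Paris", "Rome", "Oslo", "Lima", "Cairo", "Tokyo"]
client = FakeClient()
items = recognize_in_batches(texts, client)
print(client.calls)  # 2 requests for 7 texts instead of 7
```

With a DataFrame, the input would be something like `df['_c0'].tolist()`, and the returned list could be assigned back as a new column because it keeps the row order.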

0 answers: