Select the column with the highest count of a specific value

Time: 2020-06-30 17:20:40

Tags: python pandas

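I'm having trouble returning the name of the column that has the highest count of the value "GPE". In this case, I want the output to be just "text", because that column has two rows with "GPE", while the text2 column has 1 and the text3 column has 0.

Code: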

import spacy
import pandas as pd
import en_core_web_sm

nlp = en_core_web_sm.load()
text = [["Canada", 'University of California has great research', "non-location"],["China", 'MIT is at Boston', "non-location"]]
df = pd.DataFrame(text, columns = ['text', 'text2', 'text3'])

col_list = df.columns # obtains the columns of the dataframe

for col in col_list:
    # Store each column's named-entity labels in a new column named ent_<col_name>.
    df["ent_" + col] = df[col].apply(lambda x: [[w.label_] for w in nlp(x).ents])
df

Desired output:

text

1 Answer:

Answer 0: (score: 1)

Once the dataframe df has been prepared by the script provided, you can get the column in which GPE entities occur most often by counting the "GPE" labels in each entity column and taking the column with the largest count.
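A minimal sketch of one way to do that count in pandas, which is not necessarily the answer's original code. To stay runnable without a spaCy model, it starts from a hand-written frame `ents` whose cells mimic the nested label lists produced by the question's loop. The labels in it are assumed for illustration, not real model output, and `ents`, `count_label`, `gpe_counts` and `best_col` are hypothetical names.

```python
import pandas as pd

# Assumed stand-in for the entity columns built in the question:
# each cell is a list of single-item lists, e.g. [["ORG"], ["GPE"]].
ents = pd.DataFrame({
    "text":  [[["GPE"]], [["GPE"]]],
    "text2": [[["ORG"]], [["ORG"], ["GPE"]]],
    "text3": [[], []],
})

def count_label(cell, label="GPE"):
    """Count how many entity entries in one cell carry `label`."""
    return sum(label in inner for inner in cell)

# Total number of "GPE" labels per column.
gpe_counts = pd.Series(
    {col: ents[col].map(count_label).sum() for col in ents.columns}
)

# Name of the column with the largest count.
best_col = gpe_counts.idxmax()
print(best_col)
```

With the question's real dataframe, the same counting step could be applied to whichever columns hold the entity labels. If those columns carry a prefix such as `ent_`, it would have to be stripped to get back the plain name "text".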