Select entries from a list if their key appears in another list, and join the strings together

Time: 2017-12-12 14:29:51

Tags: python list pandas

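I have a question about my data frame. In one column I have, for each row, a list of relevant people (personlist), and in another column a list of speakers with their speeches (speech), covering both relevant and irrelevant people. I want to select the speeches of the relevant people, where a speaker is relevant if they appear in the list in the other column (personlist), and then join those speeches together while ignoring the irrelevant ones. So one column provides the list of last names I am looking for, and the other provides a list of all speakers (first and last name) with their speeches. I want to create a new column in which the speeches of the relevant people are joined (separated by a space) and stored in the corresponding row.

So my initial dataset looks like this: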

ticker  year    quarter personlist              jobposition speech
xx      2009    1       ("Angle", "Barth")      CEO         [("Mike Angle", "Thank you"), ("Barbara Barth", "It is"), ("Will Cook", "Yes, true")]
xx      2009    1       ("Angle", "Barth")      CFO         [("Mike Angle", "Thank you"), ("Barbara Barth", "It is"), ("Will Cook", "Yes, true")]
xx      2009    2       ("Angle", "Barth")      CEO         [("Mike Angle", "I am surprised"), ("Barbara Barth", "So am I"), ("Will Cook", "Me too")]
xx      2009    2       ("Angle", "Barth")      CFO         [("Mike Angle", "I am surprised"), ("Barbara Barth", "So am I"), ("Will Cook", "Me too")]
yy      2008    3       ("Cruz", "Dolm")        CEO         [("Damien Cruz", "Hello"), ("Lara Dolm", "Nice to meet you"), ("Lara Bel", "You too")]
yy      2008    3       ("Cruz", "Dolm")        CFO         [("Damien Cruz", "Hello"), ("Lara Dolm", "Nice to meet you"), ("Lara Bel", "You too")]

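For example, for the first row I want to check, for each key-value pair, whether the first entry ends with one of the last names in personlist. If not, continue; if so, take the speech part (i.e. the value of the entry) and store it in the new column, then repeat for the other pairs and join the matches together. I would therefore like to end up with the following dataset (I have hidden the initial speech column here, but it should still be included; I do not want to replace it, only create a new column).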

ticker  year    quarter personlist               relevantspeeches
xx      2009    1       ("Angle", "Barth")       "Thank you It is"
xx      2009    1       ("Angle", "Barth")       "Thank you It is"
xx      2009    2       ("Angle", "Barth")       "I am surprised So am I"
xx      2009    2       ("Angle", "Barth")       "I am surprised So am I"
yy      2008    3       ("Cruz", "Dolm")         "Hello Nice to meet you"
yy      2008    3       ("Cruz", "Dolm")         "Hello Nice to meet you"

Can someone help me with this?

Thanks!! Julia

2 answers:

Answer 0: (score: 0)

Define a function that does the dirty work.

def foo(row):
    # Map each speaker's last name to their speech for fast lookup.
    sp_dict = dict((x.split()[-1], y) for x, y in row['speech'])
    return ' '.join([sp_dict.get(p, '') for p in row['personlist']])

The idea is to build a lastname : speech mapping dictionary, so that the names can be looked up and their speeches joined quickly.

Now that the function is ready, call it through df.apply, row by row (axis=1).
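To make the lastname : speech idea concrete, here is a minimal sketch that runs foo on two rows modeled on the question's sample data. Plain dicts stand in for DataFrame rows here (an assumption made so the example has no dependencies); a pandas row supports the same row['speech'] lookup.

```python
def foo(row):
    # Map each speaker's last name (last word of the full name) to their speech.
    sp_dict = dict((x.split()[-1], y) for x, y in row["speech"])
    # Look up every requested last name; a missing name contributes ''.
    return " ".join([sp_dict.get(p, "") for p in row["personlist"]])

# Plain dicts standing in for two DataFrame rows from the question.
rows = [
    {"personlist": ("Angle", "Barth"),
     "speech": [("Mike Angle", "Thank you"),
                ("Barbara Barth", "It is"),
                ("Will Cook", "Yes, true")]},
    {"personlist": ("Cruz", "Dolm"),
     "speech": [("Damien Cruz", "Hello"),
                ("Lara Dolm", "Nice to meet you"),
                ("Lara Bel", "You too")]},
]

relevant = [foo(r) for r in rows]
print(relevant)  # ['Thank you It is', 'Hello Nice to meet you']
```

On a real data frame, the equivalent would presumably be df['relevantspeeches'] = df.apply(foo, axis=1), assuming the personlist and speech columns hold actual Python tuples and lists rather than their string representations. Two limits of the dictionary approach are worth knowing: it keeps only the last speech per last name, and a name with no matching speaker leaves an empty string (and therefore an extra space) in the joined result.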

Answer 1: (score: 0)

With a list comprehension and the apply method:

def select(row):
    return " ".join([said for person in row.personlist
                     for name, said in row.speech if person in name])

df['relevant'] = df.apply(select, axis=1)

Then:

df.relevant

"""
0           Thank you It is
1           Thank you It is
2    I am surprised So am I
3    I am surprised So am I
4    Hello Nice to meet you
5    Hello Nice to meet you
Name: relevant, dtype: object
"""