I have a DataFrame:
df = pd.DataFrame(['A','B','C'], columns = ['Letters'])
I also have a list of names:
names = ['George All', 'George Ball','George Ago','George Call']
How can I create a new column in the DataFrame that contains the list of names whose surname starts with the letter in the Letters column?
For example:
Letters Names
A ['George All','George Ago']
B George Ball
C George Call
This is what I have so far:
df['Names'] = [name for name in names if (name.split()[1][0] == df['Letters'])]
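The attempt above fails because df['Letters'] is the whole column, not a single letter. Comparing one character against it produces a boolean Series (one value per row), and an if condition cannot reduce that Series to a single True/False, so pandas raises a ValueError. A minimal demonstration with the question's own data:

```python
import pandas as pd

df = pd.DataFrame(['A', 'B', 'C'], columns=['Letters'])
names = ['George All', 'George Ball', 'George Ago', 'George Call']

# Comparing one character to the whole column is element-wise: one bool per row.
mask = names[0].split()[1][0] == df['Letters']
print(mask.tolist())  # [True, False, False]

# Using such a Series as an `if` condition is ambiguous, so pandas raises ValueError.
try:
    [name for name in names if name.split()[1][0] == df['Letters']]
except ValueError as exc:
    print('ValueError:', exc)
```

The fix is to loop over the letters as well, so that each comparison is between two plain strings.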
Answer (score: 1)
>>> df['Names'] = [[n for n in names if n.split()[1][0] == x] for x in df['Letters']]
>>> df
Letters Names
0 A [George All, George Ago]
1 B [George Ball]
2 C [George Call]
[3 rows x 2 columns]
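The same per-row comprehension can also be written with Series.map, which calls a function on each value of the Letters column. This is only a stylistic variant: it still scans the whole of names once per row. The helper name names_starting_with is illustrative, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame(['A', 'B', 'C'], columns=['Letters'])
names = ['George All', 'George Ball', 'George Ago', 'George Call']

def names_starting_with(letter):
    """Hypothetical helper: names whose surname (second word) starts with `letter`."""
    return [n for n in names if n.split()[1][0] == letter]

# map() calls the helper once per letter and stores each returned list in the cell.
df['Names'] = df['Letters'].map(names_starting_with)
print(df)
```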
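You can do this more efficiently by grouping all the names by the first letter of their surname beforehand. The comprehension above rescans the whole of names once for every row; grouping first takes a single pass over names, after which each row is just a dictionary lookup (this matters if names is a large list).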
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for item in names:
...     d[item.split()[1][0]].append(item)
...
>>> df['Names'] = [d[x] for x in df['Letters']]
>>> df
Letters Names
0 A [George All, George Ago]
1 B [George Ball]
2 C [George Call]
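If you would rather let pandas do the grouping, the same one-pass idea can be sketched with groupby: put the names in a Series, compute each surname's initial with the .str accessor, collect each group into a list, then look the lists up per letter. Like the answer, this assumes every name has at least two words. The extra letter 'D' is added here only to show that a letter with no matching surname gets an empty list rather than an error or NaN:

```python
import pandas as pd

# 'D' is a hypothetical extra row, added to show the no-match case.
df = pd.DataFrame(['A', 'B', 'C', 'D'], columns=['Letters'])
names = pd.Series(['George All', 'George Ball', 'George Ago', 'George Call'])

# Initial of the surname (second word) of every name: A, B, A, C.
initials = names.str.split().str[1].str[0]

# One list per initial; groupby keeps the original order of names within a group.
groups = names.groupby(initials).apply(list)

# Series.get returns the default ([]) for letters that have no group.
df['Names'] = [groups.get(letter, []) for letter in df['Letters']]
print(df)
```

Unlike indexing a defaultdict, Series.get does not insert a new key for a missing letter, so groups is left unchanged by the lookups.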