Question

I have a df constructed like this:

import pandas as pd

dic = {'001': [['one','two','three']],
       '002': [['two', 'five', 'eight']],
       '003': [['three','six','ten','twelve']]}
df = pd.DataFrame.from_dict(dic,orient='index')
df.reset_index(inplace=True)
df = df.rename(columns = {'index':'id',0:'values'})
print(df)

The resulting df looks like

    id                     values
0  001          [one, two, three]
1  002         [two, five, eight]
2  003  [three, six, ten, twelve]

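I want to write a function that, given a value, returns a DataFrame or Series of the ids whose corresponding list contains that value. For example: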

def find_ids(value):
    # pseudocode: if a row's list contains value, collect that row's id
    ids = ...
    return ids

So

find_ids('two')

should return

id
001
002

and

find_ids('twelve')

should return

id
003

Answer 1

Try .str.join(sep=" ").str.contains(value), which first joins each list into a single string and then checks whether that string contains value.

def find_ids(df, value):
    return df.loc[df['values'].str.join(sep=" ").str.contains(value), "id"]

Output:

>>> print(find_ids(df, "two"))
0    001
1    002
Name: id, dtype: object

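For efficiency, join the lists once with .str.join(sep=" ") and store the strings in a new column; you can then search that column with .str.contains(value) without rejoining on every call.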

df['values_str'] = df['values'].str.join(sep = " ")
def find_ids(df, value):
    return df.loc[df.values_str.str.contains(value), "id"]

Output:

>>> print(find_ids(df, "two"))
0    001
1    002
Name: id, dtype: object
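
One caveat with the string-based approach: .str.contains does substring matching and treats the pattern as a regular expression by default, so searching for 'six' would also match a row that only contains 'sixteen'. Below is a minimal sketch of a whole-word variant, assuming the list items are plain alphanumeric words without spaces. The name find_ids_exact and the sample data are made up for illustration and are not part of the original answer.

```python
import re
import pandas as pd

# Hypothetical sample data chosen to expose the substring problem
df = pd.DataFrame({
    "id": ["001", "002"],
    "values": [["six", "ten"], ["sixteen", "two"]],
})
joined = df["values"].str.join(" ")

def find_ids_exact(df, joined, value):
    # \b anchors the match at word boundaries;
    # re.escape neutralizes any regex metacharacters in value
    pattern = r"\b" + re.escape(value) + r"\b"
    return df.loc[joined.str.contains(pattern, regex=True), "id"]

substring_hits = df.loc[joined.str.contains("six"), "id"].tolist()
exact_hits = find_ids_exact(df, joined, "six").tolist()
print(substring_hits)  # both rows match, because "sixteen" contains "six"
print(exact_hits)      # only the row whose list holds the word "six"
```

If the list items can contain spaces or punctuation, word boundaries are not reliable, and a membership test on the lists themselves is the safer choice.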

Answer 2

You can try:

def find_ids(df, value):
    return df.loc[df["values"].apply(lambda x: value in x), "id"]


print(find_ids(df, "two"))

Prints:

0    001
1    002
Name: id, dtype: object
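
The question asks for either a DataFrame or a Series. As a sketch under the same column layout as the question, the variant below returns a one-column DataFrame and treats cells that are not lists (for example None) as non-matching instead of raising a TypeError. The name find_ids_frame and the None row are assumptions added for illustration.

```python
import pandas as pd

# Same layout as the question, with a hypothetical missing cell in the last row
df = pd.DataFrame({
    "id": ["001", "002", "003"],
    "values": [["one", "two", "three"], ["two", "five", "eight"], None],
})

def find_ids_frame(df, value):
    # Non-list cells count as "no match" rather than raising on `value in x`
    mask = df["values"].apply(lambda x: isinstance(x, list) and value in x)
    # Double brackets keep the result a DataFrame rather than a Series
    return df.loc[mask, ["id"]]

result = find_ids_frame(df, "two")
print(result)
```

Because `value in x` tests list membership, this matches whole items only, so 'two' does not match an item such as 'twofold'.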

Answer 3

You can use:

def find_ids(value):
    newdf=df.explode('values')
    return newdf.loc[newdf['values']==value,'id']

Now call the function:

print(find_ids('two'))

Output:

0    001
1    002
Name: id, dtype: object
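
DataFrame.explode gives each list element its own row and repeats the original index label, which is why the matching rows keep labels 0 and 1. The sketch below passes df explicitly instead of relying on a global, and drops duplicate ids in case a value appears more than once in the same list. The name find_ids_explode and the sample data are illustrative assumptions.

```python
import pandas as pd

# Hypothetical data where 'two' appears twice in the first list
df = pd.DataFrame({
    "id": ["001", "002", "003"],
    "values": [["one", "two", "two"], ["two", "five"], ["three", "six"]],
})

def find_ids_explode(df, value):
    # One row per list element; the index labels of the source rows are repeated
    exploded = df.explode("values")
    matches = exploded.loc[exploded["values"] == value, "id"]
    # A repeated value inside one list would otherwise yield the same id twice
    return matches.drop_duplicates()

result = find_ids_explode(df, "two")
print(result)
```

Exploding copies the other columns once per list element, so for long lists the apply-based membership test may use less memory.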

Return a pandas df column when another column's list contains a specific value
