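What is the simplest and fastest way to check whether the elements of a Series exist in a list of lists? For example, I have a Series and a list of lists as shown below. I have a loop that does this, but it is a bit slow, so I would like a faster way to do it.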
groups = []
for desc in descs:
    for i in range(len(list_of_list)):
        if desc in list_of_list[i]:
            groups.append(i)
list_of_list = [['rfnd sms chrgs'],
                ['loan payment receipt'],
                ['zen june2018 aksg sal 1231552',
                 'zen july2018 aksg sal 1411191',
                 'zen aug2018 aksg mda sal 16014'],
                ['cshw agnes john udo mrs ',
                 'cshw agnes john udo',
                 'cshw agnes udo',
                 'cshw agnes john'],
                ['sms alert charge outstanding'],
                ['maint fee recovery jul 2018', 'vat maint fee recovery jul 2018'],
                ['sept2018 aksg mda sal 20028',
                 'oct2018 aksg mda sal 21929',
                 'nov2018 aksg mda sal 25229'],
                ['sms alert charges 28th sep 26th oct 2018']]
descs =
1959 rfnd sms chrgs
1960 loan payment receipt
1961 zen june2018 aksg sal 1231552
1962 loan payment receipt
1963 cshw agnes john udo mrs
1964 maint fee frm 31 may 2018 28 jun 2018
1965 vat maint fee frm 31 may 2018 28 jun 2018
1966 sms alert charge outstanding
1967 loan payment receipt
1968 zen july2018 aksg sal 1411191
1969 loan payment receipt
The expected output is something like a list of numbers,
e.g. [1,2,3,4,5,6]
Answer 0 (score: 1)
Prepare the data:
import pandas as pd
# merging a Series without a name is not allowed
descs = descs.rename("descs")
# convert the list of lists to a Series with one string per row
ll = pd.Series(list_of_list).explode().reset_index()
ll.columns = ["pos", "descs"]
>>> descs
1959 rfnd sms chrgs
1960 loan payment receipt
1961 zen june2018 aksg sal 1231552
1962 loan payment receipt
1963 cshw agnes john udo mrs
1964 maint fee frm 31 may 2018 28 jun 2018
1965 vat maint fee frm 31 may 2018 28 jun 2018
1966 sms alert charge outstanding
1967 loan payment receipt
1968 zen july2018 aksg sal 1411191
1969 loan payment receipt
Name: descs, dtype: object
>>> ll
pos descs
0 0 rfnd sms chrgs
1 1 loan payment receipt
2 2 zen june2018 aksg sal 1231552
3 2 zen july2018 aksg sal 1411191
4 2 zen aug2018 aksg mda sal 16014
5 3 cshw agnes john udo mrs
6 3 cshw agnes john udo
7 3 cshw agnes udo
8 3 cshw agnes john
9 4 sms alert charge outstanding
10 5 maint fee recovery jul 2018
11 5 vat maint fee recovery jul 2018
12 6 sept2018 aksg mda sal 20028
13 6 oct2018 aksg mda sal 21929
14 6 nov2018 aksg mda sal 25229
15 7 sms alert charges 28th sep 26th oct 2018
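To make the preparation step easier to follow, here is a minimal sketch of what explode() and reset_index() do, using a shortened, hypothetical list (small_list_of_list and lookup are illustrative names, not part of the question's data):

```python
import pandas as pd

# Hypothetical, shortened stand-in for list_of_list.
small_list_of_list = [["a"], ["b", "c"], ["d"]]

# explode() gives every inner string its own row and repeats, in the index,
# the position of the inner list that the string came from.
exploded = pd.Series(small_list_of_list).explode()

# reset_index() turns that repeated position into a regular column.
lookup = exploded.reset_index()
lookup.columns = ["pos", "descs"]

print(lookup)
```

The "pos" column is what later identifies the group, so both strings from the second inner list share the same value.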
Now you can merge descs and ll to get your list of numbers:
df = pd.merge(descs, ll, on="descs", how="left").set_index(descs.index)
>>> df
descs pos
1959 rfnd sms chrgs 0.0
1960 loan payment receipt 1.0
1961 zen june2018 aksg sal 1231552 2.0
1962 loan payment receipt 1.0
1963 cshw agnes john udo mrs 3.0
1964 maint fee frm 31 may 2018 28 jun 2018 NaN
1965 vat maint fee frm 31 may 2018 28 jun 2018 NaN
1966 sms alert charge outstanding 4.0
1967 loan payment receipt 1.0
1968 zen july2018 aksg sal 1411191 2.0
1969 loan payment receipt 1.0
Check:
>>> df.loc[1966, "descs"]
'sms alert charge outstanding'
>>> list_of_list[int(df.loc[1966, "pos"])]
['sms alert charge outstanding']
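Because unmatched rows produce NaN, the "pos" column comes back as float. If a plain list of integers is wanted, one possible follow-up is sketched below. It rebuilds a small subset of the data so it runs on its own; the names groups and matched are illustrative and not from the answer above.

```python
import pandas as pd

# Small subset of the question's data, enough to show one unmatched row.
descs = pd.Series(
    ["rfnd sms chrgs", "loan payment receipt", "maint fee frm 31 may 2018 28 jun 2018"],
    index=[1959, 1960, 1964],
    name="descs",
)
list_of_list = [["rfnd sms chrgs"], ["loan payment receipt"]]

ll = pd.Series(list_of_list).explode().reset_index()
ll.columns = ["pos", "descs"]
df = pd.merge(descs, ll, on="descs", how="left").set_index(descs.index)

# The nullable Int64 dtype keeps matched positions as integers
# and marks unmatched rows as <NA> instead of NaN.
groups = df["pos"].astype("Int64")

# Plain Python list containing only the rows that matched a group.
matched = groups.dropna().astype(int).tolist()
print(matched)
```

Whether unmatched rows should be dropped or kept as missing values depends on what the list of numbers is used for afterwards.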
Another approach:
This approach uses the categorical dtype. It may be faster.
>>> ll = pd.Series(list_of_list).explode()
>>> descs.astype("category").map(pd.Series(ll.index, index=ll.astype("category")))
1959 0.0
1960 1.0
1961 2.0
1962 1.0
1963 3.0
1964 NaN
1965 NaN
1966 4.0
1967 1.0
1968 2.0
1969 1.0
dtype: float64
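A related variant, which is not part of the answer above, is to build a plain dictionary from each string to the position of its inner list and pass it to Series.map. The sketch below uses a small subset of the data; position_of is a hypothetical name, and the sketch assumes each string belongs to at most one inner list.

```python
import pandas as pd

# Small subset of the question's data plus one string that matches nothing.
list_of_list = [
    ["rfnd sms chrgs"],
    ["loan payment receipt"],
    ["cshw agnes john udo", "cshw agnes udo"],
]
descs = pd.Series(
    ["loan payment receipt", "cshw agnes udo", "unknown text"],
    index=[1960, 1963, 1964],
)

# Map each string to the position of the inner list that contains it.
# If a string appeared in several inner lists, the last position would win,
# which differs from the original loop (it appends every matching position).
position_of = {
    text: pos
    for pos, group in enumerate(list_of_list)
    for text in group
}

# Strings missing from the dictionary become NaN, so the result is float.
groups = descs.map(position_of)
print(groups)
```

This keeps the lookup as a single hash-table access per element, so it avoids the nested loop from the question. It is worth timing against the merge and categorical versions on the real data before choosing one.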