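The GroupID column holds two kinds of values, alphanumeric and numeric. I want to use the alphanumeric part of the column to create a new column, with a condition: if an alphanumeric value exists in o_dict, return its value; otherwise return "NOT IN DIC".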
GroupID
0 ad32s;#1214;#rf343;#4343
1 wd435;#6464;#ed532;#5454
2 av345e:#3132
3 ok132d;#8897
4 tn123h;#0980
5 as845;#657;#un567t;#456;#qw147;#123
6 ok132d;#8897
7 as845;#657;#un567t;#456;#qw147;#123
8 wd435;#6464;#ed532;#5454
o_dict= {"ad32s":"rupesh","ed532":"Frank","dr501u":"David","ok132d":"Ripal",
"qw147":"ilesh","av345e":'carls'}
Here is my code:
def function01(row):
    o_dict= {"ad32s":"rupesh","ed532":"Frank","dr501u":"David","ok132d":"Ripal","qw147":"ilesh","av345e":'carls'}
    # Split the GroupID string into its ";#"-separated tokens.
    listA = row['GroupID'].split(";#")
    for element in listA:
        if element.isalnum():
            if element in o_dict:
                return o_dict[element]
            else:
                return "NOT IN DIC"
        else:
            continue
df['New_column'] = df.apply(lambda x: function01(x), axis=1)
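This code works when the alphanumeric value is in the first position, but not when it is in the 3rd or 5th position. It works for rows 0, 2, 3, 4, and 6, but not for rows 1, 5, 7, and 8.
The output should have two columns: wherever a GroupID matches a key in the dict, the new column should hold that key's value; otherwise it should be filled with "NOT IN DIC".
I'm not sure what to do now. Is there another way to get this value? Is there a search function I could use to look it up?
Thanks for your help :)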
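Answer 0 (score: 0)
I found that in my code the for loop only handled the first value in the list and filled in "NOT IN DIC" without checking the other values. I have now made the following changes and get the expected output.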
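def function01(row):
    o_dict= {"ad32s":"rupesh","ed532":"Frank","dr501u":"David","ok132d":"Ripal","qw147":"ilesh","av345e":'carls'}
    # Split the GroupID string into its ";#"-separated tokens.
    listA = row['GroupID'].split(";#")
    # Keep only the tokens that are not purely numeric.
    listB = [i for i in listA if not i.isdigit()]
    for element in listB:
        # Return the value of the first token found in the dictionary.
        if element in o_dict:
            return o_dict[element]
    # No token matched any key.
    return "NOT IN DIC"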
df['New_column'] = df.apply(lambda x: function01(x), axis=1)
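The same lookup can also be written as a plain function on the string, which makes it easy to check against the sample data without a DataFrame. This is only a minimal sketch: lookup_owner and SEPARATOR are hypothetical names, and treating ":#" as a separator alongside ";#" is an assumption made because row 2 of the sample is written with a colon.

```python
import re

o_dict = {"ad32s": "rupesh", "ed532": "Frank", "dr501u": "David",
          "ok132d": "Ripal", "qw147": "ilesh", "av345e": "carls"}

# Assumption: both ";#" and ":#" separate tokens (row 2 of the sample uses ":#").
SEPARATOR = re.compile(r"[;:]#")

def lookup_owner(group_id, mapping, default="NOT IN DIC"):
    """Return the mapped value of the first token found in mapping, else default."""
    for token in SEPARATOR.split(group_id):
        if token in mapping:
            return mapping[token]
    return default

group_ids = [
    "ad32s;#1214;#rf343;#4343",
    "wd435;#6464;#ed532;#5454",
    "av345e:#3132",
    "ok132d;#8897",
    "tn123h;#0980",
    "as845;#657;#un567t;#456;#qw147;#123",
    "ok132d;#8897",
    "as845;#657;#un567t;#456;#qw147;#123",
    "wd435;#6464;#ed532;#5454",
]

results = [lookup_owner(g, o_dict) for g in group_ids]
for g, r in zip(group_ids, results):
    print(g, "->", r)
```

On a DataFrame, the function could be applied to the column directly, for example with df["GroupID"].map(lambda g: lookup_owner(g, o_dict)), which avoids the row-wise df.apply.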
Answer 1 (score: 0)
You may want to use numpy.select.
import numpy
import pandas

d = {
    "GroupID": [
        "ad32s;#1214;#rf343;#4343",
        "wd435;#6464;#ed532;#5454",
        "av345e:#3132",
        "ok132d;#8897",
        "tn123h;#0980",
        "as845;#657;#un567t;#456;#qw147;#123",
        "ok132d;#8897",
        "as845;#657;#un567t;#456;#qw147;#123",
        "wd435;#6464;#ed532;#5454",
    ]
}
o_dict = {
    "ad32s": "rupesh",
    "ed532": "Frank",
    "dr501u": "David",
    "ok132d": "Ripal",
    "qw147": "ilesh",
    "av345e": "carls",
}
df = pandas.DataFrame.from_dict(d)

# One boolean mask per dictionary key: True where the key occurs in GroupID.
conditions = [df["GroupID"].str.find(k) != -1 for k in o_dict]
# The value to use for each mask, in the same order as the conditions.
values = list(o_dict.values())

# For each row, take the value of the first matching condition, else the default.
df["New_column"] = numpy.select(conditions, values, default="NOT IN DIC")
print(df)
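Note that str.find searches for substrings, so a key also matches when it appears inside a longer token, and when several keys match, numpy.select picks the first one in dictionary order rather than the first one in the row. If exact token matches are wanted, one possible alternative is to split, explode, and map. The following is only a sketch under the assumption that ";#" is the sole separator; the names tokens, mapped, and first_hit are illustrative. Under that assumption the row written as "av345e:#3132" is not split and therefore does not match.

```python
import pandas as pd

df = pd.DataFrame({
    "GroupID": [
        "ad32s;#1214;#rf343;#4343",
        "wd435;#6464;#ed532;#5454",
        "av345e:#3132",
        "ok132d;#8897",
        "tn123h;#0980",
        "as845;#657;#un567t;#456;#qw147;#123",
        "ok132d;#8897",
        "as845;#657;#un567t;#456;#qw147;#123",
        "wd435;#6464;#ed532;#5454",
    ]
})
o_dict = {
    "ad32s": "rupesh",
    "ed532": "Frank",
    "dr501u": "David",
    "ok132d": "Ripal",
    "qw147": "ilesh",
    "av345e": "carls",
}

# One row per token; the original row label is repeated in the index.
tokens = df["GroupID"].str.split(";#").explode()
# Exact dictionary lookup per token; tokens that are not keys become NaN.
mapped = tokens.map(o_dict)
# First matching token per original row, in left-to-right order.
first_hit = mapped.dropna().groupby(level=0).first()
# Rows without any match are absent from first_hit, so fill them in.
df["New_column"] = first_hit.reindex(df.index).fillna("NOT IN DIC")
print(df)
```

This trades the substring search for an exact comparison per token, so it only finds keys that stand alone between separators.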