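I have a dataframe and want to create a third column based on a condition: col3 should be "Yes" if the col2 value is present in col1, and "No" otherwise.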
import pandas as pd
data = [[[('330420', 0.9322496056556702), ('76546', 0.9322003126144409)],76546],[[('330420', 0.9322496056556702), ('500826', 0.9322003126144409)],876546]]
test = pd.DataFrame(data, columns=['col1','col2'])
col1 col2
0 [(330420, 0.9322496056556702), (76546, 0.93220... 76546
1 [(330420, 0.9322496056556702), (500826, 0.9322... 876546
Desired result:
data = [[[('330420', 0.9322496056556702), ('76546', 0.9322003126144409)],76546, 'Yes'],[[('330420', 0.9322496056556702), ('500826', 0.9322003126144409)],876546,'No']]
test = pd.DataFrame(data, columns=['col1','col2', 'col3'])
col1 col2 col3
0 [(330420, 0.9322496056556702), (76546, 0.93220... 76546 Yes
1 [(330420, 0.9322496056556702), (500826, 0.9322... 876546 No
My solution:
test['col3'] = [entry for tag in test['col2'] for entry in test['col1'] if tag in entry]
I get the error: ValueError: Length of values does not match length of index
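Why the attempt fails: the nested comprehension loops over every (col2, col1) combination and only keeps entries that pass the `if`, so the resulting list is not one value per row. Here `tag in entry` tests an int against a list of tuples, which is never true, so the list ends up empty and pandas rejects it. A minimal sketch of a one-value-per-row version, assuming (as in the sample data) that the IDs in col1 are strings and col2 holds ints:

```python
import pandas as pd

data = [
    [[('330420', 0.9322496056556702), ('76546', 0.9322003126144409)], 76546],
    [[('330420', 0.9322496056556702), ('500826', 0.9322003126144409)], 876546],
]
test = pd.DataFrame(data, columns=['col1', 'col2'])

# Pair each row's list of (id, score) tuples with that row's col2 value,
# so exactly one label is produced per row.
# The IDs are strings and col2 is int, so compare as strings.
test['col3'] = [
    'Yes' if any(tag == str(value) for tag, _score in pairs) else 'No'
    for pairs, value in zip(test['col1'], test['col2'])
]
print(test['col3'].tolist())  # ['Yes', 'No']
```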
Answer 0 (score: 4)
Use any together with zip:
[any([int(z[0])==y for z in x]) for x, y in zip (test.col1,test.col2)]
Out[227]: [True, False]
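The comprehension in this answer returns booleans rather than the "Yes"/"No" labels the question asks for. One way to finish the job, sketched here with `Series.map` (the `matches` name is just for illustration), is to map the booleans to labels and assign the result as col3:

```python
import pandas as pd

data = [
    [[('330420', 0.9322496056556702), ('76546', 0.9322003126144409)], 76546],
    [[('330420', 0.9322496056556702), ('500826', 0.9322003126144409)], 876546],
]
test = pd.DataFrame(data, columns=['col1', 'col2'])

# The comprehension from this answer: convert each tuple's string ID to int
# before comparing it with the row's col2 value.
matches = [any(int(z[0]) == y for z in x) for x, y in zip(test['col1'], test['col2'])]

# Map the booleans to the labels the question asks for.
test['col3'] = pd.Series(matches, index=test.index).map({True: 'Yes', False: 'No'})
print(test)
```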
Answer 1 (score: 1)
You should avoid storing lists in a Series. Let's try a vectorised solution instead:
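import numpy as np
import pandas as pd
df = test.copy()
# extract array of values and reshape (assumes exactly two (id, score) tuples per row)
arr = np.array(df.pop('col1').values.tolist()).reshape(-1, 4)
# join to dataframe and replace list of tuples
df = df.join(pd.DataFrame(arr, dtype=float))
# apply test via isin (a Series argument is matched row by row on the index)
df['test'] = df.drop(columns='col2').isin(df['col2']).any(axis=1)
print(df)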
col2 0 1 2 3 test
0 76546 330420.0 0.93225 76546.0 0.9322 True
1 876546 330420.0 0.93225 500826.0 0.9322 False
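The `reshape(-1, 4)` step only works when every row holds exactly two (id, score) tuples. If the list lengths vary, a sketch along these lines built on `Series.explode` avoids that assumption. It further assumes every list is non-empty and that every ID parses as an integer:

```python
import pandas as pd

data = [
    [[('330420', 0.9322496056556702), ('76546', 0.9322003126144409)], 76546],
    [[('330420', 0.9322496056556702), ('500826', 0.9322003126144409)], 876546],
]
test = pd.DataFrame(data, columns=['col1', 'col2'])

# One row per (id, score) tuple; the original row label is repeated in the index.
exploded = test['col1'].explode()

# Take the ID (first tuple element) and convert it to int to match col2.
ids = exploded.str[0].astype(int)

# Line each ID up with its own row's col2 value, then collapse back
# to one boolean per original row.
row_targets = test['col2'].reindex(ids.index).to_numpy()
found = ids.eq(row_targets).groupby(level=0).any()

test['col3'] = found.map({True: 'Yes', False: 'No'})
print(test[['col2', 'col3']])
```

Because the repeated index is kept through `explode`, the final `groupby(level=0)` maps each result back to its original row regardless of how many tuples that row had.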
Answer 2 (score: 0)
Use numpy where:
import numpy as np
test['col3'] = test.apply(lambda x: np.where(str(x.col2) in [i[0] for i in x.col1], "yes", "no"), axis=1)
test['col3']
0 yes
1 no
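`np.where` is meant for whole arrays. Inside `apply` it is called once per row on a single boolean and returns a 0-d array each time. A sketch that builds the condition for every row first and then calls `np.where` once (the `mask` name is illustrative):

```python
import numpy as np
import pandas as pd

data = [
    [[('330420', 0.9322496056556702), ('76546', 0.9322003126144409)], 76546],
    [[('330420', 0.9322496056556702), ('500826', 0.9322003126144409)], 876546],
]
test = pd.DataFrame(data, columns=['col1', 'col2'])

# One boolean per row, using the same string comparison as the answer above.
mask = np.array([
    str(value) in [t[0] for t in pairs]
    for pairs, value in zip(test['col1'], test['col2'])
])

# A single vectorised call over the whole mask.
test['col3'] = np.where(mask, 'yes', 'no')
print(test['col3'].tolist())  # ['yes', 'no']
```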
Answer 3 (score: 0)
You can use .apply():
def sublist_checker(row):
    check_both = ['Yes' if str(row['col2']) in sublist else 'No' for sublist in row['col1']]
    check_any = 'Yes' if 'Yes' in check_both else 'No'
    return check_any
test['col3'] = test.apply(sublist_checker, axis=1)
print(test)
col1 col2 col3
0 [(330420, 0.932249605656), (76546, 0.932200312614)] 76546 Yes
1 [(330420, 0.932249605656), (500826, 0.932200312614)] 876546 No
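The function sublist_checker checks, row by row, whether the value in test['col2'] appears in each sublist (tuple) of test['col1'], and returns Yes or No depending on whether that value is present in any of the sublists.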