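I'm working on a project that studies recipes. The dataset has more than 3 million records, and I'm using dask to process it. I have tokenized the recipe names and want to exclude certain fields. Currently I apply the logic in a SQL script, shown below, which excludes certain combinations of recipe name and term. My question is how to apply the same logic to a simple dataframe: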
SELECT match_id from table where match_id not in
(SELECT match_id from table where
(term = 'gelato' and recipe_name like '%Zeroll%') or
(term = 'poachers' and recipe_name like '%Egg%') or
(term = 'poach' and recipe_name like '%Egg%') or
(term = 'waffles' and recipe_name like '%Fries%')
)

import pandas as pd
import dask.dataframe as dd

# Sample rows: match_id, term, recipe name
data = [['1', 'poach', 'Deviled Eggs'], ['2', 'steam', 'Sweet Dumplings'], ['3', 'chocolate', 'Hot Chocolate']]
# dtype=float is left out because every column holds strings
df = pd.DataFrame(data, columns=['match_id', 'term', 'recipename'])
ddf = dd.from_pandas(df, npartitions=1)
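
For reference, here is a rough sketch of how the SQL logic above might translate into dataframe operations. It is only an illustration under some assumptions: `EXCLUSIONS` and `exclude_matches` are made-up names, `LIKE '%...%'` is treated as a case-sensitive substring match, and the column is called `recipename` as in the sample dataframe rather than `recipe_name` as in the SQL. The example runs on plain pandas; the same calls (`str.contains`, `isin`, boolean indexing) also exist on dask dataframes, but I have not verified this on the full 3 million rows.

```python
import pandas as pd

# (term, substring of the recipe name) pairs to exclude, copied from the SQL.
EXCLUSIONS = [
    ("gelato", "Zeroll"),
    ("poachers", "Egg"),
    ("poach", "Egg"),
    ("waffles", "Fries"),
]


def exclude_matches(frame, exclusions=EXCLUSIONS):
    """Drop every row whose match_id appears in any excluded combination."""
    mask = None
    for term, fragment in exclusions:
        # Equivalent of: term = '<term>' AND recipe_name LIKE '%<fragment>%'
        cond = (frame["term"] == term) & frame["recipename"].str.contains(
            fragment, regex=False
        )
        mask = cond if mask is None else (mask | cond)

    # Inner SELECT: the match_ids that hit at least one excluded combination.
    bad_ids = frame[mask]["match_id"].drop_duplicates()
    if hasattr(bad_ids, "compute"):
        # Assumption: a lazy (dask) result is materialised before isin().
        bad_ids = bad_ids.compute()

    # Outer SELECT: keep rows whose match_id is NOT IN that set.
    return frame[~frame["match_id"].isin(list(bad_ids))]


data = [
    ["1", "poach", "Deviled Eggs"],
    ["2", "steam", "Sweet Dumplings"],
    ["3", "chocolate", "Hot Chocolate"],
]
df = pd.DataFrame(data, columns=["match_id", "term", "recipename"])

result = exclude_matches(df)
print(result)
```

With the sample data, the row with match_id '1' ('poach' with a recipe name containing 'Egg') is dropped and the other two rows remain. With a dask dataframe, the returned frame would presumably stay lazy until `.compute()` is called on it.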