Is there a built-in way in Oracle 11 to check the correlation of values in a varchar2 field? For example, given a simple table like this:
MEAL_NUM INGREDIENT
--------------------
1 BEEF
1 CHEESE
1 PASTA
2 CHEESE
2 PASTA
2 FISH
3 CHEESE
3 CHICKEN
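I'd like to get a numeric representation, based on MEAL_NUM, showing that CHEESE is paired mostly with PASTA and to a lesser degree with BEEF, CHICKEN, and FISH.
My first inclination is to use the CORR function and convert the strings to numbers, either by enumerating them beforehand or by taking the rownum from a unique select.
Any suggestions on how to approach this?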
Answer 0 (score: 3)
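You don't want to use CORR. If you create a "food number" and assign Beef = 1, Chicken = 2, and Pasta = 3, the correlation coefficient will tell you whether more cheese is associated with a higher "food number". But a higher or lower "food number" doesn't mean anything, because you made it up. So don't use CORR unless your foods really are ordered in some way, the way numbers are.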
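The way statisticians talk about this is levels of measurement. In the language of the linked article, MEAL_NUM is a nominal measure, or possibly an ordinal measure if the meals happened in order. Either way, using a correlation coefficient on it is a really bad idea.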
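To make the point concrete, here is a minimal Python sketch (not Oracle code) that correlates MEAL_NUM with two made-up "food number" schemes over the rows from the question. The names `pearson`, `CODES_A`, `CODES_B`, and `corr_with_meal_num` are hypothetical helpers introduced only for this illustration; the second scheme simply numbers the same foods in reverse.

```python
from math import sqrt

# (MEAL_NUM, INGREDIENT) rows copied from the table in the question.
ROWS = [
    (1, "BEEF"), (1, "CHEESE"), (1, "PASTA"),
    (2, "CHEESE"), (2, "PASTA"), (2, "FISH"),
    (3, "CHEESE"), (3, "CHICKEN"),
]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Two arbitrary "food number" schemes (assumptions for this demo only).
CODES_A = {"BEEF": 5, "CHEESE": 4, "FISH": 3, "PASTA": 2, "CHICKEN": 1}
CODES_B = {food: 6 - code for food, code in CODES_A.items()}  # same foods, reversed


def corr_with_meal_num(codes):
    """Correlate MEAL_NUM with the numeric code assigned to each ingredient."""
    meal_nums = [meal for meal, _ in ROWS]
    food_nums = [codes[food] for _, food in ROWS]
    return pearson(meal_nums, food_nums)


r_a = corr_with_meal_num(CODES_A)
r_b = corr_with_meal_num(CODES_B)
print(f"scheme A: {r_a:+.3f}")
print(f"scheme B: {r_b:+.3f}")
```

The two coefficients have the same magnitude and opposite signs, even though the underlying meals are identical. The result reflects the numbering that was chosen, not anything about the food.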
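You probably want to find something like "What proportion of beef meals also have cheese?" For each ingredient, the following returns the number of meals that contain it and the number of meals that contain both it and cheese. The trick is that COUNT only counts non-null values.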
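-- meals holds the question's (MEAL_NUM, INGREDIENT) rows
SELECT Other.Ingredient,
       COUNT(*) AS TotalMeals,
       -- Cheese.Ingredient is NULL when the meal has no cheese, so COUNT skips it
       COUNT(Cheese.Ingredient) AS CheesyMeals
FROM meals Other
LEFT JOIN meals Cheese
  ON (Cheese.Ingredient = 'CHEESE'
      AND Cheese.Meal_Num = Other.Meal_Num)
GROUP BY Other.Ingredient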
Warning: this returns wrong results if an ingredient appears twice in any one meal.
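The LEFT JOIN plus COUNT trick can be tried without an Oracle instance. The sketch below runs the same query shape against an in-memory SQLite database from Python, so it is a stand-in rather than Oracle itself. The function name `meals_shared_with` and the alias `partner` are assumptions made for this example. Because every meal in the sample data contains CHEESE, the function takes the partner ingredient as a parameter, so that a non-matching case (CHICKEN with PASTA) is visible too.

```python
import sqlite3

# In-memory stand-in for the question's table (SQLite, not Oracle).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meals (meal_num INTEGER, ingredient TEXT)")
conn.executemany(
    "INSERT INTO meals VALUES (?, ?)",
    [(1, "BEEF"), (1, "CHEESE"), (1, "PASTA"),
     (2, "CHEESE"), (2, "PASTA"), (2, "FISH"),
     (3, "CHEESE"), (3, "CHICKEN")],
)


def meals_shared_with(partner_ingredient):
    """Return (ingredient, total meals, meals also containing partner_ingredient)."""
    return conn.execute(
        """
        SELECT other.ingredient,
               COUNT(*)                  AS total_meals,
               COUNT(partner.ingredient) AS shared_meals  -- NULLs are not counted
        FROM meals other
        LEFT JOIN meals partner
               ON partner.ingredient = ?
              AND partner.meal_num = other.meal_num
        GROUP BY other.ingredient
        ORDER BY other.ingredient
        """,
        (partner_ingredient,),
    ).fetchall()


for row in meals_shared_with("PASTA"):
    print(row)
```

For a meal without the partner ingredient, the LEFT JOIN still yields one row, but its `partner.ingredient` is NULL. That row is counted by `COUNT(*)` and skipped by `COUNT(partner.ingredient)`.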
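Edit: It turns out you aren't interested in cheese specifically; you really want all of the "correlated" pairs. So we can abstract out "cheese" and just call them the first and second ingredients. I've added a "PossibleScore" to this one, which tries to behave like the percentage of meals but won't give a strong score when there are only a few instances of the ingredient.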
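-- Totals counts the meals containing each ingredient on its own,
-- so MealsWithFirst does not depend on the second ingredient.
SELECT First.Ingredient AS FirstIngredient,
       Second.Ingredient AS SecondIngredient,
       Totals.MealsWithFirst,
       COUNT(*) AS MealsWithBoth,
       COUNT(*) / (Totals.MealsWithFirst + 3) AS PossibleScore
FROM meals First
JOIN meals Second
  ON (Second.Meal_Num = First.Meal_Num
      AND Second.Ingredient <> First.Ingredient)
JOIN (SELECT Ingredient, COUNT(*) AS MealsWithFirst
      FROM meals
      GROUP BY Ingredient) Totals
  ON (Totals.Ingredient = First.Ingredient)
GROUP BY First.Ingredient, Second.Ingredient, Totals.MealsWithFirst
ORDER BY PossibleScore DESC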
Ordered by score, this should return
PASTA CHEESE 2 2 0.400
CHEESE PASTA 3 2 0.333
BEEF CHEESE 1 1 0.250
BEEF PASTA 1 1 0.250
FISH CHEESE 1 1 0.250
FISH PASTA 1 1 0.250
CHICKEN CHEESE 1 1 0.250
PASTA BEEF 2 1 0.200
PASTA FISH 2 1 0.200
CHEESE BEEF 3 1 0.167
CHEESE FISH 3 1 0.167
CHEESE CHICKEN 3 1 0.167
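The scores in the listing follow from MealsWithBoth / (MealsWithFirst + 3). As a cross-check, the following Python sketch recomputes them from the question's data. The names `MEALS`, `PADDING`, `meals_with`, and `meals_with_both` are illustrative assumptions, not part of the answer's SQL.

```python
from collections import Counter
from itertools import permutations

# The question's table, grouped by MEAL_NUM. Using sets means an ingredient
# listed twice in one meal would only be counted once.
MEALS = {
    1: {"BEEF", "CHEESE", "PASTA"},
    2: {"CHEESE", "PASTA", "FISH"},
    3: {"CHEESE", "CHICKEN"},
}

PADDING = 3  # the "+ 3" in the denominator; damps scores for rare ingredients

meals_with = Counter()       # ingredient -> number of meals containing it
meals_with_both = Counter()  # (first, second) -> number of meals containing both
for ingredients in MEALS.values():
    meals_with.update(ingredients)
    meals_with_both.update(permutations(sorted(ingredients), 2))

scores = {
    (first, second): both / (meals_with[first] + PADDING)
    for (first, second), both in meals_with_both.items()
}

# Highest score first, mirroring ORDER BY PossibleScore DESC.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for (first, second), score in ranked:
    print(first, second, meals_with[first], meals_with_both[first, second], f"{score:.3f}")
```

Without the padding, BEEF with CHEESE would score 1/1 = 1.0 on the strength of a single meal. With it, the score is 1/4 = 0.25, below PASTA with CHEESE at 2/5 = 0.4.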
Answer 1 (score: 2)
Do a self-join to get all ingredient combinations, then CORR by the two meal_nums:
SELECT t1.INGREDIENT, t2.INGREDIENT, CORR(t1.MEAL_NUM, t2.MEAL_NUM)
FROM TheTable t1, TheTable t2
WHERE t1.INGREDIENT < t2.INGREDIENT
GROUP BY t1.INGREDIENT, t2.INGREDIENT
This should give you something like:
BEEF CHEESE 0.999
BEEF PASTA 0.998
CHEESE PASTA 0.977
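Update: Chris pointed out that this won't work. I was hoping there might be some way to fudge a mapping from the ordinal meal_num to an interval value (@Chris, thanks for the link). That may not be possible, in which case this answer doesn't help.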
Answer 2 (score: 1)
--Create sample data
create table meals(meal_num number, ingredient varchar2(10));
insert into meals
select 1, 'BEEF' from dual union all
select 1, 'CHEESE' from dual union all
select 1, 'PASTA' from dual union all
select 2, 'CHEESE' from dual union all
select 2, 'PASTA' from dual union all
select 2, 'FISH' from dual union all
select 3, 'CHEESE' from dual union all
select 3, 'CHICKEN' from dual;
commit;
--Create nested table type to hold results
CREATE OR REPLACE TYPE fi_varchar_nt AS TABLE OF VARCHAR2(10);
/
--Find the items most frequently combined with CHEESE.
select bt.setid, nt.column_value, support occurances_of_itemset
,length, total_tranx
from
(
select
cast(itemset as fi_varchar_nt) itemset, rownum setid
,support, length, total_tranx
from table(dbms_frequent_itemset.fi_transactional(
tranx_cursor => cursor(select meal_num, ingredient from meals),
support_threshold => 0,
itemset_length_min => 2,
itemset_length_max => 2,
including_items => cursor(select 'CHEESE' from dual),
excluding_items => null))
) bt,
table(bt.itemset) nt
where column_value <> 'CHEESE'
order by 3 desc;
SETID COLUMN_VAL OCCURANCES_OF_ITEMSET LENGTH TOTAL_TRANX
---------- ---------- --------------------- ---------- -----------
4 PASTA 2 2 3
3 FISH 1 2 3
1 BEEF 1 2 3
2 CHICKEN 1 2 3
Answer 3 (score: 0)
How about a query like this?
-- meals holds the question's (MEAL_NUM, INGREDIENT) rows
select t1.INGREDIENT, count(*) as shared_meals
from meals t1,
     (select meal_num
      from meals
      where INGREDIENT = 'CHEESE') t2
where t1.INGREDIENT <> 'CHEESE'
  and t1.meal_num = t2.meal_num
group by t1.INGREDIENT;
The result should be, for each ingredient, the number of meals it shares with CHEESE.