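I believe I have a complex problem. I have been trying hard to find a solution and, for the life of me, I cannot get one working.
I need to analyse thousands of bank transactions to find similarities in their descriptions.
To start, I have an array of transactions grouped by month. Here is a small sample: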
$sample_transactions = [
    'Oct_2017' => [
        ["desc" => "INTERNET TRANSFER CREDIT FROM 34345555 REF NO 21283322", "amount" => "1290"],
        ["desc" => "INTERNET TRANSFER CREDIT FROM 34345555 REF NO 8765876", "amount" => "1000"],
        ["desc" => "INTERNET TRANSFER CREDIT FROM 785674556 REF NO 46312212", "amount" => "2500"],
        ["desc" => "INTERNET TRANSFER CREDIT FROM 785674556 REF NO 977553", "amount" => "4000"],
    ],
    'Nov_2017' => [
        ["desc" => "PHONE TRANSFER CREDIT FROM 65765544 REF NO 123444", "amount" => "879"],
        ["desc" => "EFTPOS JKL REV JANES HAIR MELBOURNE VIC AU", "amount" => "200"],
        ["desc" => "INTERNET TRANSFER CREDIT FROM 785674556 REF NO 46312212", "amount" => "3200"],
        ["desc" => "INTERNET TRANSFER CREDIT FROM 785674556 REF NO 977553", "amount" => "6039"],
    ],
];
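Using the sample transactions above, I want to find similarities in desc, group the matching transactions together, add a count, and sum their amounts.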
Oct_2017
===========================================================
| Desc.                                  | Amount | Count |
===========================================================
| TRANSFER CREDIT FROM 34345555 REF NO   | 2290   | 2     |
-----------------------------------------------------------
| TRANSFER CREDIT FROM 785674556 REF NO  | 6500   | 2     |
===========================================================
Nov_2017
===========================================================
| Desc.                                  | Amount | Count |
===========================================================
| TRANSFER CREDIT FROM 785674556         | 9239   | 2     |
===========================================================
If you look at the two tables, the grouping does the following:
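Edit: These transactions are supplied to us by a third-party source and stored in our MySQL database, so the strings can be anything. That means we cannot search for a fixed set of strings, because we do not know what we are looking for. We need to surface the "patterns" in the transactions rather than match something known in advance.
Edit 2: More example strings might be: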
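One way to surface patterns without knowing them in advance is to count how often the same leading words occur across descriptions. The sketch below is only an illustration: countPrefixes() and the choice of three leading words are assumptions, not part of the original post, and the arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical helper: count how many descriptions share each leading run of $words words.
function countPrefixes(array $descriptions, int $words): array
{
    $counts = [];
    foreach ($descriptions as $desc) {
        $tokens = preg_split('/\s+/', trim($desc));
        if (count($tokens) < $words) {
            continue; // too short to have a prefix of this length
        }
        $prefix = implode(' ', array_slice($tokens, 0, $words));
        $counts[$prefix] = ($counts[$prefix] ?? 0) + 1;
    }
    arsort($counts); // most frequent prefixes first
    return $counts;
}

// Descriptions taken from the sample array above.
$descriptions = [
    "INTERNET TRANSFER CREDIT FROM 34345555 REF NO 21283322",
    "INTERNET TRANSFER CREDIT FROM 785674556 REF NO 977553",
    "PHONE TRANSFER CREDIT FROM 65765544 REF NO 123444",
    "EFTPOS JKL REV JANES HAIR MELBOURNE VIC AU",
];

// Keep only prefixes shared by at least two descriptions: these are candidate patterns.
$candidates = array_filter(countPrefixes($descriptions, 3), fn($n) => $n >= 2);
print_r($candidates);
```

Running this with different prefix lengths gives a rough list of candidate group keys to review by hand. It does not decide by itself which prefix length is the right one.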
RETURNED CREDIT FROM Mr Nobody 9392 JKK freight ACCOUNT CLOSED
RETURNED CREDIT FROM Mrs Somebody Melbourne Aus INVALID ACCOUNT NUMBER
VISA CREDIT HERTZ GOKKO JIMBO 14/08 AU AUD
EFTPOS DEP Medicare Benefit
DIRECT CREDIT CBA TRANSFER
BPAY REV 3535333 KLM RENEW 4823
AGENT DEPOSIT 87
ANZ ATM PORTLAND 26 NOTHING ST PORTLAND VIC
DIRECT CREDIT DONTY BENEFITS 23423322 EYWQ
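For the expected tables shown earlier, one possible starting point is to drop a trailing all-digit token (such as the reference number) and group on what remains. This is a minimal sketch under that assumption. stripTrailingNumber() and groupByKey() are hypothetical names, and the rule will not suit every string in the list above.

```php
<?php
// Hypothetical helper: drop a trailing all-digit token (e.g. a reference number)
// so descriptions that differ only in that number share one key.
function stripTrailingNumber(string $desc): string
{
    return preg_replace('/\s+\d+$/', '', trim($desc));
}

// Group one month's transactions by the derived key, summing amounts and counting rows.
function groupByKey(array $transactions, callable $keyOf): array
{
    $groups = [];
    foreach ($transactions as $t) {
        $key = $keyOf($t['desc']);
        if (!isset($groups[$key])) {
            $groups[$key] = ['Amount' => 0, 'Count' => 0];
        }
        $groups[$key]['Amount'] += (int) $t['amount']; // amounts arrive as strings
        $groups[$key]['Count']++;
    }
    return $groups;
}

// The Oct_2017 rows from the sample array.
$october = [
    ["desc" => "INTERNET TRANSFER CREDIT FROM 34345555 REF NO 21283322", "amount" => "1290"],
    ["desc" => "INTERNET TRANSFER CREDIT FROM 34345555 REF NO 8765876", "amount" => "1000"],
    ["desc" => "INTERNET TRANSFER CREDIT FROM 785674556 REF NO 46312212", "amount" => "2500"],
    ["desc" => "INTERNET TRANSFER CREDIT FROM 785674556 REF NO 977553", "amount" => "4000"],
];

$result = groupByKey($october, 'stripTrailingNumber');
print_r($result);
```

Passing the key function as a callable keeps the grouping loop unchanged while the definition of "similar" is refined.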
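1 - I tried this approach to compare the strings, but similar_text() is not specific enough: it creates groups that should not exist, because it relies only on a similarity percentage and does not group by the characters that matter.
2 - I tried querying the database with ORDER BY desc ASC, which obviously orders the rows nicely, but it includes everything and does not group them as in the expected result.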
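To illustrate the first point, here is a small sketch of how similar_text() scores two sample descriptions that come from different accounts. similarityPercent() is a hypothetical wrapper added for this example. The percentage is whatever PHP reports through the function's third, by-reference argument.

```php
<?php
// Hypothetical wrapper: similar_text() returns the percentage through its third argument.
function similarityPercent(string $a, string $b): float
{
    similar_text($a, $b, $percent);
    return $percent;
}

// Same wording, but different account and reference numbers.
$a = "INTERNET TRANSFER CREDIT FROM 34345555 REF NO 21283322";
$b = "INTERNET TRANSFER CREDIT FROM 785674556 REF NO 46312212";

printf("%.1f%%\n", similarityPercent($a, $b));
```

Because the shared wording dominates both strings, the score is high even though the account numbers differ. That is why a percentage threshold alone tends to merge rows that should stay apart.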
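Answer 0 (score: 0)
The bad news: PHP offers nothing out of the box for this.
The good news: you can roll your own and refine it as you learn more about what you want to treat as "similar". As your custom method matures, its accuracy should approach 100%. Use the "leftovers" to spot similarities that you can then build into the code.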
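One way to keep such a custom method easy to extend is an ordered list of rules, where each rule captures the part of the description to use as the group key. This is a sketch, not a definitive implementation. deriveKey(), the $rules array, and the EFTPOS rule are assumptions made for illustration.

```php
<?php
// Hypothetical rule list: each regex captures the group key in its first capture group.
// Rules are tried in order and the first match wins.
$rules = [
    '/^(.+?) FROM /',        // everything before the first " FROM "
    '/^(DIRECT CREDIT)\b/',  // descriptions starting with DIRECT CREDIT
    '/^(EFTPOS)\b/',         // assumed example of a rule added after reviewing leftovers
];

function deriveKey(string $desc, array $rules): string
{
    foreach ($rules as $pattern) {
        if (preg_match($pattern, $desc, $m)) {
            return $m[1];
        }
    }
    return $desc; // no rule matched: fall back to the full description
}

echo deriveKey("RETURNED CREDIT FROM Mr Nobody 9392 JKK freight ACCOUNT CLOSED", $rules), "\n";
echo deriveKey("EFTPOS DEP Medicare Benefit", $rules), "\n";
echo deriveKey("AGENT DEPOSIT 87", $rules), "\n";
```

Each time the leftovers reveal a new pattern, it becomes one more entry in $rules rather than another nested if.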
I hope this is enough to give you some traction:
Code: (Demo)
$sample_transactions=[
    'Oct_2017'=>[
        ["desc"=>"INTERNET TRANSFER CREDIT FROM 34345555 REF NO 21283322","amount"=>"1290"],
        ["desc"=>"INTERNET TRANSFER CREDIT FROM 34345555 REF NO 8765876","amount"=>"1000"],
        ["desc"=>"INTERNET TRANSFER CREDIT FROM 785674556 REF NO 46312212","amount"=>"2500"],
        ["desc"=>"INTERNET TRANSFER CREDIT FROM 785674556 REF NO 977553","amount"=>"4000"],
        ["desc"=>"PHONE TRANSFER CREDIT FROM 65765544 REF NO 123444","amount"=>"879"],
        ["desc"=>"EFTPOS JKL REV JANES HAIR MELBOURNE VIC AU","amount"=>"200"],
        ["desc"=>"INTERNET TRANSFER CREDIT FROM 785674556 REF NO 46312212","amount"=>"3200"],
        ["desc"=>"INTERNET TRANSFER CREDIT FROM 785674556 REF NO 977553","amount"=>"6039"],
        ["desc"=>"RETURNED CREDIT FROM Mr Nobody 9392 JKK freight ACCOUNT CLOSED","amount"=>"123"],
        ["desc"=>"RETURNED CREDIT FROM Mrs Somebody Melbourne Aus INVALID ACCOUNT NUMBER","amount"=>"124"],
        ["desc"=>"VISA CREDIT HERTZ GOKKO JIMBO 14/08 AU AUD","amount"=>"1234"],
        ["desc"=>"EFTPOS DEP Medicare Benefit","amount"=>"999"],
        ["desc"=>"DIRECT CREDIT CBA TRANSFER","amount"=>"1050"],
        ["desc"=>"BPAY REV 3535333 KLM RENEW 4823","amount"=>"1175"],
        ["desc"=>"AGENT DEPOSIT 87","amount"=>"100"],
        ["desc"=>"ANZ ATM PORTLAND 26 NOTHING ST PORTLAND VIC","amount"=>"200"],
        ["desc"=>"DIRECT CREDIT DONTY BENEFITS 23423322 EYWQ","amount"=>"300"]
    ]
];
// initialize the result arrays so the later var_export() calls never see an undefined variable
$groups=$similarities=$singletons=[];
foreach($sample_transactions as $mo_year=>$trans_array){
    foreach($trans_array as $trans){
        // use the text before ' FROM ' as the key; if there is no ' FROM ', try something else
        if(!$key=strstr($trans['desc'],' FROM ',true)){
            if(strpos($trans['desc'],'DIRECT CREDIT')===0){  // try 'DIRECT CREDIT' at start of string
                $key='DIRECT CREDIT';
            }else{
                $key=$trans['desc'];  // if all attempts fail, default to the full string value
            }
        }
        if(!isset($groups[$mo_year][$key])){
            $groups[$mo_year][$key]=['Amount'=>$trans['amount'],'Count'=>1];  // initialize the row
        }else{
            $groups[$mo_year][$key]=[
                'Amount'=>$groups[$mo_year][$key]['Amount']+$trans['amount'],  // do the sum
                'Count'=>++$groups[$mo_year][$key]['Count']  // increment by 1
            ];
        }
    }
}
// split results into two groups based on Count value
foreach($groups as $mo_year=>$rows){
    foreach($rows as $desc=>$sums){
        if($sums['Count']<2){
            $singletons[$mo_year][$desc]=$sums;  // seen only once: a leftover to review
        }else{
            $similarities[$mo_year][$desc]=$sums;  // seen two or more times: a consolidated group
        }
    }
}
echo "Attempted Consolidation:\n";
var_export($groups);
echo "\n\nSimilarities:\n";
var_export($similarities);
echo "\n\nThe leftovers to review and try to isolate relevant similarities\n";
var_export($singletons);
Output:
Attempted Consolidation:
array (
'Oct_2017' =>
array (
'INTERNET TRANSFER CREDIT' =>
array (
'Amount' => 18029,
'Count' => 6,
),
'PHONE TRANSFER CREDIT' =>
array (
'Amount' => '879',
'Count' => 1,
),
'EFTPOS JKL REV JANES HAIR MELBOURNE VIC AU' =>
array (
'Amount' => '200',
'Count' => 1,
),
'RETURNED CREDIT' =>
array (
'Amount' => 247,
'Count' => 2,
),
'VISA CREDIT HERTZ GOKKO JIMBO 14/08 AU AUD' =>
array (
'Amount' => '1234',
'Count' => 1,
),
'EFTPOS DEP Medicare Benefit' =>
array (
'Amount' => '999',
'Count' => 1,
),
'DIRECT CREDIT' =>
array (
'Amount' => 1350,
'Count' => 2,
),
'BPAY REV 3535333 KLM RENEW 4823' =>
array (
'Amount' => '1175',
'Count' => 1,
),
'AGENT DEPOSIT 87' =>
array (
'Amount' => '100',
'Count' => 1,
),
'ANZ ATM PORTLAND 26 NOTHING ST PORTLAND VIC' =>
array (
'Amount' => '200',
'Count' => 1,
),
),
)
Similarities:
array (
'Oct_2017' =>
array (
'INTERNET TRANSFER CREDIT' =>
array (
'Amount' => 18029,
'Count' => 6,
),
'RETURNED CREDIT' =>
array (
'Amount' => 247,
'Count' => 2,
),
'DIRECT CREDIT' =>
array (
'Amount' => 1350,
'Count' => 2,
),
),
)
The leftovers to review and try to isolate relevant similarities
array (
'Oct_2017' =>
array (
'PHONE TRANSFER CREDIT' =>
array (
'Amount' => '879',
'Count' => 1,
),
'EFTPOS JKL REV JANES HAIR MELBOURNE VIC AU' =>
array (
'Amount' => '200',
'Count' => 1,
),
'VISA CREDIT HERTZ GOKKO JIMBO 14/08 AU AUD' =>
array (
'Amount' => '1234',
'Count' => 1,
),
'EFTPOS DEP Medicare Benefit' =>
array (
'Amount' => '999',
'Count' => 1,
),
'BPAY REV 3535333 KLM RENEW 4823' =>
array (
'Amount' => '1175',
'Count' => 1,
),
'AGENT DEPOSIT 87' =>
array (
'Amount' => '100',
'Count' => 1,
),
'ANZ ATM PORTLAND 26 NOTHING ST PORTLAND VIC' =>
array (
'Amount' => '200',
'Count' => 1,
),
),
)
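To present the consolidated groups in a tabular layout like the one requested in the question, a small renderer could be added. The following is a sketch only. renderTable() and its column widths are hypothetical, and the input is the Similarities array from the output above.

```php
<?php
// Hypothetical renderer: returns one month's grouped rows as a fixed-width text table.
function renderTable(string $month, array $rows, int $descWidth = 45): string
{
    // Row width: "| " + desc + " | " + 8 + " | " + 5 + " |" = $descWidth + 23 characters.
    $rule = str_repeat('=', $descWidth + 23) . "\n";
    $out  = $month . "\n" . $rule;
    $out .= sprintf("| %-{$descWidth}s | %8s | %5s |\n", 'Desc.', 'Amount', 'Count');
    $out .= $rule;
    foreach ($rows as $desc => $sums) {
        $out .= sprintf("| %-{$descWidth}s | %8d | %5d |\n", $desc, $sums['Amount'], $sums['Count']);
    }
    return $out . $rule;
}

// The Similarities array as printed in the output above.
$similarities = [
    'Oct_2017' => [
        'INTERNET TRANSFER CREDIT' => ['Amount' => 18029, 'Count' => 6],
        'RETURNED CREDIT'          => ['Amount' => 247,   'Count' => 2],
        'DIRECT CREDIT'            => ['Amount' => 1350,  'Count' => 2],
    ],
];

foreach ($similarities as $month => $rows) {
    echo renderTable($month, $rows);
}
```

The %d conversions also cope with amounts that are still numeric strings, as they are for the single-row groups in the output above.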