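I have a table that contains a list of product names. I need to count each product. Some product names are written in different cases, for example the product "Juice" appears as Juice, juice, and so on. I need to group these products together and display the counts using BigQuery.
Juice - 100
juice - 14
Milk - 10
milk - 3
MIL - 1
The table above must end up looking like this:
Juice - 114
Milk - 14
Answer 0 (score: 1)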
If there are no misspelled words you need to account for, the solution is as simple as the following:
SELECT LOWER(word) AS word, SUM(cnt) AS cnt
FROM YourTable
GROUP BY 1
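But in your case you need to deal with similarity first. See below for an option to consider.
First, let's look at the high-level logic and steps.
Step 0 - assume your table (YourTable) looks like this: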
SELECT
word, cnt
FROM
(SELECT 'Juice' AS word, 100 AS cnt),
(SELECT 'juice' AS word, 14 AS cnt),
(SELECT 'Milk' AS word, 10 AS cnt),
(SELECT 'milk' AS word, 3 AS cnt),
(SELECT 'milkk' AS word, 1 AS cnt),
(SELECT 'mil' AS word, 1 AS cnt)
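Step 1 - calculate similarity
Let's consider only those pairs whose similarity is greater than 0.5 and less than 1. The expected result would then look like this: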
word replacement similarity
milkk milk 0.8
mil milk 0.6666666666666667
milkk mil 0.6
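To make the similarity numbers above easier to verify, here is a minimal Python sketch of the ratio that Query 1 computes: 1 minus the Levenshtein distance divided by the length of the word being replaced. The function names `levenshtein` and `similarity` are illustrative assumptions, and the sketch assumes the word is non-empty.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        prev = cur
    return prev[-1]


def similarity(word, replacement):
    """1 - distance / len(word); assumes word is not empty."""
    return 1 - levenshtein(word, replacement) / len(word)


# The three candidate pairs from the table above.
pairs = [("milkk", "milk"), ("mil", "milk"), ("milkk", "mil")]
for word, replacement in pairs:
    print(word, replacement, similarity(word, replacement))
```

Because the ratio is divided by the length of the first word only, it is not symmetric: swapping the two arguments can give a different value.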
Step 2 - find the winners
You would expect:
word replacement
milkk milk
mil milk
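The winner selection can be sketched in Python as follows. It assumes, as Query 2 does, that a replacement's weight is the number of candidate rows that point to it, and that each word keeps its heaviest replacement. The name `pick_winners` is a hypothetical helper; ties here go to the first candidate seen, whereas the SQL version leaves tie order unspecified.

```python
from collections import Counter


def pick_winners(candidates):
    """Map each word to the replacement that the most candidate rows point to.

    candidates: list of (word, replacement) tuples.
    """
    weight = Counter(replacement for _, replacement in candidates)
    winners = {}
    for word, replacement in candidates:
        best = winners.get(word)
        # Keep the replacement with the strictly larger weight.
        if best is None or weight[replacement] > weight[best]:
            winners[word] = replacement
    return winners


# Candidate pairs from Step 1: "milk" is proposed twice, "mil" once.
candidates = [("milkk", "milk"), ("mil", "milk"), ("milkk", "mil")]
print(pick_winners(candidates))
```

With these candidates, "milk" has weight 2 and "mil" has weight 1, so both misspellings resolve to "milk".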
Step 3 - final aggregation
word cnt
juice 114
milk 15
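As a quick check of the totals above, the final aggregation can be sketched in Python. It assumes the winners mapping from Step 2 is already available as a plain dictionary; `aggregate` is an illustrative name, and the rows are the sample data from Step 0.

```python
from collections import defaultdict


def aggregate(rows, winners):
    """Sum counts per word after lowercasing and applying replacements."""
    totals = defaultdict(int)
    for word, cnt in rows:
        key = word.lower()
        # Words without a winner keep their own lowercased spelling.
        totals[winners.get(key, key)] += cnt
    return dict(totals)


rows = [("Juice", 100), ("juice", 14), ("Milk", 10),
        ("milk", 3), ("milkk", 1), ("mil", 1)]
winners = {"milkk": "milk", "mil": "milk"}
print(aggregate(rows, winners))  # {'juice': 114, 'milk': 15}
```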
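Below is the respective code.
It can most likely be optimized, improved, and combined, but it is meant to give you an idea (along with working code).
Query 1 (Step 1) - replacement candidates
Let's write the output to a table -> Replacements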
SELECT text1 AS word, text2 AS replacement, similarity FROM
JS(
// input table
(
SELECT
word1 AS text1,
word2 AS text2
FROM (
SELECT
CASE WHEN a.cnt < b.cnt THEN a.word ELSE b.word END AS word1,
CASE WHEN a.cnt < b.cnt THEN b.word ELSE a.word END AS word2
FROM (
SELECT LOWER(word) AS word, SUM(cnt) AS cnt
FROM YourTable
GROUP BY 1
) AS a
CROSS JOIN (
SELECT LOWER(word) AS word, SUM(cnt) AS cnt
FROM YourTable
GROUP BY 1
) AS b
WHERE a.word <= b.word
)
) ,
// input columns
text1, text2,
// output schema
"[{name: 'text1', type:'string'},
{name: 'text2', type:'string'},
{name: 'similarity', type:'float'}]
",
// function
"function(r, emit) {
var _extend = function(dst) {
var sources = Array.prototype.slice.call(arguments, 1);
for (var i=0; i<sources.length; ++i) {
var src = sources[i];
for (var p in src) {
if (src.hasOwnProperty(p)) dst[p] = src[p];
}
}
return dst;
};
var Levenshtein = {
/**
* Calculate levenshtein distance of the two strings.
*
* @param str1 String the first string.
* @param str2 String the second string.
* @return Integer the levenshtein distance (0 and above).
*/
get: function(str1, str2) {
// base cases
if (str1 === str2) return 0;
if (str1.length === 0) return str2.length;
if (str2.length === 0) return str1.length;
// two rows
var prevRow = new Array(str2.length + 1),
curCol, nextCol, i, j, tmp;
// initialise previous row
for (i=0; i<prevRow.length; ++i) {
prevRow[i] = i;
}
// calculate current row distance from previous row
for (i=0; i<str1.length; ++i) {
nextCol = i + 1;
for (j=0; j<str2.length; ++j) {
curCol = nextCol;
// substitution
nextCol = prevRow[j] + ( (str1.charAt(i) === str2.charAt(j)) ? 0 : 1 );
// insertion
tmp = curCol + 1;
if (nextCol > tmp) {
nextCol = tmp;
}
// deletion
tmp = prevRow[j + 1] + 1;
if (nextCol > tmp) {
nextCol = tmp;
}
// copy current col value into previous (in preparation for next iteration)
prevRow[j] = curCol;
}
// copy last col value into previous (in preparation for next iteration)
prevRow[j] = nextCol;
}
return nextCol;
}
};
var the_text1, the_text2;
try {
the_text1 = decodeURI(r.text1).toLowerCase();
} catch (ex) {
the_text1 = r.text1.toLowerCase();
}
try {
the_text2 = decodeURI(r.text2).toLowerCase();
} catch (ex) {
the_text2 = r.text2.toLowerCase();
}
emit({text1: the_text1, text2: the_text2,
similarity: 1 - Levenshtein.get(the_text1, the_text2) / the_text1.length});
}"
)
WHERE similarity > 0.5 AND similarity < 1
ORDER BY similarity DESC
Query 2 (Step 2) - replacement winners
SELECT word, replacement FROM (
SELECT
a.word AS word, a.replacement AS replacement, b.replacement, b.weight,
ROW_NUMBER() OVER(PARTITION BY a.word ORDER BY b.weight DESC) AS win
FROM (
SELECT word, replacement
FROM Replacements
) a
JOIN (
SELECT replacement, COUNT(1) AS weight
FROM Replacements
GROUP BY replacement
) b
ON a.replacement = b.replacement
)
WHERE win = 1
Query 3 (Steps 2 and 3 combined) - replacement and final aggregation
SELECT
IFNULL(y.replacement, x.word) AS word,
SUM(cnt) AS cnt
FROM (
SELECT LOWER(word) AS word, SUM(cnt) AS cnt
FROM YourTable
GROUP BY 1
) x
LEFT JOIN (
SELECT word, replacement
FROM (
SELECT
a.word AS word, a.replacement AS replacement, b.replacement, b.weight,
ROW_NUMBER() OVER(PARTITION BY a.word ORDER BY b.weight DESC) AS win
FROM (
SELECT word, replacement
FROM Replacements
) a
JOIN (
SELECT replacement, COUNT(1) AS weight
FROM Replacements
GROUP BY replacement
) b
ON a.replacement = b.replacement
)
WHERE win = 1
) y
ON x.word = y.word
GROUP BY word
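Even though the above works, and you can run it against the example, I cannot guarantee that it will behave exactly as you expect on your real data. I hope it gives you a good direction to explore.
Answer 1 (score: 0)
Does this work for you?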
public FileResult Download(int id)
{
string contentType = "";
var arquivos = db.Anexos.ToList();
string nomeArquivo = (from arquivo in arquivos
where arquivo.AnexoId == id
select arquivo.Caminho).First();
string extensao = Path.GetExtension(nomeArquivo);
string nomeArquivoV = Path.GetFileNameWithoutExtension(nomeArquivo);
System.Diagnostics.Debug.WriteLine("~/Anexos/" + nomeArquivoV + extensao);
if (extensao.Equals(".zip"))
contentType = "application/zip";
return File(nomeArquivo, contentType,"~/Anexos/" + nomeArquivoV + extensao);
}