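I am matching a large vector of slightly different restaurant names in a "data" vector against the corresponding "match" vector.
The stringdistmatrix function in the stringdist package works well, but it runs out of memory at about 10k x 10k, and my data is larger.
Trying as(stringdistmatrix(data, match), 'sparseMatrix') would give the desired result, but it also runs out of memory, because the dense distance matrix is built before it is converted.
Therefore I would like to call sparseMatrix(i, j, x, dims, dimnames) directly, with x computed from a string distance such as adist() and the index pairs built explicitly, in the hope that this fits in memory.
Example data in R: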
data <- c("McDonalds", "MacDonalds", "Mc Donald's", "Wendy's", "Wendys", "Wendy",
"Chipotle", "Chipotle's")
match <- c("McDonalds", "Wendys", "Chipotle")
Attempt:
library(Matrix)
library(stringdist)
idx <- expand.grid(a=data,b=match)
idx$row <- match(idx$a,idx$b)
idx$col <- match(idx$b,idx$a)
library(Matrix)
sparseMatrix(i=idx$row,
j=idx$col,
x=ifthen(adist(data,match)<2,1,0),
dims=c(7,3),
dimnames = list(data, match))
The output should match:
library(stringdist)
as(ifelse(stringdistmatrix(data,match)<2,1,0),'sparseMatrix')
Answer 0 (score: 1)
If I understand your question correctly, your task is to match dirty strings against clean strings. You do not need the whole distance matrix for that (and it would indeed not be sparse). Instead, you can use amatch from the stringdist package.
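A minimal sketch of the amatch approach, using the sample data from the question. The variable names dirty, clean, idx, hits and m are made up for illustration, and maxDist = 1 is an assumption chosen to mirror the "distance < 2" threshold in the question. Note that amatch returns only the closest table entry per input string, so this produces at most one nonzero per row, which is not identical to thresholding the full matrix when several clean strings fall within the threshold.

```r
library(stringdist)
library(Matrix)

# Sample data from the question (names "dirty" and "clean" are illustrative)
dirty <- c("McDonalds", "MacDonalds", "Mc Donald's", "Wendy's", "Wendys", "Wendy",
           "Chipotle", "Chipotle's")
clean <- c("McDonalds", "Wendys", "Chipotle")

# For each dirty string, the position of the closest clean string,
# or NA when no clean string is within maxDist (assumed threshold: 1)
idx <- amatch(dirty, clean, method = "osa", maxDist = 1)

# Keep only the rows that found a match and build the sparse indicator matrix
hits <- which(!is.na(idx))
m <- sparseMatrix(i = hits,
                  j = idx[hits],
                  x = 1,
                  dims = c(length(dirty), length(clean)),
                  dimnames = list(dirty, clean))
```

Memory use here grows with the number of dirty strings instead of with the product of both vector lengths, since the dense distance matrix is never materialized. If only the matched name is needed, clean[idx] gives it directly without building a matrix at all.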