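I have a console script (in a Yii2 application) that changes usernames in the DB (PostgreSQL) and writes changelog data to a CSV file. I use a for loop to process the users in batches of 100, stepping through them with OFFSET.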
public function actionTest()
{
    $query = User::find()->where(['username' => '']);
    $total = $query->count(); // SQL variant - SELECT COUNT(*) FROM user WHERE username = ''
    $data = [];
    $filePath = '/path/to/folder/log.csv';
    for ($offset = 0; $offset <= $total; $offset += 100) {
        /** @var User[] $users */
        $users = $query->orderBy(['id' => SORT_ASC])->limit(100)->offset($offset)->all(); // SQL variant - SELECT * FROM user WHERE username = '' ORDER BY id ASC OFFSET 0 LIMIT 100
        foreach ($users as $user) {
            User::updateAll(['username' => 'newUsername'], ['id' => $user->id]); // SQL variant - UPDATE user SET username = 'newUsername' WHERE id = 1
            $data[] = ['username' => 'newUsername']; // collect data to generate csv-file in the future
        }
        $csvObj = new CSV(); // "mnshankar/csv": "1.8"
        $csvObj->with($data, false, 'a+')->put($filePath, 'a+');
        $data = [];
    }
}
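The problem is that the script stops fetching data from the database halfway through the total number of users, so I get 0 items in the $users array.
For example, with $total = 15000 it stops working after the iteration with $offset = 7500; with $total = 7500 it stops after the iteration with $offset = 3800; with $total = 3800 it stops after the iteration with $offset = 1900, and so on.
I tried writing a simple test of this loop with the pg_* functions, and it works fine: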
public function actionPgTest()
{
    $dbConnection = pg_connect("host=localhost port=8080 dbname=user_db user=some_guy password=some_pass");
    // "user" is quoted because user is a reserved word in PostgreSQL
    $total = pg_query($dbConnection, 'SELECT COUNT(*) FROM "user" WHERE username = \'\'');
    $total = pg_fetch_array($total)['count'];
    for ($offset = 0; $offset <= $total; $offset += 100) {
        // Read-only loop: nothing is updated here, so the result set never shrinks
        $query = 'SELECT * FROM "user" WHERE username = \'\' ORDER BY id ASC LIMIT 100 OFFSET ' . $offset;
        $users = pg_query($dbConnection, $query);
        $users = pg_fetch_all($users);
        sleep(3);
    }
    pg_close($dbConnection);
}
I also tried doing this with a bash script, and it works fine too:
#!/bin/bash
count_query="select count(*) FROM \"user\" WHERE username = ''"
count=$(echo $count_query | psql -U user -Atq user_db)
query_base="select id FROM \"user\" WHERE username = '' LIMIT 100 OFFSET "
for offset in $(seq 0 100 $count); do
    echo $query_base$offset | psql -U user -Atq user_db
    sleep 3;
done;
I also tried running the script without generating the CSV file, and hit the same problem halfway through.
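Answer 0 (score: 0)
The loop keeps running but returns empty data once the offset reaches the number of records that still match the query.
Here is the description of OFFSET from the PostgreSQL docs:
OFFSET says to skip that many rows before beginning to return rows. OFFSET 0 is the same as omitting the OFFSET clause. If both OFFSET and LIMIT appear, then OFFSET rows are skipped before starting to count the LIMIT rows that are returned.
You can also read about it here: https://www.postgresql.org/docs/8.0/static/queries-limit.html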
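Answer 1 (score: 0)
Solved! The problem was in OFFSET and LIMIT (devprashant's comment helped a lot). For example, suppose we have a table of 6 items, each marked with a minus sign.
On the first pass we have OFFSET = 0 and LIMIT = 2, and we change the first 2 minuses (id = 1 and id = 2), so only the items with id = 3 to id = 6 still match the query.
The second pass uses OFFSET = 2 and LIMIT = 2, and we get the items with id = 5 and id = 6. The query now matches only the four remaining items, so the offset skips id = 3 and id = 4, the result starts at id = 5, and the limit takes 2 items. The items with id = 3 and id = 4 are never selected.
This is how we end up with an arithmetic progression: the offset grows by the batch size while the set of matching items shrinks by the same amount, so no items come back once the offset reaches the middle of the total.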
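The 6-item example can be replayed without a database. The following is only a sketch: the in-memory `$table` array and the `fetchBatch()` helper are hypothetical stand-ins for the `user` table and the `LIMIT`/`OFFSET` query, not part of Yii2 or of the original script.

```php
<?php
// Hypothetical in-memory stand-in for the user table: id => username.
// An empty string means "not renamed yet". Sizes mirror the 6-item example.
$table = array_fill_keys(range(1, 6), '');
$batchSize = 2;

// Rough equivalent of:
// SELECT id FROM "user" WHERE username = '' ORDER BY id LIMIT :limit OFFSET :offset
function fetchBatch(array $table, int $limit, int $offset): array
{
    $matchingIds = array_keys(array_filter($table, fn ($name) => $name === ''));
    sort($matchingIds);
    return array_slice($matchingIds, $offset, $limit);
}

$total = count($table); // counted once, before any update, as in the question
$fetchedPerPass = [];
for ($offset = 0; $offset < $total; $offset += $batchSize) {
    $ids = fetchBatch($table, $batchSize, $offset);
    $fetchedPerPass[$offset] = $ids;
    foreach ($ids as $id) {
        $table[$id] = 'newUsername'; // the row no longer matches username = ''
    }
}

// Ids that still have an empty username after the loop has finished.
$untouched = array_keys(array_filter($table, fn ($name) => $name === ''));
echo 'Never renamed: ' . implode(', ', $untouched) . PHP_EOL; // Never renamed: 3, 4
```

The passes fetch ids 1 and 2 at offset 0, ids 5 and 6 at offset 2, and nothing at offset 4. In general, with N matching rows and batches of b, after k passes the offset is kb while only N - kb rows still match, so the result is empty once kb >= N - kb, that is at an offset of about N/2. This agrees with the 15000 and 7500 figures in the question.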
Working solution:
public function actionTest()
{
    $query = User::find()->where(['username' => '']);
    // Load all matching ids up front, so later updates cannot shift the batches
    $idsQuery = clone $query;
    $userIds = $idsQuery->select(['id'])->limit(1000000)->asArray(true)->indexBy('id')->all();
    $userIds = array_keys($userIds);
    sort($userIds);
    $total = count($userIds);
    $data = [];
    $filePath = '/path/to/folder/log.csv';
    // Strict "<" avoids a final pass with an empty id slice
    for ($offset = 0; $offset < $total; $offset += 100) {
        $query = User::find()->where(['id' => array_slice($userIds, $offset, 100)]);
        $users = $query->all();
        foreach ($users as $user) {
            User::updateAll(['username' => 'newUsername'], ['id' => $user->id]); // SQL variant - UPDATE user SET username = 'newUsername' WHERE id = 1
            $data[] = ['username' => 'newUsername']; // collect data to generate csv-file in the future
        }
        $csvObj = new CSV(); // "mnshankar/csv": "1.8"
        $csvObj->with($data, false, 'a+')->put($filePath, 'a+');
        $data = [];
    }
}
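If loading every id into memory is a concern, one possible alternative (not what the solution above does) is to page by the last processed id instead of by OFFSET. The sketch below shows the idea on an in-memory array; `$table` and `fetchAfter()` are hypothetical names, and it assumes ids are positive integers. In Yii2 the same filter could presumably be written by adding `->andWhere(['>', 'id', $lastId])` to the query and keeping `orderBy(['id' => SORT_ASC])->limit(100)`.

```php
<?php
// Hypothetical in-memory stand-in for the user table: id => username.
$table = array_fill_keys(range(1, 6), '');
$batchSize = 2;

// Rough equivalent of:
// SELECT id FROM "user" WHERE username = '' AND id > :lastId ORDER BY id LIMIT :limit
function fetchAfter(array $table, int $lastId, int $limit): array
{
    $ids = array_keys(array_filter(
        $table,
        fn ($name, $id) => $name === '' && $id > $lastId,
        ARRAY_FILTER_USE_BOTH // callback receives (value, key)
    ));
    sort($ids);
    return array_slice($ids, 0, $limit);
}

$lastId = 0; // assumption: all ids are greater than 0
$renamed = [];
// An empty batch is falsy, which ends the loop.
while ($ids = fetchAfter($table, $lastId, $batchSize)) {
    foreach ($ids as $id) {
        $table[$id] = 'newUsername';
        $renamed[] = $id;
    }
    $lastId = end($ids); // the next batch starts after the highest id seen so far
}

$remaining = array_keys(array_filter($table, fn ($name) => $name === ''));
echo 'Renamed: ' . implode(', ', $renamed) . PHP_EOL; // Renamed: 1, 2, 3, 4, 5, 6
```

Because each batch is selected by `id > $lastId` rather than by position, rows that drop out of the `username = ''` filter no longer shift the window, and the loop still moves forward if an individual update happens to fail.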