Question

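I'm teaching myself PowerShell, so I don't know everything about it.

I need to search a database for exactly the lines I have put in (the database is predefined); it contains more than 11,800 entries.

Can you help me find what is making this slow?

Code: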

import UIKit
import Parse
import Bolts

class AddFriendViewController: UIViewController {

    // Username found by the most recent successful search
    var userToAdd = ""

    @IBOutlet var addFriendLabel: UIButton!
    @IBOutlet var searchUserTF: UITextField!

    override func viewDidLoad() {
        super.viewDidLoad()
        addFriendLabel.setTitle("", forState: .Normal)
        // Do any additional setup after loading the view.
    }

    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Dispose of any resources that can be recreated.
    }

    @IBAction func searchButtonPressed(sender: AnyObject) {
        // Only search when the text field contains something
        if let username = searchUserTF.text where username != "" {
            let query = PFUser.query()
            query?.whereKey("username", equalTo: username)
            query?.findObjectsInBackgroundWithBlock({ (objects: [PFObject]?, error: NSError?) -> Void in
                if error != nil {
                    print(error)
                } else if let objects = objects where objects.count > 0 {
                    self.addFriendLabel.setTitle("Share myEvents with \(username)", forState: .Normal)
                    self.userToAdd = username
                } else {
                    self.userToAdd = ""
                    let alertController = UIAlertController(title: "Username not found!", message: "A user with this username does not exist, please check the spelling or your internet connection", preferredStyle: .Alert)
                    let action = UIAlertAction(title: "OK", style: .Default, handler: nil)
                    alertController.addAction(action)
                    self.presentViewController(alertController, animated: true, completion: nil)
                }
            })
        }
    }
}

How can I speed up the line-by-line search/match? Do I do it with multithreading? If so, how?

Answer 1

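It seems you knew at least one other language before PowerShell, and you are starting out by basically replicating what you would have done in that language. That's a great way to learn a new language, but in the beginning you may end up with approaches that are a bit strange or that don't perform well.

First, I want to break down what your code is actually doing, as a rough overview:

1. Read every line of the file at once and store it in the $Dict variable.
2. Loop as many times as there are lines.
3. In each iteration of the loop:
   1. Get the single line that matches the loop iteration (essentially through another iteration rather than by indexing; more on that later).
   2. Get the first character of that line, then the second character, then combine them.
   3. If the result equals a predetermined string, append the line to a text file.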
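The steps above can be sketched roughly as follows. This is a hypothetical reconstruction from the overview only, not the asker's actual script: the sample data, `$target`, and `$found` are stand-ins, and an in-memory array takes the place of the output text file.

```powershell
# Hypothetical reconstruction of the pattern described above; names and data are stand-ins.
$Dict   = @('apple', 'banana', 'apricot', 'cherry')   # stands in for the lines read from the file
$target = 'ap'                                          # the predetermined string
$found  = @()                                           # stands in for the output text file

$i = 0
do {
    # Step 3-1: fetch line $i by streaming the whole array through the pipeline
    $line = $Dict | Select-Object -Index $i

    # Step 3-2: first character, then second character, then combine them
    $firstTwo = $line.Substring(0, 1) + $line.Substring(1, 1)

    # Step 3-3: keep the line if it matches
    if ($firstTwo -eq $target) { $found += $line }

    $i++
} until ($i -ge $Dict.Count)

$found
```

With only four lines this runs instantly; the cost of step 3-1 only becomes visible once the array holds thousands of entries.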

Step 3-1 is what's really slowing this down

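To understand why, you need to know a little about pipelines in PowerShell. Cmdlets that accept and work on pipeline input take one or more objects, but they process a single object at a time. They don't even have access to the rest of the pipeline.

This is also true of the Select-Object cmdlet. So when you take an array with 18,500 objects in it and pipe it into Select-Object -Index 18000, roughly 18,000 objects have to be sent through for inspection before it can give you the one you want. You can see how the time taken grows as the index gets larger.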
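The one-object-at-a-time behaviour can be observed with a small sketch. The stage labels `A` and `B` and the `$log` list are illustrative names, not part of the original answer.

```powershell
# Record the order in which two pipeline stages see each object.
$log = New-Object 'System.Collections.Generic.List[string]'

1..3 |
    ForEach-Object { $log.Add("A$_"); $_ } |   # stage A: log the object, then pass it on
    ForEach-Object { $log.Add("B$_") }         # stage B: log the object it receives

# Each object travels through both stages before the next one enters the pipeline.
$log -join ' '
```

The log interleaves the two stages (A1 B1 A2 B2 A3 B3) instead of finishing stage A first, which is why a downstream cmdlet cannot jump straight to a given position in the input.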

Since you already have an array, you can access any member directly by index using square brackets [], like this:

$Dict[18000]

For a given array, this takes the same amount of time no matter what the index is.
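As a small illustration with made-up data, both approaches return the same element and both count from zero; only the way they reach it differs.

```powershell
# Illustrative data; the names are not from the original script.
$a = 'zero', 'one', 'two', 'three'

$viaPipeline = $a | Select-Object -Index 2   # streams elements through the pipeline
$viaIndex    = $a[2]                         # goes straight to the element

"$viaPipeline / $viaIndex"
```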

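For a single call to Select-Object -Index you probably won't notice how long it takes, even with a very large index; the problem is that you are already looping through the entire array, so the cost compounds.

You are essentially doing the sum of 1..18000, which is approximately $\frac{18000^2}{2}$, or about 162,000,000 iterations! (Thanks to user2460798 for correcting my math.)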
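For reference, the exact figure follows from the arithmetic series formula, under the simplifying assumption that fetching the $k$-th line costs $k$ pipeline steps and that there are $n = 18000$ lines:

$$\sum_{k=1}^{n} k = \frac{n(n+1)}{2} = \frac{18000 \cdot 18001}{2} = 162{,}009{,}000 \approx \frac{18000^2}{2}$$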

Proof

I tested this. First, I created an array with 19,000 objects:

$a = 1..19000 | %{"zzzz~$_"}

Then I measured two ways of accessing it. First, with select -index:

measure-command { 1..19000 | % { $a | select -Index ($_-1 ) } | out-null }

Result:

TotalMinutes      : 20.4383861316667
TotalMilliseconds : 1226303.1679

Then with the indexing operator ([]):

measure-command { 1..19000 | % { $a[$_-1] } | out-null }

Result:

TotalMinutes      : 0.00788774666666667
TotalMilliseconds : 473.2648

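The results are striking: using Select-Object takes nearly 2,600 times longer.

Counting loops

The above is the only thing causing the major slowdown, but I wanted to point out a few other things.

In most languages you would use a for loop to count. In PowerShell, that looks like this: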

for ($i = 0; $i -lt $total ; $i++) {
    # $i has the value of the iteration
}

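In short, there are three statements in a for loop. The first is an expression that runs before the loop starts. $i = 0 initializes the iterator to 0, which is the typical use of this first statement.

Next is the condition; it is tested on each iteration, and the loop continues as long as it returns true. Here $i -lt $total checks that $i is less than the value of $total, some other variable defined elsewhere, presumably the maximum value.

The last statement is executed on each iteration of the loop. $i++ is the same as $i = $i + 1, so in this case we increment $i on every iteration.

It's more concise than a do/until loop, and it's easier to follow because the meaning of a for loop is well known.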
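Putting the for loop together with direct indexing, the prefix search could be sketched as below. This is a minimal sketch under assumptions rather than a drop-in replacement: the sample data, `$prefix`, and `$found` are hypothetical, and matches are collected in a list instead of being appended to a file.

```powershell
# Stand-in data; in the real script $Dict would hold the lines read from the file.
$Dict   = @('apple', 'apricot', 'banana', 'apex', 'cherry')
$prefix = 'ap'                                                   # hypothetical search string
$found  = New-Object 'System.Collections.Generic.List[string]'

for ($i = 0; $i -lt $Dict.Count; $i++) {
    $line = $Dict[$i]                          # direct index, no pipeline involved

    # Compare the first two characters, skipping lines that are too short
    if ($line.Length -ge 2 -and $line.Substring(0, 2) -eq $prefix) {
        $found.Add($line)
    }
}

$found
```

Note that -eq compares strings case-insensitively by default; -ceq would make the match case-sensitive.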

Other notes

If you're interested in more feedback on working code, have a look at Code Review. Please read the rules there carefully before posting.

Answer 2

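To my surprise, using the array's GetEnumerator is faster than indexing. It takes about 5/8 of the time that indexing does. However, this test is quite unrealistic, because the body of each loop is about as small as it can be.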

$size = 64kb

$array = New-Object int[] $size
# Initializing the array takes quite a bit of time compared to the loops below
0..($size-1) | % { $array[$_] = get-random}

write-host `n`nMeasure using indexing
[uint64]$sum = 0
Measure-Command {
  for ($ndx = 0; $ndx -lt $size; $ndx++) {
    $sum += $array[$ndx]
  }
}
write-host Average = ($sum / $size)

write-host `n`nMeasure using array enumerator
[uint64]$sum = 0
Measure-Command {
  foreach ($element in $array.GetEnumerator()) {
    $sum += $element
  }
}
write-host Average = ($sum / $size)



Measure using indexing


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 898
Ticks             : 8987213
TotalDays         : 1.04018668981481E-05
TotalHours        : 0.000249644805555556
TotalMinutes      : 0.0149786883333333
TotalSeconds      : 0.8987213
TotalMilliseconds : 898.7213

Average = 1070386366.9346


Measure using array enumerator
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 559
Ticks             : 5597112
TotalDays         : 6.47813888888889E-06
TotalHours        : 0.000155475333333333
TotalMinutes      : 0.00932852
TotalSeconds      : 0.5597112
TotalMilliseconds : 559.7112

Average = 1070386366.9346

The assembler code for the two loops might look like this:

;       Using Indexing
mov     esi, <addr of array>
xor     ebx, ebx
lea     edi, <addr of $sum>
loop:
mov     eax, dword ptr [esi][ebx*4]
add     dword ptr [edi], eax
inc     ebx
cmp     ebx, 65536
jl      loop

;       Using enumerator
mov     esi, <addr of array>
lea     edx, [esi + 65536*4]
lea     edi, <addr of $sum>
loop:
mov     eax, dword ptr [esi]
add     dword ptr [edi], eax
add     esi, 4
cmp     esi, edx
jl      loop

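The only real difference is in the first mov instruction of the loop: one uses an index register and the other does not. I doubt that can explain the observed difference in speed. I suppose the JITter must be adding extra overhead.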
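As a side note, the foreach statement enumerates an array on its own, so the explicit GetEnumerator() call in the test above is optional. The sketch below uses a small made-up array and makes no claim about relative timing.

```powershell
# Small illustrative array; the test above used 64kb random integers instead.
$array = 1..10

[uint64]$sum = 0
foreach ($element in $array) {   # no explicit GetEnumerator() call needed
    $sum += $element
}

$sum
```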

Need to make a PowerShell script faster
