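I am trying to use parallel processing to separate data according to its content.
In the example below I generate random numbers and, if a number meets a condition, I want to store it in a DataTable.
To my disappointment, the sequential version is faster than the parallel one.
Is it possible to make the parallel version run faster?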
Imports System.Random
Imports System.Threading
Imports System.Threading.Tasks

Public Class Form1
    Public No As Integer = 5
    Public DT(No) As DataTable
    Public S(No) As String
    Public StartTimer As DateTime

    Private Sub ParrallelProc_Btn_Click(sender As Object, e As EventArgs) Handles ParrallelProc_Btn.Click
        For j = 1 To No
            DT(j).Rows.Clear()
        Next
        StartTimer = Now
        For k = 1 To 10000
            Parallel.For(1, No + 1, Sub(i)
                                        Dim CurrentNo As String = CStr(Math.Round(Rnd() * 1000000, 0))
                                        If CurrentNo.Contains(S(i)) Then DT(i).Rows.Add(CurrentNo)
                                    End Sub)
        Next
        Dim Interval = Now.Subtract(StartTimer).TotalSeconds
    End Sub

    Private Sub SequentialProc_Btn_Click(sender As Object, e As EventArgs) Handles SequentialProc_Btn.Click
        For j = 1 To No
            DT(j).Rows.Clear()
        Next
        StartTimer = Now
        For k = 1 To 10000
            For l = 1 To No
                Dim CurrentNo As String = CStr(Math.Round(Rnd() * 1000000, 0))
                If CurrentNo.Contains(S(l)) Then DT(l).Rows.Add(CurrentNo)
            Next
        Next
        Dim Interval = Now.Subtract(StartTimer).TotalSeconds
    End Sub
End Class
Answer 0 (score: 0)
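First, not to brag, but on my PC the parallel version runs in 160 ms and the sequential version in 40 ms.
Creating threads has some overhead, and with only 5 threads it is not worth paying, because you are only doing 5 things at a time. That is especially true for work as lightweight as yours. Parallelization is meant for running several long-running tasks at the same time.
Eventually, once you get past the threading overhead, the parallel loop does become faster. I tested this by increasing No, and the crossover happened at around 100.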
Public No As Integer = 100
Public DT(No) As DataTable
Public S(No) As String
Public StartTimer As DateTime
Private iterations As Integer = 10000

' Create one single-column table and one search pattern per index (1..No).
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
    For i = 1 To No
        DT(i) = New DataTable()
        DT(i).Columns.Add()
        S(i) = (i + 1).ToString()
    Next
End Sub

Private Sub ParallelProc_Btn_Click(sender As Object, e As EventArgs) Handles ParallelProc_Btn.Click
    clearDT()
    Dim sw As New Stopwatch()
    sw.Start()
    For k = 1 To iterations
        ' Upper bound is exclusive, so this covers i = 1..No.
        Parallel.For(
            1,
            No + 1,
            AddressOf process)
    Next
    sw.Stop()
    MessageBox.Show(sw.ElapsedMilliseconds.ToString())
End Sub

Private Sub SequentialProc_Btn_Click(sender As Object, e As EventArgs) Handles SequentialProc_Btn.Click
    clearDT()
    Dim sw As New Stopwatch()
    sw.Start()
    For k = 1 To iterations
        For i = 1 To No
            process(i)
        Next
    Next
    sw.Stop()
    MessageBox.Show(sw.ElapsedMilliseconds.ToString())
End Sub

Private Sub clearDT()
    For j = 1 To No
        DT(j).Rows.Clear()
    Next
End Sub

' Shared body for both loops: generate a number and keep it if it contains S(i).
Private Sub process(i As Integer)
    Randomize() ' seed the Rnd() generator from the system timer
    Dim CurrentNo As String = CStr(Math.Round(Rnd() * 1000000, 0))
    If CurrentNo.Contains(S(i)) Then DT(i).Rows.Add(CurrentNo)
End Sub
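I also moved the operation into a Sub that both routines can call. Reusing the code not only saves time and space, it also ensures that you are comparing only the looping method (parallel versus sequential), not two slightly different routines.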
You should also call Randomize() before using Rnd(). See https://msdn.microsoft.com/en-us/library/y66ey2hh(v=vs.110).aspx
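A better test would be to put something substantial in the process() method, such as Thread.Sleep(1), and then play with No and iterations. You will find that sleeping in parallel is much better than sleeping sequentially.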
Answer 1 (score: 0)
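Put the smaller loop inside the larger one, so that the large loop (the 10,000 iterations) is the one that runs in parallel. That should make the parallel version much faster than the sequential one.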
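One way to read this suggestion is sketched below: Parallel.For drives the 10,000-iteration loop and the loop over No runs sequentially inside each iteration. This is an illustrative sketch under several assumptions that are not in the original posts. Once the outer loop is parallel, several threads can write to the same bucket at once, and DataTable is not safe for concurrent writes, so the sketch swaps in one ConcurrentBag(Of String) per index. It also uses a separate System.Random per worker thread instead of Rnd(). The module and function names are hypothetical.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Threading.Tasks

Module OuterLoopParallel
    Public Const No As Integer = 5
    Public Const Iterations As Integer = 10000

    ' Returns one bucket per index 1..No (index 0 is unused, as in the original arrays).
    Public Function Run() As ConcurrentBag(Of String)()
        Dim buckets(No) As ConcurrentBag(Of String)
        Dim patterns(No) As String
        For i As Integer = 1 To No
            buckets(i) = New ConcurrentBag(Of String)()   ' assumption: replaces DT(i)
            patterns(i) = (i + 1).ToString()
        Next

        ' Parallelize the large loop; each iteration runs the small loop sequentially.
        ' The thread-local overload gives every worker thread its own Random instance.
        Parallel.For(Of Random)(
            1, Iterations + 1,
            Function() New Random(Guid.NewGuid().GetHashCode()),
            Function(k As Integer, state As ParallelLoopState, rng As Random) As Random
                For j As Integer = 1 To No
                    ' Next's upper bound is exclusive, so values range over 0..1000000.
                    Dim currentNo As String = rng.Next(0, 1000001).ToString()
                    If currentNo.Contains(patterns(j)) Then buckets(j).Add(currentNo)
                Next
                Return rng
            End Function,
            Sub(rng As Random)
                ' Nothing to merge or clean up per thread.
            End Sub)

        Return buckets
    End Function

    Sub Main()
        Dim result = Run()
        For i As Integer = 1 To No
            Console.WriteLine("Bucket {0}: {1} matches", i, result(i).Count)
        Next
    End Sub
End Module
```

Whether this actually beats the sequential loop on a given machine still needs to be measured with a Stopwatch, since the work per iteration remains small.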