I have a very simple problem and data structure, but the volume of data is so large that I need an efficient way to solve it.
Suppose I have objects, each of which has an interval given by its start and stop attributes. For example:
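      start  stop
obj1      5    10
obj2      8    12
obj3     11    14
obj4     13    20
obj5     22    25
obj6     24    30
obj7     33    37
obj8     36    40
I want to merge them so that overlapping intervals become a single object. For the example above, the result would therefore be three intervals: 5 to 20, 22 to 30, and 33 to 40.
I'm using Python for this. Note that I have many thousands of such records.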
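Answer 0 below works on a pandas DataFrame `df` indexed by the object names. A minimal sketch of building it from the example table, assuming the table is available as whitespace-separated text (the variable name `table_text` is hypothetical): because the header row has one field fewer than the data rows, `pandas.read_csv` uses the first column as the index. Answers 0 and 1 both assume the rows are sorted by `start`, so the sketch sorts explicitly.

```python
import io

import pandas as pd

# Hypothetical: the example table from the question as plain text.
table_text = """\
start stop
obj1 5 10
obj2 8 12
obj3 11 14
obj4 13 20
obj5 22 25
obj6 24 30
obj7 33 37
obj8 36 40
"""

# Split on runs of whitespace. The header has one field fewer than the
# data rows, so pandas turns the first column (obj1, obj2, ...) into the index.
df = pd.read_csv(io.StringIO(table_text), sep=r'\s+')

# The sweep-style merges assume rows ordered by start.
df = df.sort_values('start')
print(df)
```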
Answer 0 (score: 1)
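import pandas as pd

# Example data from the question; rows must be sorted by 'start'
df = pd.DataFrame({'start': [5, 8, 11, 13, 22, 24, 33, 36],
                   'stop':  [10, 12, 14, 20, 25, 30, 37, 40]},
                  index=['obj1', 'obj2', 'obj3', 'obj4', 'obj5', 'obj6', 'obj7', 'obj8'])

df['Startpoint'] = df['stop'].cummax().shift() < df['start']  # Begin of interval
df.iloc[0, df.columns.get_loc('Startpoint')] = True  # First line is startpoint
df['Endpoint'] = df['Startpoint'].shift(-1, fill_value=True)  # End of interval; last line is endpoint
df2 = df[df[['Startpoint', 'Endpoint']].any(axis=1)].copy()
df2['start'] = df2['start'].where(df2['Startpoint']).ffill()  # Carry each interval's start down to its end row
print(df2.loc[df2['Endpoint'], ['start', 'stop']])
#       start  stop
# obj4    5.0    20
# obj6   22.0    30
# obj8   33.0    40
Find every row where an interval starts or ends and keep only those rows, then carry each interval's start value down to the row where that interval ends, so that the start and stop of every merged interval sit on the same row. A row counts as a start point only if it begins after the largest stop seen so far (cummax), so an interval nested inside an earlier, longer one does not split the merge. This assumes the rows are sorted by start.
Everything is done with vectorized pandas operations, so it should be fast.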
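The start-point flags from this answer can also be finished with `groupby` instead of filtering rows: a cumulative sum of the flags numbers each merged interval, and each group is reduced to its minimum `start` and maximum `stop`. A minimal sketch, assuming the rows are sorted by `start` and the columns are named `start` and `stop`; the function name `merge_with_groupby` is hypothetical.

```python
import pandas as pd


def merge_with_groupby(df: pd.DataFrame) -> pd.DataFrame:
    """Merge overlapping [start, stop] rows; assumes df is sorted by 'start'."""
    # Largest stop among all earlier rows (NaN for the first row).
    prev_max_stop = df['stop'].cummax().shift()
    # A row opens a new interval unless an earlier row reaches its start.
    # NaN >= x is False, so the first row always counts as a start.
    is_start = ~(prev_max_stop >= df['start'])
    # Running count of starts gives every row the id of its merged interval.
    group_id = is_start.cumsum()
    return (df.groupby(group_id)
              .agg(start=('start', 'min'), stop=('stop', 'max'))
              .reset_index(drop=True))


df = pd.DataFrame({'start': [5, 8, 11, 13, 22, 24, 33, 36],
                   'stop':  [10, 12, 14, 20, 25, 30, 37, 40]},
                  index=[f'obj{i}' for i in range(1, 9)])
print(merge_with_groupby(df))
```

Unlike the row-filtering version, this keeps `start` as an integer column, but the result is renumbered from 0 and no longer carries the `obj` labels.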
Answer 1 (score: 0)
If the intervals are sorted by start time, this simple function should run in linear time:
def merge_intervals(intervals):
    """Merge overlapping (start, stop) tuples; intervals must be sorted by start."""
    if not intervals:
        return []
    result = []
    (start_candidate, stop_candidate) = intervals[0]
    for (start, stop) in intervals[1:]:
        if start <= stop_candidate:
            # Overlaps the current candidate: extend it
            stop_candidate = max(stop, stop_candidate)
        else:
            # Gap: the candidate is complete, start a new one
            result.append((start_candidate, stop_candidate))
            (start_candidate, stop_candidate) = (start, stop)
    result.append((start_candidate, stop_candidate))
    return result

intervals = [
    ( 5, 10),
    ( 8, 12),
    (11, 14),
    (13, 20),
    (22, 25),
    (24, 30),
    (33, 37),
    (36, 40),
]
assert merge_intervals(intervals) == [(5, 20), (22, 30), (33, 40)]
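`merge_intervals` relies on its input already being sorted by start. If the data arrives unsorted, one option is to sort first, which makes the whole job O(n log n) instead of linear. The sketch below (the name `merge_unsorted_intervals` is hypothetical) repeats the same sweep so it runs on its own, extending the last merged interval in place instead of tracking separate candidate variables.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]


def merge_unsorted_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort by start (O(n log n)), then sweep once, extending the current interval."""
    merged: List[Interval] = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the last merged interval: extend its stop if needed
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            # First interval or a gap: open a new merged interval
            merged.append((start, stop))
    return merged


shuffled = [(33, 37), (8, 12), (24, 30), (5, 10), (36, 40), (13, 20), (22, 25), (11, 14)]
print(merge_unsorted_intervals(shuffled))  # [(5, 20), (22, 30), (33, 40)]
```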
Answer 2 (score: 0)
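The fastest way to handle this kind of data is a union-find data structure (also called a disjoint-set data structure), which keeps track of a set of elements partitioned into disjoint subsets.
I'll leave the implementation and design of the data structure to you, since there are efficient ways to implement it: with union by rank and path compression, a disjoint-set structure runs in nearly linear time (amortized inverse-Ackermann time per operation).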
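The answer leaves the design to the reader, so the following is only one possible sketch, with hypothetical names `DisjointSet` and `merge_with_union_find`. It assumes two intervals belong together when they overlap (touching endpoints count, matching `start <= stop_candidate` in Answer 1). To avoid comparing every pair, it sorts by start and unions each interval with the earlier interval whose stop reaches furthest; the disjoint-set structure then records the groups, and each group's span is its minimum start and maximum stop.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        # Path halving: point nodes on the path at their grandparent
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # Attach the smaller tree under the larger root
        self.size[ra] += self.size[rb]


def merge_with_union_find(intervals: List[Interval]) -> List[Interval]:
    """Group overlapping intervals with union-find, then report each group's span."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    ds = DisjointSet(len(intervals))
    reach: Optional[int] = None  # Index of the interval with the largest stop so far
    for i in order:
        start, stop = intervals[i]
        if reach is not None and start <= intervals[reach][1]:
            ds.union(i, reach)  # Overlaps the interval that reaches furthest
        if reach is None or stop > intervals[reach][1]:
            reach = i
    spans: Dict[int, Interval] = {}
    for i, (start, stop) in enumerate(intervals):
        root = ds.find(i)
        lo, hi = spans.get(root, (start, stop))
        spans[root] = (min(lo, start), max(hi, stop))
    return sorted(spans.values())


intervals = [(5, 10), (8, 12), (11, 14), (13, 20), (22, 25), (24, 30), (33, 37), (36, 40)]
print(merge_with_union_find(intervals))  # [(5, 20), (22, 30), (33, 40)]
```

In this sketch the sort costs O(n log n) and dominates; each `find`/`union` call takes nearly constant amortized time.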