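I have a list of strings, and I need to count the entries that contain a specific string (and only within a subset of the list, not the whole list).
The code below works fine, but its performance is... unfortunately not at an acceptable level, because I need to parse lists of 500k to 900k entries. For those entries I need to run the code below about 10k times (because I need to analyze 10k sections of the list). That takes 177 seconds or more. So my question is: how can I do this... fast?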
private int ExtraktNumbers(List<string> myList, int start, int end)
{
    return myList.Where((x, index) => index >= start && index <= end
                                      && x.Contains("MYNUMBER:")).Count();
}
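Answer 0 (score: 3)
Now that we know you are calling the method 10,000 times, here is my suggestion. I assume "MYNUMBER:" is hard-coded, which means the only thing that changes between calls is the range? If so...
First, run an 'index' method that builds a list of the indexes that match. Then you can easily count the matches for whatever range you need.
Note: this is a quick sketch, and you could probably optimize it even further: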
List<int> matchIndex = new List<int>();
void RunIndex(List<string> myList)
{
    for(int i = 0; i < myList.Count; i++)
    {
        if(myList[i].Contains("MYNUMBER:"))
        {
            matchIndex.Add(i);
        }
    }
}
int CountForRange(int start, int end)
{
    return matchIndex.Count(x => x >= start && x <= end);
}
You can then use it like this, for example:
RunIndex(myList);
// I don't know what code you have here, this is just a basic example.
for(int i = 0; i < 10000; i++)
{
    int count = CountForRange(startOfRange, endOfRange);
    // Do something with count.
}
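One possible further optimization, offered only as a sketch: because RunIndex adds indexes in ascending order, matchIndex is already sorted, so the count for a range can be found with two binary searches instead of a linear scan. The class name MatchIndexCounter and the helper LowerBound are hypothetical names introduced for this illustration; the range is treated as inclusive on both ends, as in CountForRange above.

```csharp
using System;
using System.Collections.Generic;

public static class MatchIndexCounter
{
    // Hypothetical helper: position of the first element >= value
    // in a list sorted in ascending order (Count if there is none).
    private static int LowerBound(List<int> sorted, int value)
    {
        int lo = 0, hi = sorted.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] < value) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Counts the stored indexes that fall in [start, end], both inclusive.
    // Assumes matchIndex is sorted ascending, which holds when it is
    // filled by a single forward pass over the list.
    public static int CountForRange(List<int> matchIndex, int start, int end)
    {
        if (end < start) return 0;
        int first = LowerBound(matchIndex, start);
        // Guard against overflow of end + 1.
        int afterLast = end == int.MaxValue
            ? matchIndex.Count
            : LowerBound(matchIndex, end + 1);
        return afterLast - first;
    }
}
```

Each call then costs O(log n) in the number of matches rather than O(n), which may matter when the method is called around 10,000 times.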
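Also, if the ranges you check contain a lot of repeats, you could consider caching the range counts in a dictionary, but at this stage it is hard to tell whether that would be worth doing.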
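If repeated ranges do turn out to be common, the caching idea could look roughly like the sketch below. RangeCountCache and its members are hypothetical names, not part of the answer's code; the counting function is passed in as a delegate so that any CountForRange implementation can be wrapped. The value-tuple key assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;

public class RangeCountCache
{
    // The underlying counting function, e.g. a CountForRange method.
    private readonly Func<int, int, int> countForRange;

    // Previously computed counts, keyed by (start, end).
    private readonly Dictionary<(int Start, int End), int> cache =
        new Dictionary<(int Start, int End), int>();

    public RangeCountCache(Func<int, int, int> countForRange)
    {
        this.countForRange = countForRange;
    }

    // Returns the cached count for the range, computing it on first use.
    public int Get(int start, int end)
    {
        var key = (start, end);
        if (!cache.TryGetValue(key, out int count))
        {
            count = countForRange(start, end);
            cache[key] = count;
        }
        return count;
    }
}
```

The cache is only valid while the underlying list stays unchanged; if entries are added or removed, it would need to be cleared.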
Answer 1 (score: 2)
I am fairly sure a simple iterative solution will perform better:
private int ExtractNumbers(List<string> myList, int start, int end)
{
    int count = 0;
    for (int i = start; i <= end; i++)
    {
        if (myList[i].Contains("MYNUMBER:"))
        {
            count++;
        }
    }
    return count;
}
Answer 2 (score: 1)
My test bed is 10 million lines (10 times more than yours):
var data = Enumerable
    .Range(1, 10000000)
    .Select(item => "123456789 bla-bla-bla " + "MYNUMBER:" + item.ToString())
    .ToList();
Stopwatch sw = new Stopwatch();
sw.Start();
int result = ExtraktNumbers(data, 0, 10000000);
sw.Stop();
I got these results:
Your initial implementation: 2.78 seconds
Naive loop (2.60 seconds):
private int ExtraktNumbers(List<string> myList, int start, int end) {
    int result = 0;
    for (int i = start; i < end; ++i)
        if (myList[i].Contains("MYNUMBER:"))
            result += 1;
    return result;
}
PLINQ (1.72 seconds):
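private int ExtraktNumbers(List<string> myList, int start, int end) {
    return myList
        .AsParallel()   // <- Do it in parallel
        .AsOrdered()    // keep source order so Skip/Take select the intended range
        .Skip(start)    // range is [start, end), as in the naive loop
        .Take(end - start)
        .Where(x => x.Contains("MYNUMBER:"))
        .Count();
}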
Explicit parallel implementation (1.66 seconds):
private int ExtraktNumbers(List<string> myList, int start, int end) {
    long result = 0;
    Parallel.For(start, end, (i) => {
        if (myList[i].Contains("MYNUMBER:"))
            Interlocked.Increment(ref result);
    });
    return (int) result;
}
I could not reproduce the 177 seconds.
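As a variation on the explicit parallel version, one could try the Parallel.For overload with thread-local state, so that each worker keeps a private partial count and touches the shared total only once when it finishes. This is a sketch that has not been benchmarked here; ParallelCounter and CountMatches are hypothetical names, and the range is [start, end) as in the loops above.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelCounter
{
    // Counts entries in [start, end) that contain the marker.
    // The list must not be modified while this runs.
    public static int CountMatches(List<string> myList, int start, int end)
    {
        int total = 0;
        Parallel.For<int>(
            start, end,
            // Initial value of each worker's private partial count.
            () => 0,
            // Body: update the private count, no synchronization needed.
            (i, loopState, local) =>
                myList[i].Contains("MYNUMBER:") ? local + 1 : local,
            // Runs once per worker at the end: merge into the shared total.
            local => Interlocked.Add(ref total, local));
        return total;
    }
}
```

Whether this beats one Interlocked.Increment per match depends on how many entries match and on the hardware, so it is worth measuring with the same Stopwatch setup.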
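Answer 3 (score: 0)
If you know from the start which intervals you want to consider, then looping over the list once is probably a good idea, as Dmytro and musefan suggested above, so I won't repeat the same idea.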
However, I have a different suggestion for improving performance. How do you create your list? Do you know the number of items in advance? For a list this large, you can use the constructor that takes the initial capacity to improve performance.
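A minimal sketch of that suggestion, assuming the item count is known before the list is filled. ListFactory, BuildList, and the generated strings are hypothetical placeholders for however the real list is produced.

```csharp
using System.Collections.Generic;

public static class ListFactory
{
    // Builds a list of expectedCount entries without intermediate
    // reallocations of the backing array.
    public static List<string> BuildList(int expectedCount)
    {
        // List<T>(int capacity) allocates the backing array once up front.
        var list = new List<string>(expectedCount);
        for (int i = 0; i < expectedCount; i++)
        {
            // Placeholder content; the real entries come from your own source.
            list.Add("MYNUMBER:" + i);
        }
        return list;
    }
}
```

This only speeds up building the list; it does not change the cost of the Contains checks themselves.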