我已经将模拟退火算法(在Java中)应用到我正在处理的个人项目中,但是想知道如果使用另一种语言(即C ++,Python,SA算法)在同一数据集上的表现稍好一点。等)。
我创建的虚拟数据集包含五千个用户的姓名,邮政编码和出生日期。
SA算法应用于数据集,以便提取许多不同的信息。
目前,我最近的测试是试图让SA算法检测出生日期在一周内(在任何给定年份内)的所有用户。现在,SA算法确实运行良好;然而,由于我是一个完美主义者,我希望获得稍微快一点的结果,并想知道是否有任何人在SA上使用类似大小的数据集产生了出色的结果,但是用其他语言编写了吗?
目前,SA算法只需不到五秒钟即可完成搜索。
答案 0 :(得分:3)
我会用Java编写它
public class UserSearchMain {
static class Person {
int id;
int dateOfBirth;
static Person[] generatePeople(int num) {
Random rand = new Random();
Person[] people = new Person[num];
for (int i = 0; i < num; i++) {
Person p = new Person();
p.id = i;
p.dateOfBirth = rand.nextInt(80 * 365); // any day for 80 years.
people[i] = p;
}
return people;
}
}
public static void main(String... args) {
Person[] people = Person.generatePeople(5000);
List<List<Person>> peopleByDOB = new ArrayList<List<Person>>();
for (Person person : people) {
int dob = person.dateOfBirth;
while (peopleByDOB.size() <= dob) peopleByDOB.add(new ArrayList<Person>());
peopleByDOB.get(dob).add(person);
}
Random rand = new Random();
int warmup = 12 * 1000;
int runs = 1000 * 1000;
long start = 0;
for (int i = -warmup; i < runs; i++) {
if (i == 0) start = System.nanoTime();
int dayToSearch = rand.nextInt(80 * 365);
for (int j = Math.max(0, dayToSearch - 7); j <= dayToSearch + 7 && j < peopleByDOB.size(); j++) {
List<Person> peopleWithCloseDOM = peopleByDOB.get(j);
}
}
long time = System.nanoTime() - start;
System.out.printf("Each search took an average of %,d ns%n", time / runs);
}
}
打印
Each search took an average of 85 ns
您使用的算法通常比选择语言更重要。
编辑:回答你原来的问题,你能用同样的算法在C ++中加快这个速度吗?我猜是的,但不是很多。
为了进一步加快速度,您可以使用多个线程。
public static void main(String... args) throws ExecutionException, InterruptedException {
Person[] people = Person.generatePeople(5000);
final List<List<Person>> peopleByDOB = new ArrayList<List<Person>>();
for (Person person : people) {
int dob = person.dateOfBirth;
while (peopleByDOB.size() <= dob) peopleByDOB.add(new ArrayList<Person>());
peopleByDOB.get(dob).add(person);
}
final int runs = 10 * 1000 * 1000;
long start = System.nanoTime();
int processors = Runtime.getRuntime().availableProcessors();
ExecutorService service = Executors.newFixedThreadPool(processors);
List<Future> futures = new ArrayList<Future>();
for (int i = 0; i < processors; i++) {
futures.add(service.submit(new Runnable() {
@Override
public void run() {
Random rand = new Random();
for (int i = 0; i < runs; i++) {
int dayToSearch = rand.nextInt(80 * 365);
for (int j = Math.max(0, dayToSearch - 7); j <= dayToSearch + 7 && j < peopleByDOB.size(); j++) {
List<Person> peopleWithCloseDOM = peopleByDOB.get(j);
}
}
}
}));
}
for (Future future : futures) {
future.get();
}
service.shutdown();
double timeSec = (System.nanoTime() - start) / 1e9;
double rate = processors * runs / timeSec / 1e6;
System.out.printf("The search throughput was %.1f million per second%n", rate);
}
打印
The search throughput was 32.4 million per second
表示每次搜索平均约31 ns。