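I have to read a specific column of a .csv file (the 6th column in my file) and count the words in it, but I can't get it to work. What I wrote works for small files: it reads every word in the file and prints the counts. When it comes to a big file (in my project I have a .csv file containing 1 million tweets), the console window appears and my CPU usage climbs to 35-40 (sometimes higher), but no data is printed. I waited 1 hour and then closed it, because this job should finish in about 5 minutes. I also have some stop words, and I don't know how to ignore them. I know I have a lot of questions, but I'm new here.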
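#include "pch.h"
#include <iostream>
#include <string>
#include <cstring>
#include <fstream>
#include <cctype>
#include <chrono>

const int MAX = 1000000;
std::string words[MAX];   // distinct words seen so far
int instances[MAX];       // instances[i] is the number of times words[i] occurred
int count = 0;            // number of distinct words stored

// Record one occurrence of input. This scans every stored word,
// so each call costs time proportional to count.
void insert(const std::string &input) {
    for (int i = 0; i < count; i++)
        if (input == words[i]) {
            instances[i]++;
            return;
        }
    if (count < MAX) {
        words[count] = input;
        instances[count] = 1;
        count++;
    }
}

// Store the most frequent remaining word in word, return its count,
// and zero that count so the next call finds the next most frequent word.
int findTop(std::string &word) {
    int topCount = instances[0];
    int topIndex = 0;
    for (int i = 1; i < count; i++)
        if (instances[i] > topCount) {
            topCount = instances[i];
            topIndex = i;
        }
    instances[topIndex] = 0;
    word = words[topIndex];
    return topCount;
}

int main() {
    using clock = std::chrono::system_clock;
    using s = std::chrono::seconds;
    const auto before = clock::now();

    // Reads every whitespace-separated token in the file, not only the 6th column.
    std::string word;
    std::ifstream data("eray.csv");
    while (data >> word)
        insert(word);
    data.close();

    for (int i = 0; i < 10; i++) {
        // Call findTop before printing: it overwrites word, and printing both
        // in a single << chain can show the previous word next to the new count.
        const int topCount = findTop(word);
        std::cout << word << " " << topCount << std::endl;
    }

    const auto duration = std::chrono::duration_cast<s>(clock::now() - before);
    std::cout << "\nTotal Elapsed Time : " << duration.count() << "s" << std::endl;
}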
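
For reference, here is a minimal sketch of one direction that might address the three points in the question (only the 6th column, stop words, and speed). It is not a tested solution for the real data set. The names `fieldAt` and `topWords` are hypothetical helpers introduced here, and the sketch assumes that fields are separated by plain commas with no quoted commas inside a tweet, that words are separated by whitespace, and that lowercasing words before counting is acceptable. The counting uses `std::unordered_map` in place of the linear scan in `insert`, and `std::partial_sort` picks the top entries.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper: returns the zero-based index-th comma-separated
// field of line, or an empty string if the line has fewer fields.
// Assumption: no quoted commas inside fields.
std::string fieldAt(const std::string& line, std::size_t index) {
    std::size_t start = 0;
    for (std::size_t i = 0; i < index; ++i) {
        start = line.find(',', start);
        if (start == std::string::npos) return "";
        ++start;  // step past the comma
    }
    const std::size_t end = line.find(',', start);
    return line.substr(start, end == std::string::npos ? end : end - start);
}

using WordCount = std::pair<std::string, int>;

// Hypothetical helper: counts the words found in the given column of every
// line of in, skips anything listed in stopWords, and returns the n most
// frequent words (ties broken alphabetically).
std::vector<WordCount> topWords(std::istream& in, std::size_t column,
                                const std::unordered_set<std::string>& stopWords,
                                std::size_t n) {
    std::unordered_map<std::string, int> counts;
    std::string line, word;
    while (std::getline(in, line)) {
        std::istringstream text(fieldAt(line, column));
        while (text >> word) {
            // Assumption: "The" and "the" should count as the same word.
            std::transform(word.begin(), word.end(), word.begin(),
                           [](unsigned char c) {
                               return static_cast<char>(std::tolower(c));
                           });
            if (stopWords.count(word) == 0) ++counts[word];
        }
    }

    std::vector<WordCount> result(counts.begin(), counts.end());
    n = std::min(n, result.size());
    std::partial_sort(result.begin(),
                      result.begin() + static_cast<std::ptrdiff_t>(n),
                      result.end(),
                      [](const WordCount& a, const WordCount& b) {
                          if (a.second != b.second) return a.second > b.second;
                          return a.first < b.first;
                      });
    result.resize(n);
    return result;
}
```

With these helpers, `main` could open `std::ifstream data("eray.csv")`, fill a `std::unordered_set<std::string>` with the stop words, and call `topWords(data, 5, stopWords, 10)`, where 5 is the zero-based index of the 6th column. Each lookup in the hash map takes constant time on average, whereas `insert` above compares the new word against every stored word, which is likely where the hour goes on a file with a million tweets.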