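I have a CSV file with 2 columns: the first item in each row is a person's ID and the second is the ID of one of that person's friends. I want to group the friend IDs by person ID and then sort the persons by friend count in descending order. How can I do this?
My CSV file looks like this: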
ID,Friend_ID
P0,P1
P0,P2
P0,P3
P1,P0
P1,P2
P1,P3
P2,P0
P3,P0
P3,P1
I want this:
ID,Friend_count
P0,3
P1,3
P3,2
P2,1
Answer
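You need to split the main problem into subproblems. Based on your requirements, the following steps must be performed:
1. Open the input file
2. Read the header line
3. Read the complete CSV file and put all pairs of friends into a std::multimap
4. Count how many friends each person has
5. Sort by friend count, descending
6. Write the result to the output file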
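All of this can be done with a short program; the function main needs only one or two statements per step. One possible solution follows. It uses only the standard library.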
#include <algorithm>  // sorting and std::copy
#include <map>
#include <string>
#include <iostream>
#include <fstream>
#include <utility>
#include <iterator>
#include <regex>
#include <vector>
// Make reading easier
using PairOfFriends = std::pair<const std::string, std::string>;  // Exactly one pair of friends. The const first member matches the multimap's value_type
using MmFriend = std::multimap<std::string, std::string>;         // All pairs of friends, in a multimap sorted by person ID
struct PairLine  // ! This is a proxy for the input iterator !
{
    // Extractor: read one line of the text file and split it into 2 parts
    friend std::istream& operator>>(std::istream& is, PairLine& line)
    {
        // We use a regex token iterator to be a little more tolerant of different separator characters (',' ';' '.')
        static const std::regex comma("[ \t\n]*[,;.][ \t\n]*");
        std::string wholeLine;
        if (!std::getline(is, wholeLine)) return is;  // End of file: nothing more to read
        // Submatch -1 selects the text between the separators
        std::vector<std::string> part{ std::sregex_token_iterator(wholeLine.begin(), wholeLine.end(), comma, -1), std::sregex_token_iterator() };
        if (2 == part.size())
            line.pairString = std::make_pair(part[0], part[1]);  // Copy the 2 parts into our internal variable
        else
            is.setstate(std::ios::failbit);  // Malformed line: stop reading instead of silently reusing the previous pair
        return is;
    }
    operator PairOfFriends() const { return pairString; }  // Conversion to the multimap's value_type, used when the input iterator is dereferenced
    std::pair<std::string, std::string> pairString{};      // Internal representation. Attention! No const member, so it can be assigned
};
struct PersonCount
{   // We will use this POD to store person names and related counts
    std::string person;
    MmFriend::size_type personCount;  // Same type that multimap::count returns, so brace-initialization does not narrow
    // And we also need an output function
    friend std::ostream& operator<<(std::ostream& os, const PersonCount& pc) { return (os << pc.person << "," << pc.personCount); }
};
int main()
{
    // 1. Open file. It will be closed by the destructor ------------------------
    std::ifstream inFileStream{ "r:\\person.csv" };
    if (!inFileStream) { std::cerr << "Could not open input file\n"; return 1; }
    // 2. Read header ----------------------------------------------------------
    std::string header; std::getline(inFileStream, header);
    // 3. Read the complete CSV file and put the result into the multimap -----
    MmFriend mmFriend{ std::istream_iterator<PairLine>(inFileStream), std::istream_iterator<PairLine>() };
    // 4. Count how many friends each person has -------------------------------
    std::vector<PersonCount> friendCount{};  // Person and count will be stored here
    // upper_bound jumps to the first entry of the next person ID, so each ID is visited once
    for (MmFriend::iterator it = mmFriend.begin(), end = mmFriend.end(); it != end; it = mmFriend.upper_bound(it->first)) {
        friendCount.push_back(PersonCount{ it->first, mmFriend.count(it->first) });  // Put count info into the new array
    }
    // 5. Sort by count, descending. stable_sort keeps equal counts in ID order
    std::stable_sort(friendCount.begin(), friendCount.end(), [](const PersonCount& p1, const PersonCount& p2) { return p2.personCount < p1.personCount; });
    // 6. Output ---------------------------------------------------------------
    std::ofstream outFileStream{ "r:\\out.txt" };
    outFileStream << "ID,Friend_count\n";
    std::copy(friendCount.begin(), friendCount.end(), std::ostream_iterator<PersonCount>(outFileStream, "\n"));
    return 0;
}
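Note: to read and parse the CSV data, the program uses an input iterator, std::istream_iterator<PairLine>, so the whole file is read in a single line of code. The extractor (operator>> of the proxy type PairLine) splits each input line with a std::sregex_token_iterator. If this has to be extended to more than two entries per line, a std::tuple can be used instead of the std::pair.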
The rest is really simple, so I don't think it needs further explanation.
If necessary, I will of course answer questions.
I hope this helps.