I have a plain CSV file that starts with these two lines:
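In total I have 62458 rows. Each row represents one house and has 75 columns, one for each of the 75 types of furniture a house can have. I only wrote in the numbers to make the columns easier to see; in reality there is nothing between those commas, so some houses have all 75 kinds of furniture and others do not. I want to write a Python program that counts how many times each column value appears in the CSV file. I have tried several approaches without success.
My output should look like this: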
1. Clubhouse,Fibre Ready,Fitness Corner,Aircon,Security,Maid Room,Gym,Tennis Court,Fridge,Bathtub,Swimming Pool,Utility Room,Squash Court,Playground,Oven,Parking,Washer,Pool View,Wading Pool,BBQ Pits,Balcony,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,
2. Clubhouse,Aircon,Greenery View,Fridge,Jacuzzi,Swimming Pool,Dryer,Steam Room,Playground,Fitness Corner,Washer,Gym,Balcony,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,
3. ...
And so on
P.S. I cannot import the file into a database because the file is not readable.
Answer 0 (score: 2)
You can use collections.Counter:
from collections import Counter
import csv

counter = Counter()
with open('furniture.csv') as fobj:
    reader = csv.reader(fobj)
    for row in reader:
        # Each row is a list of cell strings; count every cell.
        counter.update(row)

for k, v in counter.items():
    print('{}: {} times'.format(k, v))
Output for the two rows:
Clubhouse: 2 times
Fibre Ready: 1 times
Fitness Corner: 2 times
Aircon: 2 times
...
You can also access individual items:
>>> counter['Clubhouse']
2
>>> counter['Fibre Ready']
1
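Since the question says the unused columns are really empty (nothing between the commas), the loop above would also count the empty string as an item. A minimal sketch of one way to skip those cells follows; the sample rows and the helper name count_furniture are made up for illustration, and with the real file you would pass an open file object instead of the in-memory sample.

```python
import csv
import io
from collections import Counter

# Hypothetical two-row sample standing in for the real file.
sample = io.StringIO(
    "Clubhouse,Aircon,Gym,,,\n"
    "Clubhouse,Fridge,Gym,,,\n"
)

def count_furniture(fobj):
    """Count non-empty cells across all rows of a CSV file object."""
    counter = Counter()
    for row in csv.reader(fobj):
        # Strip whitespace and drop cells that are empty after stripping.
        counter.update(cell.strip() for cell in row if cell.strip())
    return counter

counts = count_furniture(sample)
# most_common() lists items from the highest count to the lowest.
for name, n in counts.most_common():
    print('{}: {} times'.format(name, n))
```

Whether blank cells should be dropped or reported depends on what you want to measure, so treat the filter as optional.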
collections.Counter is very useful for this kind of task:
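Dict subclass for counting hashable items. Sometimes called a bag or multiset. Elements are stored as dictionary keys and their counts are stored as dictionary values.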
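To make the multiset description concrete, here is a small sketch with two made-up houses (the furniture names are only examples): counters can be added together, and a key that was never seen reads as 0 instead of raising KeyError.

```python
from collections import Counter

# Two hypothetical houses, each given as a list of furniture names.
house_a = Counter(['Clubhouse', 'Aircon', 'Gym'])
house_b = Counter(['Clubhouse', 'Fridge'])

# Adding two counters sums the counts key by key.
total = house_a + house_b

print(total['Clubhouse'])    # 2
print(total['Sauna'])        # 0, missing keys do not raise KeyError
print(total.most_common(1))  # [('Clubhouse', 2)]
```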