Question

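I have a list of tuples in the following format:

[("25.00", u"A"), ("44.00", u"X"), ("17.00", u"E"), ("34.00", u"Y")]

I want to count how many times each letter occurs. I have already built a sorted list of all the letters, and now I want to count them.

First, I have a problem with the u in front of the second item of each tuple. I don't know how to remove it, and I guess it has something to do with encoding.

Here is my code: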

# coding=utf-8
from collections import Counter 
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile

df = pd.read_excel('test.xlsx', sheet_name='Essais', skiprows=1)
groupes = [] 
students = [] 
group_of_each_letter = [] 
number_of_students_per_group = []
final_list = []

def print_a_list(list):
    for items in list:
        print(items)


for i in df.index:
    groupes.append(df['GROUPE'][i]) 
    students.append(df[u'ÉTUDIANT'][i]) 

groupes = groupes[1:] 
students = students[1:] 

group_of_each_letter = list(set(groupes)) 
group_of_each_letter = sorted(group_of_each_letter) 

z = zip(students, groupes) 
z = list(set(z)) 

final_list = list(zip(*z)) 

for j in group_of_each_letter:
    number_of_students_per_group.append(final_list.count(j))

print_a_list(number_of_students_per_group)

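group_of_each_letter is a list of the group letters with no duplicates.

The problem is that the for loop at the end gives me the right number of values, but the list is filled with '0'.

The screenshot below is a sample of the Excel file. The "ETUDIANT" column means "student number", but I cannot edit the file and have to work with it as it is. GROUPE means GROUP. The goal is to count the number of students in each group. I think my approach is sound, but a simpler one would be welcome too.

Thank you for your help, even though I know my question is a bit ambiguous.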

Answer 1

Building on kerwei's answer:

Use groupby() followed by nunique().

This gives you the number of unique student IDs in each group.
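
A minimal sketch of that idea, assuming the GROUPE and ETUDIANT column names from the question. The small in-memory DataFrame is made-up sample data standing in for the Excel sheet:

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel sheet in the question.
df = pd.DataFrame({
    "GROUPE": ["B", "O", "O", "O", "U", "O"],
    "ETUDIANT": [29, 27, 27, 34, 122, 34],
})

# Count the distinct student IDs within each group.
students_per_group = df.groupby("GROUPE")["ETUDIANT"].nunique()
print(students_per_group)
```

Because nunique() ignores repeated rows for the same student, the data does not need to be de-duplicated first.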

Answer 2

I think groupby.count() is enough. It counts the occurrences of each GROUPE letter in the dataframe.

import pandas as pd

df = pd.read_excel('test.xlsx', sheet_name='Essais', skiprows=1)
# Drop the empty row, which is actually the subheader
df.drop(0, axis=0, inplace=True)
# Now we get a count of students by group
sub_student_group = df.groupby(['GROUPE','ETUDIANT']).count().reset_index()

>>> sub_student_group
   GROUPE  ETUDIANT
0       B        29
1       L        88
2       N        65
3       O        27
4       O        29
5       O        34
6       O        35
7       O        54
8       O        65
9       O        88
10      O        99
11      O       114
12      O       122
13      O       143
14      O       147
15      U       122

student_group = sub_student_group.groupby('GROUPE').count()

>>> student_group
        ETUDIANT
GROUPE
B              1
L              1
N              1
O             12
U              1
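
For comparison, the same tally can probably be done in plain Python with collections.Counter, which the question imports but never uses. The pairs list below is made-up sample data in the (student, group) shape that zip(students, groupes) produces. It also suggests why the original loop printed zeros: after list(zip(*z)), final_list holds only two tuples (all students and all groups), so final_list.count(j) never matches a single letter.

```python
from collections import Counter

# Hypothetical (student, group) pairs, shaped like zip(students, groupes).
pairs = [(29, u"B"), (27, u"O"), (27, u"O"), (34, u"O"), (122, u"U")]

# De-duplicate the pairs, then tally the group letter of each remaining pair.
counts = Counter(group for _student, group in set(pairs))

# Print the number of distinct students for each group, in letter order.
for letter in sorted(counts):
    print(letter, counts[letter])
```

As for the u prefix, it is only how Python 2 displays unicode strings. It is not part of the value, so it does not need to be removed before counting.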
