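Python 3: building a dictionary from a CSV file to count word frequency

Time: 2018-08-24 23:25:49

Tags: python python-3.x dictionary word-frequency

I am trying to write a function that reads a CSV file of student volunteers who hold different degrees. The function should build a dictionary in which the keys are the degrees and the values are how often each degree occurs.

The data is organized as follows: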

name    degree     email

ABC     PhD.       abd@gmail.com
CDE     Ph.D.      cde@gmail.com
FGH     MD,PHD     fgh@gmail.com

The goal is to obtain a dictionary like this:

#degree_count{'phd':3,'md':1}

def degree_frequency(csv_file):
    f = open('csv_file')
    csv_f = csv.reader(f)
    #Creating a list to store all the degrees from the csv file
    student_degree_list=[]
    #Creating an empty dictionary to count the frequency
    degree_count={}
    for row in csv_f:
        student_degree_list.append(row[1]) 
    #Replacing fullstops to account for variations in writing degrees ( eg JD vs J.D)
    [word.replace(".", "") for word in student_degree_list]
    [word.lower() for word in student_degree_list]
    for ele in student_degree_list:
        if ele in degree_count:
            degree_count[ele]=degree_count[ele]+1
        else:
            degree_count[ele]=0
    return degree_count

2 answers:

Answer 0 (score: 0)

I believe your problem is that the following code has no effect unless you assign its result to a variable.

[word.replace(".", "") for word in student_degree_list]
[word.lower() for word in student_degree_list]
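
To see why those two lines do nothing: Python strings are immutable, so `str.replace` and `str.lower` return new strings, and a list comprehension builds a new list rather than changing the old one. A minimal sketch, using made-up sample values rather than data from a real file:

```python
# Sample values are illustrative, not read from the asker's CSV.
student_degree_list = ["PhD.", "Ph.D.", "MD"]

# The comprehension builds a new list, which is thrown away here.
[word.replace(".", "") for word in student_degree_list]
unchanged = list(student_degree_list)

# Assigning the result back keeps the cleaned values.
student_degree_list = [word.replace(".", "").lower() for word in student_degree_list]

print(unchanged)            # ['PhD.', 'Ph.D.', 'MD']
print(student_degree_list)  # ['phd', 'phd', 'md']
```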

Also, if a degree occurs once, shouldn't its count be set to 1 rather than 0?

Working code:

#degree_count{'phd':3,'md':1}

import csv

def degree_frequency(csv_file):
    # Creating a list to store all the degrees from the csv file
    student_degree_list = []
    # Creating an empty dictionary to count the frequency
    degree_count = {}
    # Open the path that was passed in, not the literal string 'csv_file'
    with open(csv_file, newline='') as f:
        csv_f = csv.reader(f)
        next(csv_f, None)  # skip the header row (assumes the file starts with one)
        for row in csv_f:
            student_degree_list.append(row[1])
    # Removing full stops and lowercasing to account for variations in writing degrees (e.g. JD vs J.D.)
    student_degree_list = [word.replace('.', '').lower() for word in student_degree_list]
    for ele in student_degree_list:
        if ele in degree_count:
            degree_count[ele] += 1
        else:
            # The first occurrence counts as 1, not 0
            degree_count[ele] = 1
    return degree_count
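
The if/else counting branch can also be written with `dict.get`, which makes the starting value harder to get wrong. A small sketch; `count_items` is a hypothetical helper name, not something from the question:

```python
def count_items(items):
    """Count occurrences of each item with a plain dictionary."""
    counts = {}
    for item in items:
        # get() returns 0 for an unseen key, so the first occurrence becomes 1.
        counts[item] = counts.get(item, 0) + 1
    return counts

print(count_items(["phd", "phd", "md", "phd"]))  # {'phd': 3, 'md': 1}
```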

Answer 1 (score: 0)

Credit for the csv reader code

Read the CSV into per-column lists:

import csv
from collections import Counter, defaultdict

columns = defaultdict(list) # each value in each column is appended to a list

with open('csv_file.csv') as f:
    reader = csv.DictReader(f) # read rows into a dictionary format
    for row in reader: # read a row as {column1: value1, column2: value2,...}
        for (k,v) in row.items(): # go over each column name and value 
            columns[k].append(v) # append the value into the appropriate list
                                 # based on column name k

Split and clean the degree values:

degree_list = columns['degree']
degree_list_clean = []

for cad_degrees in degree_list:
    # Treat commas as separators too, so "MD,PHD" yields two degrees
    cad_degrees_lst = cad_degrees.replace(',', ' ').split()
    for degree in cad_degrees_lst:
        degree_clean = degree.strip().replace('.','').lower()
        degree_list_clean.append(degree_clean)

Count the cleaned degrees with Counter:

output_dict_counter_version = dict(Counter(degree_list_clean))
print(output_dict_counter_version)
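
Putting the steps together, here is a minimal end-to-end sketch. It assumes the file has a header row with a `degree` column and that a cell holding several degrees (such as `MD,PHD`) is quoted in the CSV. The in-memory sample only mirrors the rows shown in the question, and the signature taking an iterable of lines instead of a path is a choice made for this illustration.

```python
import csv
import io
from collections import Counter

def degree_frequency(lines):
    """Count degrees from an iterable of CSV lines that has a 'degree' column."""
    counts = Counter()
    for row in csv.DictReader(lines):
        # A cell may hold several degrees, e.g. "MD,PHD"; split on commas and spaces.
        for degree in row["degree"].replace(",", " ").split():
            # Drop full stops and lowercase so "Ph.D." and "PHD" count together.
            counts[degree.replace(".", "").lower()] += 1
    return dict(counts)

# Illustrative data mirroring the question; the multi-degree cell is quoted.
sample = io.StringIO(
    'name,degree,email\n'
    'ABC,PhD.,abd@gmail.com\n'
    'CDE,Ph.D.,cde@gmail.com\n'
    'FGH,"MD,PHD",fgh@gmail.com\n'
)
result = degree_frequency(sample)
print(result)  # {'phd': 3, 'md': 1}
```

To use it with a real file, pass an open file object, for example `with open(path, newline='') as f: degree_frequency(f)`.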