Table of micro-averaged and macro-averaged F1 scores

Time: 2019-04-30 06:50:02

Tags: python python-3.x

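I have a folder containing 5 different folders, each of which holds 50 email documents on a particular topic (so there are 5 topics/classes in total).

Train two classifiers: a decision tree and an SVC (with a linear kernel). Report the micro-averaged and macro-averaged F1 scores from 10-fold cross-validation. You may need to preprocess the data, prune the decision tree, and find a suitable value of C for the SVC.

Can you provide me with a table containing the micro-averaged and macro-averaged F1 scores?

I tried putting the emails from each folder into a single txt file, but when I run the decision tree it still doesn't let me do that.

I can't get any results.

Should I put the files from all the folders into one text file?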

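import glob

# Assumed: the original post omits this step; cwx is globbed like the other folders below
path = ("C:/Users/*******/DS Assign/toclassify/cwx/*")
files = glob.glob(path)

with open ("C:/Users/*******/DS Assign/toclassify/cwx.txt", "w") as outfile: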
    for f in files:
        with open(f) as infile:
            for line in infile:
                outfile.write(line)

path = ("C:/Users/*******/DS Assign/toclassify/ra/*")
files = glob.glob(path)

#print(files)

with open ("C:/Users/*******/DS Assign/toclassify/ra.txt", "w") as outfile:
    for f in files:
        with open(f) as infile:
            for line in infile:
                outfile.write(line)

path = ("C:/Users/*******/DS Assign/toclassify/rsh/*")
files = glob.glob(path)

#print(files)

with open ("C:/Users/*******/DS Assign/toclassify/rsh.txt", "w") as outfile:
    for f in files:
        with open(f) as infile:
            for line in infile:
                outfile.write(line)


path = ("C:/Users/*******/DS Assign/toclassify/src/*")
files = glob.glob(path)

#print(files)

with open ("C:/Users/*******/DS Assign/toclassify/src.txt", "w") as outfile:
    for f in files:
        with open(f) as infile:
            for line in infile:
                outfile.write(line)

path = ("C:/Users/*******/DS Assign/toclassify/tpm/*")
files = glob.glob(path)

#print(files)

1 answer:

Answer 0 (score: 0)

import os
import pandas as pd

data_dir = os.path.join('.', 'data')

data_ids = []
data_txt = []

# Create a helper function to read the data from a particular folder and file
def get_data(file_name, folder_dir):
    file_path = os.path.join(folder_dir, file_name)
    # Use a context manager so the file handle is closed after reading
    with open(file_path, 'r') as f:
        return f.read()

# Loop through each folder in the data directory
for folder in os.listdir(data_dir):
    # Create the folder directory from the data directory
    folder_dir = os.path.join(data_dir, folder)

    # Store the IDs of each file in the particular folder directory into a list
    data_ids += os.listdir(folder_dir)

    # Using list comprehension to create a list of the text contained in each file
    # for a particular ID in the folder directory
    data_txt += [get_data(data_id, folder_dir) for data_id in os.listdir(folder_dir)]

# Store into a Pandas dataframe for easy integration into modelling packages

df = pd.DataFrame({
    'id': data_ids,
    'text': data_txt
})
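
The DataFrame above keeps one row per email but has no class label, which both classifiers need. Below is a minimal sketch of how the rest of the task might look. It makes several assumptions that are not in the original post: the folder name is used as the class label, TF-IDF is used as the preprocessing step, and `ccp_alpha=0.01` and `C=1.0` are untuned placeholder values for the pruning strength and the SVC penalty. The helper names `load_emails` and `f1_table` are hypothetical.

```python
import os

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def load_emails(data_dir):
    """Read every file under data_dir/<topic>/ into one row: id, text, label."""
    rows = []
    for folder in sorted(os.listdir(data_dir)):
        folder_dir = os.path.join(data_dir, folder)
        if not os.path.isdir(folder_dir):
            continue  # skip stray files, e.g. merged .txt outputs
        for file_name in sorted(os.listdir(folder_dir)):
            file_path = os.path.join(folder_dir, file_name)
            # errors='ignore' is an assumption: raw emails may contain odd bytes
            with open(file_path, 'r', errors='ignore') as f:
                # Assumption: the folder name is the topic/class label
                rows.append({'id': file_name, 'text': f.read(), 'label': folder})
    return pd.DataFrame(rows, columns=['id', 'text', 'label'])


def f1_table(df, n_splits=10, ccp_alpha=0.01, C=1.0, seed=0):
    """Return mean micro- and macro-averaged F1 over stratified k-fold CV."""
    models = {
        # ccp_alpha > 0 enables cost-complexity pruning (placeholder value)
        'Decision tree': DecisionTreeClassifier(ccp_alpha=ccp_alpha, random_state=seed),
        # C is a placeholder value, not a tuned one
        'SVC (linear kernel)': SVC(kernel='linear', C=C),
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    names, records = [], []
    for name, model in models.items():
        # The vectorizer sits inside the pipeline so it is refit on each training fold
        pipeline = make_pipeline(TfidfVectorizer(stop_words='english'), model)
        scores = cross_validate(pipeline, df['text'], df['label'], cv=cv,
                                scoring=['f1_micro', 'f1_macro'])
        names.append(name)
        records.append({
            'micro-averaged F1': scores['test_f1_micro'].mean(),
            'macro-averaged F1': scores['test_f1_macro'].mean(),
        })
    return pd.DataFrame(records, index=names,
                        columns=['micro-averaged F1', 'macro-averaged F1'])


# Example usage (assumes the five topic folders live under ./data):
# df = load_emails(os.path.join('.', 'data'))
# print(f1_table(df).round(3))
```

Keeping each email as its own row, rather than merging a folder into one text file, is what gives the classifiers 250 labelled samples to split into folds. To pick `ccp_alpha` and `C` rather than guess them, one option is to wrap each pipeline in scikit-learn's `GridSearchCV`.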