Question

I have a file, "LMD.rh.arff", which I am trying to convert to a .csv file using the following code:

import pandas as pd
import matplotlib.pyplot as plt
from scipy.io import arff


# Read in .arff file-
data = arff.loadarff("LMD.rh.arff")

But the last line of this code gives me the following error:

In [6]: data = arff.loadarff("LMD.rh.arff")

---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
----> 1 data = arff.loadarff("LMD.rh.arff")

~/.local/lib/python3.6/site-packages/scipy/io/arff/arffread.py in loadarff(f)
    539         ofile = open(f, 'rt')
    540     try:
--> 541         return _loadarff(ofile)
    542     finally:
    543         if ofile is not f:  # only close what we opened

~/.local/lib/python3.6/site-packages/scipy/io/arff/arffread.py in _loadarff(ofile)
    627     a = generator(ofile)
    628     # No error should happen here: it is a bug otherwise
--> 629     data = np.fromiter(a, descr)
    630     return data, meta
    631

UnicodeEncodeError: 'ascii' codec can't encode character '\xf3' in position 4: ordinal not in range(128)

You can download the file: arff_file

Any ideas as to what is going wrong?

Thanks!

Answer 1

Try this:

import os

path_to_directory = "./"
# Collect every .arff file in the directory
files = [name for name in os.listdir(path_to_directory) if name.endswith(".arff")]

def toCsv(content):
    """Turn the lines of an ARFF file into CSV lines (header row, then data rows)."""
    data = False
    header = ""
    newContent = []
    for line in content:
        if not data:
            if "@attribute" in line:
                # The token after "@attribute" is the column name
                attri = line.split()
                columnName = attri[attri.index("@attribute") + 1]
                header = header + columnName + ","
            elif "@data" in line:
                # End of the ARFF header: drop the trailing comma and emit the CSV header row
                data = True
                header = header[:-1]
                header += '\n'
                newContent.append(header)
        else:
            # Data rows are already comma-separated, so copy them unchanged
            newContent.append(line)
    return newContent

# Main loop for reading and writing files
for file in files:
    with open(os.path.join(path_to_directory, file), "r") as inFile:
        content = inFile.readlines()
        name, ext = os.path.splitext(inFile.name)
        new = toCsv(content)
        with open(name + ".csv", "w") as outFile:
            outFile.writelines(new)
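
To make the idea in this answer concrete, here is a minimal sketch of the same header-then-data conversion applied to a made-up ARFF snippet. The helper name `arff_lines_to_csv_rows` and the sample contents are assumptions for illustration only, not part of the answer. Unlike the code above, the sketch matches keywords case-insensitively and skips `%` comment lines; it still does not handle quoted attribute names containing spaces or values containing commas.

```python
import csv
import io

def arff_lines_to_csv_rows(lines):
    """Return (header, rows) parsed from the lines of a simple ARFF file."""
    header = []
    rows = []
    in_data = False
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and ARFF comments, which start with '%'.
        if not line or line.startswith("%"):
            continue
        if not in_data:
            lowered = line.lower()
            if lowered.startswith("@attribute"):
                # The second whitespace-separated token is the column name.
                header.append(line.split()[1])
            elif lowered.startswith("@data"):
                in_data = True
        else:
            rows.append([value.strip() for value in line.split(",")])
    return header, rows

# Made-up sample, including a non-ASCII value like the one in the traceback.
sample = """% hypothetical example file
@RELATION demo
@ATTRIBUTE title string
@ATTRIBUTE tempo numeric
@DATA
canción,120
song,98
"""

header, rows = arff_lines_to_csv_rows(sample.splitlines())

# Write the result as CSV into an in-memory buffer.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Here `csv_text` holds a header row followed by the two data rows. Writing it to disk with an explicit `encoding=` argument to `open` avoids relying on the platform default.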

Answer 2

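Look at the error traceback:

UnicodeEncodeError: 'ascii' codec can't encode character '\xf3' in position 4: ordinal not in range(128)

Your error suggests that there is an encoding problem with your file. Consider first opening the file with the correct encoding and then passing it to the arff loader: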

import codecs
import arff  # assumed here to be the liac-arff package, not scipy.io.arff

file_ = codecs.open('LMD.rh.arff', 'rb', 'utf-8')  # or whatever encoding you have
arff.load(file_)  # now this should be fine

For reference, see here.
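
If you do not know which encoding the file uses, one option is to try a few candidates in order. The sketch below is only an illustration under stated assumptions: `read_text_with_fallback` is a hypothetical helper, the candidate list is a guess, and the demo file is made up. It writes a small file containing '\xf3' as Latin-1 bytes, shows that decoding it as ASCII fails (a decode error here, whereas the traceback above reports an encode error), and then reads it with the first encoding that works.

```python
import os
import tempfile

def read_text_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Return (text, encoding) using the first encoding that decodes the file."""
    for encoding in encodings:
        try:
            with open(path, "r", encoding=encoding) as handle:
                return handle.read(), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode " + path)

# Build a throwaway file containing '\xf3' (o with acute accent) as Latin-1 bytes.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.arff")
demo_text = "@relation demo\n@attribute title string\n@data\ncanci\xf3n\n"
with open(demo_path, "wb") as handle:
    handle.write(demo_text.encode("latin-1"))

# Decoding the file as ASCII fails because byte 0xf3 is outside range(128).
try:
    with open(demo_path, "r", encoding="ascii") as handle:
        handle.read()
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# UTF-8 is tried first and rejected; Latin-1 then succeeds.
text, used_encoding = read_text_with_fallback(demo_path)
```

Note that Latin-1 can decode any byte sequence, so it only makes sense as the last candidate; a successful decode does not prove the guess was right, so check the resulting text by eye.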
