There are 2 CSV files in the same location: 1- candidates.csv 2- Store.csv
When I import the candidates.csv file with this code, it is imported fine:
data=pandas.read_csv("C:\\Users\\Nupur\\Desktop\\Ankit\\candidates.csv")
But when I use the same code to import the Store.csv file, I get an error:
data=pandas.read_csv("C:\\Users\\Nupur\\Desktop\\Ankit\\Store.csv")
Error:
UnicodeDecodeError                        Traceback (most recent call last)
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._string_convert()
pandas\_libs\parsers.pyx in pandas._libs.parsers._string_box_utf8()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 9: invalid start byte

During handling of the above exception, another exception occurred:

UnicodeDecodeError                        Traceback (most recent call last)
----> 1 data = pandas.read_csv("C:\\Users\\Nupur\\Desktop\\Ankit\\Store.csv")

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
    676                     skip_blank_lines=skip_blank_lines)
    677
--> 678         return _read(filepath_or_buffer, kwds)
    679
    680     parser_f.__name__ = name

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    444
    445     try:
--> 446         data = parser.read(nrows)
    447     finally:
    448         parser.close()

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1034             raise ValueError('skipfooter not supported for iteration')
   1035
-> 1036         ret = self._engine.read(nrows)
   1037
   1038         # May alter columns / col_dict

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1846     def read(self, nrows=None):
   1847         try:
-> 1848             data = self._reader.read(nrows)
   1849         except StopIteration:
   1850             if self._first_chunk:

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.read()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._string_convert()
pandas\_libs\parsers.pyx in pandas._libs.parsers._string_box_utf8()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 9: invalid start byte
Answer 0 (score: 1)
Try using this:
data=pandas.read_csv("C:\\Users\\Nupur\\Desktop\\Ankit\\Store.csv",encoding = "ISO-8859-1")
Answer 1 (score: 1)
If you are getting an encoding error because the file's encoding is not the default one mentioned in the pd.read_csv() documentation, you can install chardet first and then find the file's encoding by doing the following:
import chardet
rawdata = open('D:\\path\\file.csv', 'rb').read()
result = chardet.detect(rawdata)
charenc = result['encoding']
print(charenc)
This will give you the encoding of the file.
Once you have the encoding, you can read the file as:
pd.read_csv('D:\\path\\file.csv',encoding = 'encoding you found')
or
pd.read_csv(r'D:\path\file.csv',encoding = 'encoding you found')
You can find a list of all the encodings here.
Hope you find this useful.
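Putting the two steps together, a minimal end-to-end sketch (using the same placeholder path as above) could look like this:

import chardet
import pandas as pd

path = 'D:\\path\\file.csv'  # placeholder path, replace with your own file

# Read the raw bytes and let chardet guess the encoding
with open(path, 'rb') as f:
    result = chardet.detect(f.read())
print(result)  # e.g. {'encoding': 'ISO-8859-1', 'confidence': 0.73, ...}

# Reuse the detected encoding when parsing the CSV
data = pd.read_csv(path, encoding=result['encoding'])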
Answer 2 (score: 0)
Have you tried
#include "utilities.h"
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <syslog.h>
#include <sys/time.h>//getrlimit
#include <sys/resource.h>//getrlimit
#include <signal.h> //sigempyset , asigcation (umask?)
#include <sys/resource.h>
#include <fcntl.h> //O_RDWR
#include <stdarg.h>
#include "error.h"
/*The function creates a daemon*/
int daemonize(const char *cmd)
{
int fd0, fd1, fd2;
unsigned int i;
pid_t pid;
struct rlimit rl;
struct sigaction sa;
/* Clear file creation mask.*/
umask(0);
/* Get maximum number of file descriptors. */
if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
{
err_quit("%s: can’t get file limit", cmd);
}
/* Become a session leader to lose controlling TTY. */
if ((pid = fork()) < 0)
{
err_quit("%s: can’t fork", cmd);
}
else if (pid != 0) /* parent */
{
exit(0); //the parent will exit
}
setsid();
/* Ensure future opens won’t allocate controlling TTYs. */
sa.sa_handler = SIG_IGN;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
if (sigaction(SIGHUP, &sa, NULL) < 0)
{
err_quit("%s: can’t ignore SIGHUP", cmd);
}
if ((pid = fork()) < 0)
{
err_quit("%s: can’t fork", cmd);
}
else if (pid != 0) /* parent */
{
exit(0);
}
/*
* Change the current working directory to the root so
* we won’t prevent file systems from being unmounted.
*/
if (chdir("/") < 0)
{
err_quit("%s: can’t change directory to /", cmd);
}
/* Close all open file descriptors. */
if (rl.rlim_max == RLIM_INFINITY)
{
rl.rlim_max = 1024;
}
printf("closing file descriptors\n");
for (i = 0; i < rl.rlim_max; i++)
{
close(i);
}
/* Attach file descriptors 0, 1, and 2 to /dev/null.*/
//printf not working
/*printf("closed all file descriptors for daemonizing\n");*/
fd0 = open("/dev/null", O_RDWR);
fd1 = dup(0);
fd2 = dup(0);
/* Initialize the log file. Daemons do not have a controlling terminal so
they can't write to stderror. We don't want them to write to the console device
because on many workstations the control device runs a windowing system. They can't
write on separate files either. A central daemon error-logging facility is required.
This is the BSD. 3 ways to generate log messages:
1) kernel routines call the log function. These messages can be read from /dev/klog
2) Most user processes (daemons) call syslog to generate log messages. This causes
messages to be sent to the UNIX domain datagram socket /dev/log
3) A user process on this host or on other host connected to this with TCP/ID
can send log messages to UDP port 514. Explicit network programmin is required
(it is not managed by syslog.
The syslogd daemon reads al three of log messages.
openlog is optional since if not called, syslog calls it. Also closelog is optional
openlog(const char *ident, int option, int facility)
It lets us specify ident that is added to each logmessage. option is a bitmask:
LOG_CONS tells that if the log message can't be sent to syslogd via UNIX
domain datagram, the message is written to the console instead.
facility lets the configuration file specify that messages from different
facilities are to be handled differently. It can be specified also in the 'priority'
argument of syslog. LOG_DAEMON is for system deamons
*/
openlog(cmd, LOG_CONS, LOG_DAEMON);
if (fd0 != 0 || fd1 != 1 || fd2 != 2)
{
/*This generates a log mesage.
syslog(int priority, const char *fformat,...)
priority is a combination of facility and level. Levels are ordered from highest to lowest:
LOG_EMERG: emergency system unusable
LOG_ALERT: condiotin that must be fied immediately
LOG_CRIT: critical condition
LOG_ERR: error condition
LOG_WARNING
LOG_NOTICE
LOG_INFO
LOG_DEBUG
format and other arguements are passed to vsprintf function forf formatting.*/
syslog(LOG_ERR, "unexpected file descriptors %d %d %d", fd0, fd1, fd2);
exit(1);
}
return 0;
}
/*The function set the FD_CLOEXEC flag of the file descriptor already open that
is passed to as parameter. FD_CLOEXEC causes the file descriptor to be
automatically and atomically closed when any of the exec family function is
called*/
int set_cloexec(int fd)
{
int val;
/* retrieve the flags of the file descriptor */
if((val = fcntl(fd, F_GETFD, 0))<0)
{
return -1;
}
/* set the FD_CLOEXEC file descriptor flag */
/*it causes the file descriptor to be automatically and atomically closed
when any of the exec family function is called*/
val |= FD_CLOEXEC;
return (fcntl(fd, F_SETFD, val));
}
data=pandas.read_csv("C:\\Users\\Nupur\\Desktop\\Ankit\\Store.csv", encoding='utf-8')
If the above does not work, it means your file uses a different encoding format; for Windows I would suggest trying a few encodings such as encoding='iso-8859-1', encoding='cp1252', or encoding='latin1'.
Or try adding r in front of the file name so it is treated as a "raw string", so the backslashes are not treated specially:
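A small sketch of both suggestions combined (the raw-string path plus trying a few candidate encodings); the particular list of encodings here is only an example:

import pandas as pd

# The r prefix makes this a raw string, so the backslashes are not
# interpreted as escape sequences.
path = r"C:\Users\Nupur\Desktop\Ankit\Store.csv"

# Try a few common Windows encodings until one decodes without error.
# iso-8859-1 accepts any byte sequence, so it works as a last resort,
# although some characters may be mis-rendered if it is not the true encoding.
data = None
for enc in ("utf-8", "cp1252", "iso-8859-1"):
    try:
        data = pd.read_csv(path, encoding=enc)
        print("parsed with", enc)
        break
    except UnicodeDecodeError:
        continue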