
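I am writing a script that parses Apache log files into a pandas DataFrame. I have noticed that the text in the log file sometimes contains encoded values like this (UTF-8 values, I assume):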

Bern,%20Bahnhof

For example, one line of the log file:

IP - - timestamp "GET /v1/connections?from=Bern,%20Bahnhof&to=Luzern HTTP/1.1" httpstatus bytes -"

My current code for opening the log file:

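import pandas as pd

cols = ['ip','l','userid','timestamp','tz','fullrequest','status','bytes','referer','useragent']

# path and file are defined elsewhere.
# sep=r'\s+' and on_bad_lines='skip' (pandas >= 1.3) replace the older delim_whitespace=True and error_bad_lines=False.
df = pd.read_csv(path + file, sep=r'\s+', names=cols, on_bad_lines='skip', encoding='utf8')
df = df.drop(columns=['l', 'userid'])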

Is there a way to parse the log file into pandas so that these odd characters are converted to Latin characters?

So that in the end we get something like this:

Bern, Bahnhof

How can I read a text file into pandas and convert the utf8 chars to string / latin?
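One possible direction, sketched under the assumption that sequences such as %20 are URL percent-encoding in the request string rather than a file-encoding problem: the standard library's urllib.parse.unquote turns them back into plain characters. The helper name decode_request is hypothetical and only used for illustration.

```python
from urllib.parse import unquote


def decode_request(text):
    """Turn percent-escapes such as %20 back into the characters they stand for."""
    # Non-string values (e.g. NaN for a missing field) are passed through unchanged.
    if not isinstance(text, str):
        return text
    # unquote interprets the escaped bytes as UTF-8 by default.
    return unquote(text)


# Request string taken from the example log line in the question.
request = "GET /v1/connections?from=Bern,%20Bahnhof&to=Luzern HTTP/1.1"
decoded = decode_request(request)
print(decoded)
```

On the DataFrame from the question, the same helper could presumably be applied after loading with df['fullrequest'] = df['fullrequest'].map(decode_request). If the logs also use + for spaces in query strings, urllib.parse.unquote_plus may be the better fit.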

0 answers: