The data in my text file has the following format:
{"date":"Jan 6"; "time":"07:00:01"; "ip":"178.41.163.99"; "user":"null"; "country":"Slovakia"; "city":"Miloslavov"; "lat":48.1059; "lon":17.3}
{"date":"Jan 6"; "time":"07:05:26"; "ip":"37.123.163.124"; "user":"postgres"; "country":"Sweden"; "city":"Gothenburg"; "lat":57.7072; "lon":11.9668}
{"date":"Jan 6"; "time":"07:05:26"; "ip":"37.123.163.124"; "user":"null"; "country":"Sweden"; "city":"Gothenburg"; "lat":57.7072; "lon":11.9668}
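I need to read this into a pandas DataFrame, with the keys as the column names and the corresponding values as the cell contents. This is my code for reading the data:
import pandas as pd

columns = ['date', 'time', 'ip', 'user', 'country', 'city', 'lat', 'lon']
df = pd.read_csv("log.txt", sep=';', header=None, names=columns)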
It's a bit frustrating, because all I managed to get is this:
date time ... lat lon
0 {"date":"Jan 6" "time":"07:00:01" ... "lat":48.1059 "lon":17.3}
1 {"date":"Jan 6" "time":"07:05:26" ... "lat":57.7072 "lon":11.9668}
2 {"date":"Jan 6" "time":"07:05:26" ... "lat":57.7072 "lon":11.9668}
I have read the docs from top to bottom, but I still can't get the result I want, which looks like this:
date time ... lat lon
0 Jan 6 07:00:01 ... 48.1059 17.3
1 Jan 6 07:05:26 ... 57.7072 11.9668
2 Jan 6 07:05:26 ... 57.7072 11.9668
Is this possible? Any advice would be much appreciated. Thanks.
Answer 0 (score: 2)
If, as it appears, none of the string values contain a ;, you can use a plain string replace to turn the text into valid line-delimited JSON:
In [11]: text
Out[11]: '{"date":"Jan 6"; "time":"07:00:01"; "ip":"178.41.163.99"; "user":"null"; "country":"Slovakia"; "city":"Miloslavov"; "lat":48.1059; "lon":17.3}\n{"date":"Jan 6"; "time":"07:05:26"; "ip":"37.123.163.124"; "user":"postgres"; "country":"Sweden"; "city":"Gothenburg"; "lat":57.7072; "lon":11.9668}\n{"date":"Jan 6"; "time":"07:05:26"; "ip":"37.123.163.124"; "user":"null"; "country":"Sweden"; "city":"Gothenburg"; "lat":57.7072; "lon":11.9668}'
In [12]: pd.read_json(text.replace(";", ","), lines=True)
Out[12]:
city country date ip lat lon time user
0 Miloslavov Slovakia Jan 6 178.41.163.99 48.1059 17.3000 07:00:01 null
1 Gothenburg Sweden Jan 6 37.123.163.124 57.7072 11.9668 07:05:26 postgres
2 Gothenburg Sweden Jan 6 37.123.163.124 57.7072 11.9668 07:05:26 null
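The same idea can be applied directly to the file from the question. This is a minimal sketch, assuming the file is UTF-8 encoded and, as the answer does, that no string value contains a `;`; the function name `read_semicolon_log` is hypothetical. It wraps the replaced text in `io.StringIO` because pandas 2.1 and later deprecate passing a literal JSON string to `read_json`.

```python
import io

import pandas as pd


def read_semicolon_log(path):
    """Load lines shaped like {"k":"v"; "k2":v2} into a DataFrame.

    Hypothetical helper. Assumes no string value contains a ';',
    because every ';' is blindly swapped for ','.
    """
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    # Swapping the field separator makes each line a valid JSON object,
    # so the whole text becomes line-delimited JSON (one object per line).
    json_lines = text.replace(";", ",")
    # Wrap the string in a file-like object; newer pandas versions
    # deprecate handing read_json a literal JSON string.
    return pd.read_json(io.StringIO(json_lines), lines=True)
```

Calling `df = read_semicolon_log("log.txt")` should give the three rows shown above, with `lat` and `lon` parsed as floats; the column order may differ from the transcript depending on the pandas version. If a value could ever contain a `;`, a blanket replace is no longer safe and the lines would need a real parser.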