Reading multiple values into a pandas DataFrame

Time: 2019-02-07 23:54:36

Tags: python-3.x pandas dataframe

The data in my text file has the following format:

{"date":"Jan  6"; "time":"07:00:01"; "ip":"178.41.163.99"; "user":"null"; "country":"Slovakia"; "city":"Miloslavov"; "lat":48.1059; "lon":17.3}
{"date":"Jan  6"; "time":"07:05:26"; "ip":"37.123.163.124"; "user":"postgres"; "country":"Sweden"; "city":"Gothenburg"; "lat":57.7072; "lon":11.9668}
{"date":"Jan  6"; "time":"07:05:26"; "ip":"37.123.163.124"; "user":"null"; "country":"Sweden"; "city":"Gothenburg"; "lat":57.7072; "lon":11.9668}

I need to read it into a pandas DataFrame, with the keys as column names and the item values as cell values. This is the code I use to read the data:

import pandas as pd

columns = ['date', 'time', 'ip', 'user', 'country', 'city', 'lat', 'lon']
df = pd.read_csv("log.txt", sep=';', header=None, names=columns)

It is a bit frustrating, because all I have managed to get is this:

               date                time  ...             lat              lon
0  {"date":"Jan  6"   "time":"07:00:01"  ...   "lat":48.1059      "lon":17.3}
1  {"date":"Jan  6"   "time":"07:05:26"  ...   "lat":57.7072   "lon":11.9668}
2  {"date":"Jan  6"   "time":"07:05:26"  ...   "lat":57.7072   "lon":11.9668}

I have read the docs from top to bottom, but I still cannot get the desired result, which looks like this:

     date       time  ...       lat       lon
0  Jan  6   07:00:01  ...   48.1059      17.3
1  Jan  6   07:05:26  ...   57.7072   11.9668
2  Jan  6   07:05:26  ...   57.7072   11.9668

Is this possible? Any advice would be greatly appreciated. Thanks.

1 Answer:

Answer 0: (score: 2)

If none of the string values contain a ;, you can use string replacement to turn the text into valid (line-delimited) JSON:

In [11]: text
Out[11]: '{"date":"Jan  6"; "time":"07:00:01"; "ip":"178.41.163.99"; "user":"null"; "country":"Slovakia"; "city":"Miloslavov"; "lat":48.1059; "lon":17.3}\n{"date":"Jan  6"; "time":"07:05:26"; "ip":"37.123.163.124"; "user":"postgres"; "country":"Sweden"; "city":"Gothenburg"; "lat":57.7072; "lon":11.9668}\n{"date":"Jan  6"; "time":"07:05:26"; "ip":"37.123.163.124"; "user":"null"; "country":"Sweden"; "city":"Gothenburg"; "lat":57.7072; "lon":11.9668}'

In [12]: pd.read_json(text.replace(";", ","), lines=True)
Out[12]:
         city   country    date              ip      lat      lon      time      user
0  Miloslavov  Slovakia  Jan  6   178.41.163.99  48.1059  17.3000  07:00:01      null
1  Gothenburg    Sweden  Jan  6  37.123.163.124  57.7072  11.9668  07:05:26  postgres
2  Gothenburg    Sweden  Jan  6  37.123.163.124  57.7072  11.9668  07:05:26      null
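
The same idea can be written as a standalone script. This is only a sketch under a few assumptions: the helper name read_semicolon_json is made up for illustration, the sample text stands in for the contents of log.txt, and the input is wrapped in io.StringIO because pandas 2.1 and later deprecate passing a literal JSON string to read_json. convert_dates=False is passed so that pandas does not try to parse the "date" column, and the result is reindexed with the column list from the question to get the column order shown there.

```python
import io

import pandas as pd

# Sample lines copied from the question. In practice this text would come
# from the log file, e.g. open("log.txt").read() (file name from the question).
text = (
    '{"date":"Jan  6"; "time":"07:00:01"; "ip":"178.41.163.99"; "user":"null"; '
    '"country":"Slovakia"; "city":"Miloslavov"; "lat":48.1059; "lon":17.3}\n'
    '{"date":"Jan  6"; "time":"07:05:26"; "ip":"37.123.163.124"; "user":"postgres"; '
    '"country":"Sweden"; "city":"Gothenburg"; "lat":57.7072; "lon":11.9668}\n'
    '{"date":"Jan  6"; "time":"07:05:26"; "ip":"37.123.163.124"; "user":"null"; '
    '"country":"Sweden"; "city":"Gothenburg"; "lat":57.7072; "lon":11.9668}'
)


def read_semicolon_json(raw):
    """Hypothetical helper: parse ';'-separated JSON-like lines into a DataFrame."""
    # Only safe under the answer's assumption that no string value contains ';'.
    fixed = raw.replace(";", ",")
    # lines=True treats every line as one JSON object (one row).
    # convert_dates=False keeps "date" as the original string.
    return pd.read_json(io.StringIO(fixed), lines=True, convert_dates=False)


# Column order requested in the question.
columns = ['date', 'time', 'ip', 'user', 'country', 'city', 'lat', 'lon']
df = read_semicolon_json(text)[columns]
print(df)
```

Note that "user":"null" is a quoted string in the data, so it stays the string "null" rather than becoming a missing value. If some values could contain a ;, the blanket replace would corrupt them and a more careful parser would be needed.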