I have a very large dataset in CSV format in which one column holds JSON strings. I want to read it into a flat pandas DataFrame. How can I do this efficiently?
Input CSV:
col1,col2,col3,col4
1,Programming,"{""col3_1"":null,""col3_2"":""Java""}",11
2,Sport,"{""col3_1"":null,""col3_2"":""Soccer""}",22
3,Food,"{""col3_1"":null,""col3_2"":""Pizza""}",33
Expected DataFrame:
+------+-------------+--------+--------+------+
| col1 | col2        | col3_1 | col3_2 | col4 |
+------+-------------+--------+--------+------+
| 1    | Programming | None   | Java   | 11   |
| 2    | Sport       | None   | Soccer | 22   |
| 3    | Food        | None   | Pizza  | 33   |
+------+-------------+--------+--------+------+
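I can currently get the expected output with the code below. I am just wondering whether there is a more efficient way to achieve the same result.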
import json
import pandas
dataset = pandas.read_csv('/dataset.csv')
dataset['col3'] = dataset['col3'].apply(json.loads)
dataset['col3_1'] = dataset['col3'].apply(lambda row: row['col3_1'])
dataset['col3_2'] = dataset['col3'].apply(lambda row: row['col3_2'])
dataset = dataset.drop(columns=['col3'])
Answer 0 (score: 4)
You can use {{1}} to parse the JSON in the pandas column and {{1}} to convert it into pandas columns:
{{1}}
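The code for this answer did not survive in this copy of the thread. As a rough sketch of one plausible reading, the snippet below parses each cell with the standard library's `json.loads` and expands the resulting dict with `pd.Series`. That pairing is an assumption, borrowed from the slower expression timed in the other answer. `CSV_TEXT`, `expanded` and `flat` are names made up for this illustration.

```python
import io
import json

import pandas as pd

# Inline copy of the question's sample CSV (real data would come from a file).
CSV_TEXT = '''col1,col2,col3,col4
1,Programming,"{""col3_1"":null,""col3_2"":""Java""}",11
2,Sport,"{""col3_1"":null,""col3_2"":""Soccer""}",22
3,Food,"{""col3_1"":null,""col3_2"":""Pizza""}",33
'''

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Assumed approach: parse each JSON string into a dict, then let pd.Series
# turn that dict into one row of the expanded frame.
expanded = df['col3'].apply(lambda x: pd.Series(json.loads(x)))

# Drop the raw JSON column and attach the expanded columns by index.
flat = df.drop(columns=['col3']).join(expanded)
print(flat)
```

Building a `pd.Series` per row is convenient but creates one object per record, which is probably why this style shows up as the slow case in the timings further down.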
Answer 1 (score: 3)
Use the DataFrame constructor, with pop to extract the column:
df1 = pd.DataFrame(df.pop('col3').apply(pd.io.json.loads).values.tolist(), index=df.index)
df = df.join(df1)
print (df)
   col1         col2  col4 col3_1  col3_2
0     1  Programming    11   None    Java
1     2        Sport    22   None  Soccer
2     3         Food    33   None   Pizza
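For anyone who wants to try the constructor approach without a file on disk, here is a minimal self-contained sketch. It reads the question's sample rows from an inline string and swaps in the standard library's `json.loads` for `pd.io.json.loads`, on the assumption that the stdlib parser is acceptable for your data. `CSV_TEXT` and `parsed` are illustrative names, not part of the answer.

```python
import io
import json

import pandas as pd

# Inline copy of the question's sample CSV (real data would come from a file).
CSV_TEXT = '''col1,col2,col3,col4
1,Programming,"{""col3_1"":null,""col3_2"":""Java""}",11
2,Sport,"{""col3_1"":null,""col3_2"":""Soccer""}",22
3,Food,"{""col3_1"":null,""col3_2"":""Pizza""}",33
'''

df = pd.read_csv(io.StringIO(CSV_TEXT))

# pop removes col3 from df and returns it; each JSON string becomes a dict.
# json.loads (stdlib) is a substitution for pd.io.json.loads in this sketch.
parsed = df.pop('col3').apply(json.loads)

# Build all new columns in one constructor call from the list of dicts,
# reusing df's index so the join lines up row by row.
df1 = pd.DataFrame(parsed.tolist(), index=df.index)
df = df.join(df1)
print(df)
```

Passing `index=df.index` matters when the frame has a non-default index (for example after filtering); without it the join would align on a fresh 0..n-1 index instead.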
Detail (each snippet is run against the original df, since pop removes col3):
print (df.pop('col3').apply(pd.io.json.loads))
0      {'col3_1': None, 'col3_2': 'Java'}
1    {'col3_1': None, 'col3_2': 'Soccer'}
2     {'col3_1': None, 'col3_2': 'Pizza'}
Name: col3, dtype: object
print (pd.DataFrame(df.pop('col3').apply(pd.io.json.loads).values.tolist(), index=df.index))
  col3_1  col3_2
0   None    Java
1   None  Soccer
2   None   Pizza
The solutions are similar, but their performance differs:
df = pd.concat([df] * 10000, ignore_index=True)
In [204]: %timeit df.join(pd.DataFrame(df['col3'].apply(pd.io.json.loads).values.tolist(), index=df.index))
10 loops, best of 3: 76.4 ms per loop
In [205]: %timeit df.join(df['col3'].apply(lambda x: pd.Series(json.loads(x))))
1 loop, best of 3: 11.3 s per loop
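The `%timeit` lines above need IPython. A rough way to repeat the comparison in a plain script is sketched below using the standard `timeit` module. The helper names `via_constructor` and `via_series`, the repeat factor of 1000, and the use of the stdlib `json.loads` for both variants are all choices made for this sketch, so the absolute numbers will not match the ones quoted.

```python
import json
import timeit

import pandas as pd

# Small frame shaped like the question's data, repeated to get measurable
# timings. The repeat factor (1000) is an arbitrary choice for this sketch.
base = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': ['Programming', 'Sport', 'Food'],
    'col3': ['{"col3_1":null,"col3_2":"Java"}',
             '{"col3_1":null,"col3_2":"Soccer"}',
             '{"col3_1":null,"col3_2":"Pizza"}'],
    'col4': [11, 22, 33],
})
df = pd.concat([base] * 1000, ignore_index=True)


def via_constructor(frame):
    """Parse every cell, then build the new columns in one DataFrame call."""
    parsed = frame['col3'].apply(json.loads)
    return frame.join(pd.DataFrame(parsed.tolist(), index=frame.index))


def via_series(frame):
    """Parse every cell and wrap each dict in its own pd.Series."""
    return frame.join(frame['col3'].apply(lambda x: pd.Series(json.loads(x))))


# timeit returns the total time for `number` calls, so divide to get per-call.
t_constructor = timeit.timeit(lambda: via_constructor(df), number=3) / 3
t_series = timeit.timeit(lambda: via_series(df), number=1)
print(f'constructor: {t_constructor:.4f} s per call')
print(f'pd.Series:   {t_series:.4f} s per call')
```

Both helpers leave `col3` in place, mirroring the timed expressions, so the same frame can be reused across runs.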