How do I handle JSON data inside a CSV file in Python?

Time: 2018-05-27 10:04:40

Tags: python json python-3.x csv

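I'm currently working on a data analysis project on the tmdb dataset on Kaggle (BTW, I'm a complete noob, so please forgive my ignorance). In the dataset I came across the following column (column 2):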

genres

[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]
[{"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 28, "name": "Action"}]
[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 80, "name": "Crime"}]
[{"id": 28, "name": "Action"}, {"id": 80, "name": "Crime"}, {"id": 18, "name": "Drama"}, {"id": 53, "name": "Thriller"}]
[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 878, "name": "Science Fiction"}]
[{"id": 14, "name": "Fantasy"}, {"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]
[{"id": 16, "name": "Animation"}, {"id": 10751, "name": "Family"}]

Here genres is the column name, and the data in each row of this column is enclosed in [].

What I want to do is convert it into a column of the following form (the delimiter can be anything except ',', since this is a CSV file):

genres
Action;Adventure;Fantasy;Fiction;
Adventure;Fantasy;Action;
Action;Adventure;Crime;
And so on...

Here is my code:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import csv

reviews = pd.read_csv("C:/Users/HP/Desktop/data science project/tmpd second attempt/2. prepared data/tmdb_5000_movies.csv")

print(reviews["genres"])

PS: This is my first project, so I really don't know how to approach this.

1 answer:

Answer 0: (score: 0)

This is not a CSV file; it is a JSON file. So if you change

reviews = pd.read_csv("C:/Users/HP/Desktop/data science project/tmpd second attempt/2. prepared data/tmdb_5000_movies.csv")

to

reviews = pd.read_json("C:/Users/HP/Desktop/data science project/tmpd second attempt/2. prepared data/tmdb_5000_movies.csv")

it should work.

Oh, and you may want to change the file extension from .csv to .json.
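
Whether read_json applies depends on what the file really contains. As a quick check, assuming the file is an ordinary CSV whose genres cells hold quoted JSON text (the inline two-row sample and the `title` column below are made up for illustration, not taken from the real dataset), read_csv would load each cell as a plain string that json.loads can decode:

```python
import io
import json

import pandas as pd

# Hypothetical sample: the JSON cell is quoted and its inner quotes are
# doubled, which is how the CSV format escapes quotes inside a field.
sample = io.StringIO(
    'title,genres\n'
    'A,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""name"": ""Adventure""}]"\n'
    'B,"[{""id"": 16, ""name"": ""Animation""}]"\n'
)

reviews = pd.read_csv(sample)

first_cell = reviews.loc[0, "genres"]  # the raw cell, still a str
parsed = json.loads(first_cell)        # decoded into a list of dicts
names = [genre["name"] for genre in parsed]

print(type(first_cell).__name__, names)
```

If the printed type is str and the names come out as a list, the file is a CSV with embedded JSON, and renaming it to .json would not be needed.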

After a comment from the OP:

This code works for your sample. It's a bit of a hack, but it's the quickest thing I could come up with:

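import ast

import pandas as pd

# Read the raw lines; each one is expected to look like [{...}, {...}]
with open("untitled.csv") as fi:
    data = fi.readlines()

cleaned = []
for da in data:
    # Drop the surrounding brackets and the newline, then split into one chunk per dict
    chunks = da.strip("[").strip('\n').strip(']').split(', {')
    for a in chunks:
        # The split eats the leading "{" of every chunk except the first; put it back
        pre = ''
        if not a.startswith("{"):
            pre = '{'
        cleaned.append(ast.literal_eval(pre + a))

# One row per genre dict, with columns "id" and "name"
df = pd.DataFrame(cleaned)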
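
To get the semicolon-separated column the question asks for, one possible route (a sketch, not the method used in this answer) is to decode each cell with json.loads and join the names. The helper name `genres_to_text`, the new column name `genres_text`, and the three sample rows are assumptions for illustration; the joined text has no trailing separator, unlike the example in the question.

```python
import json

import pandas as pd


def genres_to_text(cell, sep=";"):
    """Turn a JSON list such as '[{"id": 28, "name": "Action"}]' into 'Action'."""
    # Missing values (NaN) or empty cells become an empty string.
    if not isinstance(cell, str) or not cell.strip():
        return ""
    return sep.join(item["name"] for item in json.loads(cell))


# Hypothetical stand-in for the genres column as read_csv would return it.
reviews = pd.DataFrame(
    {
        "genres": [
            '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
            '[{"id": 16, "name": "Animation"}, {"id": 10751, "name": "Family"}]',
            "[]",
        ]
    }
)

# Apply the helper to every cell and keep the result in a new column.
reviews["genres_text"] = reviews["genres"].apply(genres_to_text)
print(reviews["genres_text"].tolist())
```

Because the names are joined with ';' rather than ',', the new column can be written back with reviews.to_csv(...) without clashing with the CSV delimiter.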