Question

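I couldn't find a similar answer that solves this problem, so here goes: I have scraped a website and stored the data in a CSV file in the following format: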

data = [{"has_media": "false", "tags": ["e", "x", "y", "s", "f", "f"], "img_urls": [], "is_replied": "false", "is_reply_to": "false", "likes": 0, "links": [], "parent_id": "", "replies": 0, "reply_to_users": [], "comments": 1, "screen_name": "name", "text": "This is plain text retrieved from the webpage and a link http://ht.ly/25DQR and a hashtag #ecar", "text_html": "<s>#</s>electric</a>", "timestamp": "2010-08-01T09:37:20", "timestamp_epochs": 3749793792, "_id": "28829932", "_url": "/user/status/17479367564", "user_id": "1038384584", "username": "username", "video_url": ""}, {"has_media": "false", "hashtags": ["e", "o", "y", "p", "r", "s"], "img_urls": [], "is_replied": "false", "is_reply_to": "false", "likes": 0, "links": [], "parent_id": "", "replies": 0, "reply_to_users": [], "comments": 1, "screen_name": "user", "text": "New cooperative project: This is plain text retrieved from the webpage and a link http://ht.ly/25DQR and a hashtag #hello", "text_html": "</a>", "timestamp": "2011-05-01T09:50:11", "timestamp_epochs": 18734839, "_id": "2982892", "_url": "/user/status/83982893", "user_id": "29983882", "username": "user", "video_url": ""}]

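I want to create a DataFrame from it in which each dictionary becomes a new row and each key becomes a column.

I have already tried the following, but it gives me only a single column and doesn't seem to recognize the dictionaries.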

import pandas as pd
file = data.split(',')
pd.DataFrame.from_dict(file)

Any help is much appreciated. I am new to coding and could use some guidance to get this working.

Thank you!

Answer 1

It's simple. You can use:

import pandas as pd

data = [{"has_media": "false", "tags": ["e", "x", "y", "s", "f", "f"], "img_urls": [], "is_replied": "false", "is_reply_to": "false", "likes": 0, "links": [], "parent_id": "", "replies": 0, "reply_to_users": [], "comments": 1, "screen_name": "name", "text": "This is plain text retrieved from the webpage and a link http://ht.ly/25DQR and a hashtag #ecar", "text_html": "<p class=\"><s>#</s><b>electric</b></a></p>", "timestamp": "2010-08-01T09:37:20", "timestamp_epochs": 3749793792, "_id": "28829932", "_url": "/user/status/17479367564", "user_id": "1038384584", "username": "username", "video_url": ""}, {"has_media": "false", "hashtags": ["e", "o", "y", "p", "r", "s"], "img_urls": [], "is_replied": "false", "is_reply_to": "false", "likes": 0, "links": [], "parent_id": "", "replies": 0, "reply_to_users": [], "comments": 1, "screen_name": "user", "text": "New cooperative project: This is plain text retrieved from the webpage and a link http://ht.ly/25DQR and a hashtag #hello", "text_html": "<p class=\</b></a></p>", "timestamp": "2011-05-01T09:50:11", "timestamp_epochs": 18734839, "_id": "2982892", "_url": "/user/status/83982893", "user_id": "29983882", "username": "user", "video_url": ""}]

df = pd.DataFrame(data)
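To make the result easier to see, here is a minimal sketch with shortened, hypothetical records modeled on the data above (the real records have many more keys). It assumes the data is already a Python list of dictionaries. Each dictionary becomes one row and each key becomes one column; a key that is missing from a record, such as "tags" versus "hashtags" here, is left as a missing value in that row.

```python
import pandas as pd

# Shortened, hypothetical records modeled on the question's data.
# Note that the first record uses "tags" and the second uses "hashtags".
records = [
    {"screen_name": "name", "likes": 0, "tags": ["e", "x"]},
    {"screen_name": "user", "likes": 0, "hashtags": ["e", "o"]},
]

# One row per dictionary, one column per distinct key.
df_small = pd.DataFrame(records)

print(df_small.shape)             # number of rows and columns
print(df_small.columns.tolist())  # the union of all keys
print(df_small["tags"].isna())    # True where a record had no "tags" key
```

The nested lists are stored as ordinary Python list objects inside the cells, so no special handling is needed just to build the frame.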

By the way, you can also load it from a file with:

df = pd.read_json(filename, orient='records')
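If the data arrives as a single string rather than a list (the data.split(',') call in the question suggests this may be the case), splitting on commas only produces text fragments, which is why everything ended up in one column. A rough sketch of parsing the string first, assuming it is valid JSON; the variable raw and its shortened contents are hypothetical:

```python
import io
import json

import pandas as pd

# Hypothetical, shortened JSON text standing in for the scraped data.
raw = '[{"_id": "28829932", "likes": 0, "links": []}, {"_id": "2982892", "likes": 0, "links": []}]'

# Option 1: parse the text into a list of dictionaries, then build the frame.
parsed = json.loads(raw)
df_from_json = pd.DataFrame(parsed)

# Option 2: let pandas parse the JSON text, passed as a file-like object.
df_from_read = pd.read_json(io.StringIO(raw), orient="records")

print(df_from_json)
print(df_from_read.shape)
```

Note that read_json performs its own dtype inference, so it is worth checking columns such as "_id" afterwards if they need to stay as strings.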
