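I have a DataFrame with a column that I want to extract some information from. I need to create a new DataFrame (I'm calling it "df_topic") that collects the "topics" and "total" values from the "board_data" column of the "df" DataFrame.
I tried some code, but I ran into an error that I don't know how to fix.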
Here is a sample of the DataFrame:
import pandas as pd

df = [{"username": "last",
"board_data": "{\"boards\":[{\"postCount\":\"75\",\"topicCount\":\"5\",\"name\":\"Hardware\",\"url\",\"totalCount\":80},{\"postCount\":\"20\",\"topicCount\":\"11\",\"name\":\"Marketplace\",\"url\",\"totalCount\":31},{\"postCount\":\"26\",\"topicCount\":\"1\",\"name\":\"Atari 5200\",\"url\",\"totalCount\":27},{\"postCount\":\"9\",\"topicCount\":0,\"name\":\"Atari 8\",\"url\"\"totalCount\":9}"
},
{"username": "truk",
"board_data": "{\"boards\":[{\"postCount\":\"351\",\"topicCount\":\"11\",\"name\":\"Atari 2600\",\"url\",\"totalCount\":362},{\"postCount\":\"333\",\"topicCount\":\"22\",\"name\":\"Hardware\",\"url\",\"totalCount\":355},{\"postCount\":\"194\",\"topicCount\":\"8\",\"name\":\"Marketplace\",\"url\",\"totalCount\":202}"
}]
df = pd.DataFrame(df)
df
Here is the expected result:
username topic total
0 last Hardware 80
1 last Marketplace 31
2 last Atari 5200 27
3 last Atari 8 9
4 truk Atari 2600 362
5 truk Hardware 355
6 truk Marketplace 202
The code I'm using fails with this error:
TypeError: string indices must be integers
Answer (score: 0)
Your error comes from trying to manipulate a string object as if it were a dict. This would be clearer if you used pandas' .loc or .iloc syntax for indexing/slicing [docs].
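To see where the TypeError comes from, here is a minimal illustration, assuming board_data still holds the raw JSON text. The string below is a shortened, hypothetical stand-in for one board_data value: indexing a str with a string key fails, while the dict returned by json.loads can be indexed by key.

```python
import json

# Shortened, hypothetical stand-in for one "board_data" value: it is still plain text
board_data = '{"boards": [{"name": "Hardware", "totalCount": 80}]}'

try:
    board_data["boards"]  # a str only accepts integer positions or slices as indices
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # exact wording differs slightly between Python versions

parsed = json.loads(board_data)  # now a dict, so key lookup works
print(parsed["boards"][0]["name"])  # Hardware
```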
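I suggest stepping back and fixing the root of the problem. You should repair the malformed JSON that I'm guessing you want to parse into DataFrames. Here is what the troublesome part of your sample looks like once it is cleaned up into valid JSON: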
'{"boards":[{"postCount":"75","topicCount":"5","name":"Hardware","totalCount":80},{"postCount":"20","topicCount":"11","name":"Marketplace","totalCount":31},{"postCount":"26","topicCount":"1","name":"Atari 5200","totalCount":27},{"postCount":"9","topicCount":0,"name":"Atari 8","totalCount":9}]}'
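One quick way to check whether a board_data string is usable is to hand it to json.loads. The sketch below trims the question's first value to a single board, keeping its defects (the "url" key with no value and the missing closing ]}), next to a cleaned copy; the try_parse helper is a hypothetical name. The broken text raises json.JSONDecodeError, while the cleaned text parses into a dict.

```python
import json

# Trimmed from the question: "url" has no value and the closing ]} is missing
broken = '{"boards":[{"postCount":"75","topicCount":"5","name":"Hardware","url","totalCount":80}'
# The same record cleaned up into valid JSON
cleaned = '{"boards":[{"postCount":"75","topicCount":"5","name":"Hardware","totalCount":80}]}'


def try_parse(text):
    """Return the parsed object, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        print(f"invalid JSON: {exc}")
        return None


broken_result = try_parse(broken)    # prints the decoder's complaint and returns None
cleaned_result = try_parse(cleaned)  # returns a dict
print(cleaned_result["boards"][0]["totalCount"])  # 80
```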
You can then use json.loads to convert these strings into proper Python objects:
from_json = [{"username": "last",
"board_data": {'boards': [{'postCount': '75',
'topicCount': '5',
'name': 'Hardware',
'totalCount': 80},
{'postCount': '20',
'topicCount': '11',
'name': 'Marketplace',
'totalCount': 31},
{'postCount': '26',
'topicCount': '1',
'name': 'Atari 5200',
'totalCount': 27},
{'postCount': '9', 'topicCount': 0, 'name': 'Atari 8', 'totalCount': 9}]}},
{"username": "truk",
"board_data": {'boards': [{'postCount': '351',
'topicCount': '11',
'name': 'Atari 2600',
'totalCount': 362},
{'postCount': '333',
'topicCount': '22',
'name': 'Hardware',
'totalCount': 355},
{'postCount': '194',
'topicCount': '8',
'name': 'Marketplace',
'totalCount': 202}]}}]
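Rather than typing the parsed structure by hand, it can be built by running json.loads over each board_data value. A minimal sketch, assuming the strings have already been repaired into valid JSON; raw_records and parse_board_data are hypothetical names, and the data is trimmed to two boards per user.

```python
import json

# Hypothetical input: records whose board_data strings are already valid JSON
raw_records = [
    {"username": "last",
     "board_data": '{"boards":[{"postCount":"75","topicCount":"5","name":"Hardware","totalCount":80},'
                   '{"postCount":"20","topicCount":"11","name":"Marketplace","totalCount":31}]}'},
    {"username": "truk",
     "board_data": '{"boards":[{"postCount":"351","topicCount":"11","name":"Atari 2600","totalCount":362},'
                   '{"postCount":"333","topicCount":"22","name":"Hardware","totalCount":355}]}'},
]


def parse_board_data(records):
    """Return new records with board_data decoded from a JSON string into a dict."""
    return [{**record, "board_data": json.loads(record["board_data"])} for record in records]


from_json = parse_board_data(raw_records)
print(from_json[1]["board_data"]["boards"][0]["name"])  # Atari 2600
```

If the strings live in a DataFrame column instead, df["board_data"].map(json.loads) applies the same decoding row by row.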
Once the data is parsed as above, you can avoid string manipulation in pandas entirely, like this:
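import pandas as pd

dfs = []
for record in from_json:
    # One row per board for this user
    _df = pd.DataFrame.from_records(record['board_data']['boards'])
    # Tag every row with the user it belongs to
    user_df = _df.assign(username=record['username'])
    # Only the board name and its total are needed
    user_df = user_df.drop(columns=['postCount', 'topicCount'])
    dfs.append(user_df)
# Stack the per-user frames; a stable sort keeps each user's boards in their original order
single_df = pd.concat(dfs, axis=0).sort_values('username', kind='mergesort').reset_index(drop=True)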
You should end up with this DataFrame, after which you can easily tidy up the column names and column order to your liking:
print(single_df)
name totalCount username
0 Hardware 80 last
1 Marketplace 31 last
2 Atari 5200 27 last
3 Atari 8 9 last
4 Atari 2600 362 truk
5 Hardware 355 truk
6 Marketplace 202 truk
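One way to do that cleanup, and to replace the loop as well, is pandas' json_normalize: record_path walks down to the nested list of boards, and meta copies the top-level username onto every row. This is a swapped-in alternative, not what the answer above uses; the sketch assumes from_json has the shape shown earlier (trimmed here) and targets the question's username/topic/total columns.

```python
import pandas as pd

# Trimmed copy of the parsed data (same shape as from_json above, fewer boards)
from_json = [
    {"username": "last", "board_data": {"boards": [
        {"postCount": "75", "topicCount": "5", "name": "Hardware", "totalCount": 80},
        {"postCount": "20", "topicCount": "11", "name": "Marketplace", "totalCount": 31}]}},
    {"username": "truk", "board_data": {"boards": [
        {"postCount": "351", "topicCount": "11", "name": "Atari 2600", "totalCount": 362}]}},
]

# One row per board: record_path follows board_data -> boards, meta repeats username per row
flat = pd.json_normalize(from_json, record_path=["board_data", "boards"], meta=["username"])

# Rename to the question's column names, then keep and order only the needed columns
renamed = flat.rename(columns={"name": "topic", "totalCount": "total"})
df_topic = renamed[["username", "topic", "total"]]
print(df_topic)
```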