Question

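I have a DataFrame with a column that I want to extract some information from. I need to create a new DataFrame (I'm calling it "df_topic") that collects the "topics" and "total" values from the "board_data" column of the "df" DataFrame.

I tried some code, but I ran into an error that I don't know how to fix.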

Here is a sample of the data:

import pandas as pd

df = [{"username": "last",
    "board_data": "{\"boards\":[{\"postCount\":\"75\",\"topicCount\":\"5\",\"name\":\"Hardware\",\"url\",\"totalCount\":80},{\"postCount\":\"20\",\"topicCount\":\"11\",\"name\":\"Marketplace\",\"url\",\"totalCount\":31},{\"postCount\":\"26\",\"topicCount\":\"1\",\"name\":\"Atari 5200\",\"url\",\"totalCount\":27},{\"postCount\":\"9\",\"topicCount\":0,\"name\":\"Atari 8\",\"url\"\"totalCount\":9}"
    },
    {"username": "truk",
     "board_data": "{\"boards\":[{\"postCount\":\"351\",\"topicCount\":\"11\",\"name\":\"Atari 2600\",\"url\",\"totalCount\":362},{\"postCount\":\"333\",\"topicCount\":\"22\",\"name\":\"Hardware\",\"url\",\"totalCount\":355},{\"postCount\":\"194\",\"topicCount\":\"8\",\"name\":\"Marketplace\",\"url\",\"totalCount\":202}"
    }]
df = pd.DataFrame(df)
df

Here is the expected result:

   username   topic      total
0   last     Hardware     80
1   last     Marketplace  31
2   last     Atari 5200   27
3   last     Atari 8      9
4   truk     Atari 2600   362
5   truk     Hardware     355
6   truk     Marketplace  202

This is the error my code raises:

TypeError: string indices must be integers

Answer 1

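Your error comes from trying to treat a string object as if it were a dict. This would be more obvious if you used pandas' .loc or .iloc syntax for indexing and slicing (see the pandas indexing docs).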
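To make the cause concrete, here is a minimal sketch that reproduces the error on a shortened, made-up board_data value (not the asker's full data). Indexing the raw string with a key fails, while the parsed dict can be indexed normally. The exact wording of the message varies slightly between Python versions.

```python
import json

# board_data arrives as a JSON *string*, not as a dict
board_data = '{"boards": [{"name": "Hardware", "totalCount": 80}]}'

try:
    board_data["boards"]  # indexing a str with a str key
except TypeError as exc:
    message = str(exc)
    print(message)

# Parsing first yields a dict, which can be indexed by key
parsed = json.loads(board_data)
print(parsed["boards"][0]["name"])
```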

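I suggest backing up and fixing the problem at its source: the malformed JSON that, I'm guessing, you are trying to parse into DataFrames. Here is what the offending part of your example looks like once it is cleaned up into valid JSON: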

'{"boards":[{"postCount":"75","topicCount":"5","name":"Hardware","totalCount":80},{"postCount":"20","topicCount":"11","name":"Marketplace","totalCount":31},{"postCount":"26","topicCount":"1","name":"Atari 5200","totalCount":27},{"postCount":"9","topicCount":0,"name":"Atari 8","totalCount":9}]}'
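If the data cannot be fixed where it is produced, a cleanup along these lines might work. This is only a sketch: repair_board_data is a hypothetical helper, and it assumes the only defects are the two visible in the sample, namely a "url" key with no value and a missing closing "]}". Any other kind of damage would still make json.loads fail.

```python
import json
import re


def repair_board_data(raw: str) -> dict:
    """Hypothetical helper: patch the two defects seen in the sample, then parse."""
    # Drop the value-less "url" key, with or without its trailing comma
    fixed = re.sub(r'"url",?', '', raw)
    # Close the boards list and the outer object if they were cut off
    if not fixed.rstrip().endswith(']}'):
        fixed += ']}'
    return json.loads(fixed)


# Shortened sample containing both variants of the broken "url" key
raw = ('{"boards":[{"postCount":"26","topicCount":"1","name":"Atari 5200","url","totalCount":27},'
       '{"postCount":"9","topicCount":0,"name":"Atari 8","url""totalCount":9}')

repaired = repair_board_data(raw)
print([board["name"] for board in repaired["boards"]])
```

Note that the regular expression would also strip a board that happens to be named "url", so treat this as a stopgap, not a general JSON repair tool.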

You can then use json.loads to turn these strings into valid Python objects:

from_json = [{"username": "last",
    "board_data": {'boards': [{'postCount': '75',
   'topicCount': '5',
   'name': 'Hardware',
   'totalCount': 80},
  {'postCount': '20',
   'topicCount': '11',
   'name': 'Marketplace',
   'totalCount': 31},
  {'postCount': '26',
   'topicCount': '1',
   'name': 'Atari 5200',
   'totalCount': 27},
  {'postCount': '9', 'topicCount': 0, 'name': 'Atari 8', 'totalCount': 9}]}},
{"username": "truk",
     "board_data": {'boards': [{'postCount': '351',
   'topicCount': '11',
   'name': 'Atari 2600',
   'totalCount': 362},
  {'postCount': '333',
   'topicCount': '22',
   'name': 'Hardware',
   'totalCount': 355},
  {'postCount': '194',
   'topicCount': '8',
   'name': 'Marketplace',
   'totalCount': 202}]}}]
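A structure like the one above could be built roughly as follows. The sketch assumes the board_data strings have already been cleaned into valid JSON, and it uses a shortened sample (the cleaned list is made up for illustration) so that it runs on its own.

```python
import json

# Shortened sample: one board per user, already valid JSON
cleaned = [
    {"username": "last",
     "board_data": '{"boards":[{"postCount":"9","topicCount":0,"name":"Atari 8","totalCount":9}]}'},
    {"username": "truk",
     "board_data": '{"boards":[{"postCount":"194","topicCount":"8","name":"Marketplace","totalCount":202}]}'},
]

# Parse each board_data string into nested dicts and lists
from_json = [
    {"username": row["username"], "board_data": json.loads(row["board_data"])}
    for row in cleaned
]

print(from_json[1]["board_data"]["boards"][0]["name"])
```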

With the data parsed as above, you can avoid string manipulation in pandas altogether, like so:

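import pandas as pd

dfs = []
for record in from_json:
    # One row per board for this user
    _df = pd.DataFrame.from_records(record['board_data']['boards'])
    user_df = _df.assign(username=record['username'])
    # Keep only the board name and the total count
    user_df = user_df.drop(columns=['postCount', 'topicCount'])
    dfs.append(user_df)

# mergesort is stable, so each user's boards keep their original order
single_df = pd.concat(dfs, axis=0).sort_values('username', kind='mergesort').reset_index(drop=True)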

You should end up with this DataFrame, after which you can easily tidy up the column names and column order to your liking:

print(single_df)

          name  totalCount username
0     Hardware          80     last
1  Marketplace          31     last
2   Atari 5200          27     last
3      Atari 8           9     last
4   Atari 2600         362     truk
5     Hardware         355     truk
6  Marketplace         202     truk
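To get from this frame to the layout the question asks for (username, topic, total), one possible sketch is below. The name df_topic comes from the question; single_df is rebuilt here in shortened form only so that the snippet runs on its own.

```python
import pandas as pd

# Shortened stand-in for the single_df built from the parsed records
single_df = pd.DataFrame({
    "name": ["Hardware", "Marketplace", "Atari 2600"],
    "totalCount": [80, 31, 362],
    "username": ["last", "last", "truk"],
})

# Rename the columns, then select them in the desired order
df_topic = (
    single_df
    .rename(columns={"name": "topic", "totalCount": "total"})
    [["username", "topic", "total"]]
)

print(df_topic)
```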
