Question

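For a current research project, I am planning to read the JSON field "Text Main" within a predefined time range using Python / Pandas. However, the code raises TypeError: string indices must be integers on the line line = row["Text Main"].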

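I have already gone through pages that address the same error but have not found a solution yet. Is there a helpful tweak that would make this work?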

The JSON file has the following structure:

[
{"No":"121","Stock Symbol":"A","Date":"05/11/2017","Text Main":"Sample text"}
]

The corresponding code section looks like this:

import string
import json
import csv

import pandas as pd
import datetime

import numpy as np


# Loading and reading dataset
file = open("Glassdoor_A.json", "r")
data = json.load(file)
df = pd.json_normalize(data)
df['Date'] = pd.to_datetime(df['Date'])


# Create an empty dictionary
d = dict()


# Filtering by date
start_date = "01/01/2009"
end_date = "01/01/2015"

after_start_date = df["Date"] >= start_date
before_end_date = df["Date"] <= end_date

between_two_dates = after_start_date & before_end_date
filtered_dates = df.loc[between_two_dates]

print(filtered_dates)


# Processing
for row in filtered_dates:
    line = row["Text Main"]
    # Remove the leading spaces and newline character
    line = line.strip()

Answer 1

If the requirement is to collect all the contents of the "Text Main" column, we can do this:

line = list(filtered_dates['Text Main'])

Then we can also apply strip to each value:

line = [val.strip() for val in line]
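
As for why the error appears in the first place: iterating directly over a DataFrame yields its column labels (plain strings), not its rows, so row["Text Main"] ends up indexing a string. The sketch below illustrates this and shows two possible ways to strip the text. The sample records are hypothetical and only mirror the structure from the question, and the month/day/year date format is an assumption.

```python
import pandas as pd

# Hypothetical sample records mirroring the JSON structure in the question
records = [
    {"No": "121", "Stock Symbol": "A", "Date": "05/11/2017", "Text Main": "  Sample text\n"},
    {"No": "122", "Stock Symbol": "A", "Date": "03/15/2012", "Text Main": " Older review "},
]
df = pd.DataFrame(records)
# Assumption: dates are month/day/year; adjust the format if they are day/month/year
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Iterating over a DataFrame yields column labels, not rows.
# Each item is a str such as "No", so item["Text Main"] raises the TypeError.
labels = [item for item in df]

# Filter by date (both bounds inclusive), as in the question
mask = df["Date"].between(pd.Timestamp("2009-01-01"), pd.Timestamp("2015-01-01"))
filtered_dates = df.loc[mask]

# Option 1: vectorized strip on the whole column
cleaned = filtered_dates["Text Main"].str.strip().tolist()

# Option 2: row-by-row loop, if per-row processing is really needed
cleaned_loop = [row["Text Main"].strip() for _, row in filtered_dates.iterrows()]
```

The vectorized .str.strip() is usually the simpler choice; iterrows() is mainly useful when each row needs additional custom handling.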
