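For a current research project, I am planning to read the JSON field "Main_Text" for records within a predefined date range, using Python / Pandas. However, the code raises the error "TypeError: string indices must be integers" on the line: line = row["Main_Text"]
I have already looked through pages that deal with the same error, but have not found a solution yet. Is there a useful adjustment that would make this work?
The JSON file has the following structure: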
[
{"No":"121","Stock Symbol":"A","Date":"05/11/2017","Text Main":"Sample text"}
]
The relevant section of the code looks like this:
import string
import json
import csv
import pandas as pd
import datetime
import numpy as np
# Loading and reading dataset
file = open("Glassdoor_A.json", "r")
data = json.load(file)
df = pd.json_normalize(data)
df['Date'] = pd.to_datetime(df['Date'])
# Create an empty dictionary
d = dict()
# Filtering by date
start_date = "01/01/2009"
end_date = "01/01/2015"
after_start_date = df["Date"] >= start_date
before_end_date = df["Date"] <= end_date
between_two_dates = after_start_date & before_end_date
filtered_dates = df.loc[between_two_dates]
print(filtered_dates)
# Processing
for row in filtered_dates:
    line = row["Text Main"]
    # Remove the leading spaces and newline character
    line = line.strip()
Answer 0 (score: 1)
If the requirement is to collect the entire contents of the "Text Main" column, we can do it like this:
line = list(filtered_dates['Text Main'])
Then we can also apply strip to each value:
line = [val.strip() for val in line]
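For context on why the original loop fails: iterating directly over a DataFrame yields its column labels (strings), not its rows, so row["Text Main"] ends up indexing a string. Below is a minimal sketch of two ways around that, assuming the dates are in month/day/year order and using made-up sample records in place of Glassdoor_A.json. The names labels, lines and stripped_rows are only illustrative.

```python
import pandas as pd

# Hypothetical sample records mirroring the JSON structure in the question
data = [
    {"No": "121", "Stock Symbol": "A", "Date": "05/11/2012", "Text Main": "  Sample text\n"},
    {"No": "122", "Stock Symbol": "A", "Date": "05/11/2017", "Text Main": "Later text"},
]
df = pd.DataFrame(data)

# Assumption: dates are month/day/year; adjust the format if they are day/month/year
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Iterating over a DataFrame yields column labels, which is what caused the TypeError
labels = [col for col in df]

# Filter by date range using explicit timestamps
mask = (df["Date"] >= pd.Timestamp("2009-01-01")) & (df["Date"] <= pd.Timestamp("2015-01-01"))
filtered_dates = df.loc[mask]

# Option 1: vectorised strip over the whole column
lines = filtered_dates["Text Main"].str.strip().tolist()

# Option 2: row-wise iteration, if per-row processing is really needed
stripped_rows = []
for _, row in filtered_dates.iterrows():
    stripped_rows.append(row["Text Main"].strip())
```

The vectorised .str.strip() version is usually preferable on larger frames. iterrows() is shown only because it is the closest match to the loop in the question.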