I'm reading a file from HDFS, but I keep getting this error:
TypeError: 'int' object is not subscriptable
The .csv file:
CLAIM_NUM,BEN_ST,AGE,MEDICAL_ONLY_IND,TTL_MED_LOSS,TTL_IND_LOSS,TTL_MED_EXP,TTL_IND_EXP,BP_CD,NI_CD,legalrep,depression,cardiac,diabetes,hypertension,obesity,smoker,subabuse,arthritis,asthma,CPT_codes,D,P,NDC_codes
123456789,IL,99,1,2201.26,0,97.16,0,31,4,1,0,0,0,0,0,0,0,0,0,NA,8409~71941,NA,NA
987654321,AL,98,1,568.12,0,20.82,0,42,52,1,0,0,0,0,0,0,0,0,0,NA,7242~8472~E9273,NA,NA
My code:
with hdfs.open("/user/ras.csv") as f:
    reader = f.read()
    for i, row in enumerate(reader, start=1):
        root = ET.Element('cbcalc')
        icdNode = ET.SubElement(root, "icdcodes")
        for code in row['D'].split('~'):
            ET.SubElement(icdNode, "code").text = code
        ET.SubElement(root, "clientid").text = row['CLAIM_NUM']
        ET.SubElement(root, "state").text = row['BEN_ST']
        ET.SubElement(root, "country").text = "US"
        ET.SubElement(root, "age").text = row['AGE']
        ET.SubElement(root, "jobclass").text = "1"
        ET.SubElement(root, "fulloutput").text = "Y"
        cfNode = ET.SubElement(root, "cfactors")
        for k in ['legalrep', 'depression', 'diabetes',
                  'hypertension', 'obesity', 'smoker', 'subabuse']:
            ET.SubElement(cfNode, k.lower()).text = str(row[k])
        psNode = ET.SubElement(root, "prosummary")
        psicdNode = ET.SubElement(psNode, "icd")
        for code in row['P'].split('~'):
            ET.SubElement(psicdNode, "code").text = code
        psndcNode = ET.SubElement(psNode, "ndc")
        for code in row['NDC_codes'].split('~'):
            ET.SubElement(psndcNode, "code").text = code
        cptNode = ET.SubElement(psNode, "cpt")
        for code in row['CPT_codes'].split('~'):
            ET.SubElement(cptNode, "code").text = code
        ET.SubElement(psNode, "hcpcs")
        doc = ET.tostring(root, method='xml', encoding="UTF-8")
        response = requests.post(target_url, data=doc, headers=login_details)
        response_data = json.loads(response.text)
        if type(response_data) == dict and 'error' in response_data.keys():
            error_results.append(response_data)
        else:
            api_results.append(response_data)
What do I need to change so that I can loop over the csv file and convert the data to XML for the API calls?
I've tested this code in Python and it seems to work, but as soon as I put the file into HDFS it starts falling over.
Answer 0 (score: 0)
The problem is (probably; I don't have this library installed) that f.read()
is returning a bytes object. If you iterate over it (e.g. with enumerate),
you will be looking at ints (one for each byte of the file), not at any kind
of structured "row" object.
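You can reproduce the exact error without HDFS at all (a minimal illustration using an inline bytes literal):

data = b"CLAIM_NUM,BEN_ST\n123456789,IL\n"
for i, row in enumerate(data, start=1):
    print(type(row))  # <class 'int'> -- each "row" is really a single byte
    row['D']          # TypeError: 'int' object is not subscriptable
    break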
So some extra processing is needed before you get to the loop you want to write.
Something like this may do what you want:
import pydoop.hdfs as hdfs
from io import TextIOWrapper
from csv import DictReader

with hdfs.open("/user/ras.csv") as h, \
        TextIOWrapper(h, *unknown_settings) as w:
    # DictReader is not a context manager, so it sits inside the with block
    dict_reader = DictReader(w, *defaults_are_probably_ok)
    for row in dict_reader:
        ...
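For completeness, here is how the question's loop would sit inside that structure (a sketch, untested against pydoop; the utf-8 encoding is an assumption about the file, and DictReader picks up its field names from the CSV header row):

import xml.etree.ElementTree as ET
from csv import DictReader
from io import TextIOWrapper

import pydoop.hdfs as hdfs

with hdfs.open("/user/ras.csv") as h, \
        TextIOWrapper(h, encoding="utf-8") as w:  # encoding is an assumption
    for row in DictReader(w):
        # row is now a dict keyed by the header line, so row['D'] etc. work
        root = ET.Element('cbcalc')
        icdNode = ET.SubElement(root, "icdcodes")
        for code in row['D'].split('~'):
            ET.SubElement(icdNode, "code").text = code
        # ... build the rest of the document and POST it exactly as in
        # the question ...

One caveat: in the sample rows the D, P and NDC_codes columns hold the literal string NA, so row['D'].split('~') yields ['NA']; you may want to skip or special-case those values before building the XML.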