I have mapped my data into an RDD:
# Split each line once, then keep columns 0-14, 17, 18 and 21.
keep = list(range(15)) + [17, 18, 21]
crimesMapped = crimesOnly.map(lambda line: line.split(",")).map(lambda fields: tuple(fields[i] for i in keep))
crimesMapped.take(1)
Output:
[('11034701',
'JA366925',
'01/01/2001 11:00:00 AM',
'016XX E 86TH PL',
'1153',
'DECEPTIVE PRACTICE',
'FINANCIAL IDENTITY THEFT OVER $ 300',
'RESIDENCE',
'false',
'false',
'0412',
'004',
'8',
'45',
'11',
'2001',
'08/05/2017 03:50:08 PM',
'')
]
The data I want is here:
s = crimesMapped.take(1)
print(s)
print("-------------------------------------------------------------------------------------------------------------------------------")
print(s[0][11])
Output:
[('11034701', 'JA366925', '01/01/2001 11:00:00 AM', '016XX E 86TH PL', '1153', 'DECEPTIVE PRACTICE', 'FINANCIAL IDENTITY THEFT OVER $ 300', 'RESIDENCE', 'false', 'false', '0412', '004', '8', '45', '11', '2001', '08/05/2017 03:50:08 PM', '')]
-------------------------------------------------------------------------------------------------------------------------------
004
I want to filter the data so that I get only the value in column 11 (index 11, '004' above) of each row. How can I do that?
crimesMapped.filter(lambda x: x[][11]) --??
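For reference, a minimal sketch of the difference between the two operations, assuming the goal is to reduce every row to its element at index 11. `filter` keeps or drops whole rows based on a predicate, while `map` transforms each row, so projecting one column is a `map`. The sketch uses Python's built-in `map` and `filter` on a plain list that stands in for the RDD (the `rows` list is an assumption, copied from the output above); the PySpark form is given in the closing comment and is not executed here.

```python
# Stand-in for crimesMapped.take(1): one row copied from the output above.
rows = [('11034701', 'JA366925', '01/01/2001 11:00:00 AM', '016XX E 86TH PL',
         '1153', 'DECEPTIVE PRACTICE', 'FINANCIAL IDENTITY THEFT OVER $ 300',
         'RESIDENCE', 'false', 'false', '0412', '004', '8', '45', '11',
         '2001', '08/05/2017 03:50:08 PM', '')]

# filter keeps or drops whole rows; it never changes their shape.
kept = list(filter(lambda row: row[11] == '004', rows))

# map transforms each row, here into just its element at index 11.
column_11 = list(map(lambda row: row[11], rows))
print(column_11)  # ['004']

# The PySpark equivalent on the RDD would be (not run here):
#   crimesMapped.map(lambda row: row[11]).take(1)
```

Each element of the RDD is already a single tuple, so the lambda indexes it directly with `row[11]`; the extra `[0]` in `s[0][11]` was only needed because `take(1)` returns a list of rows.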