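I want to look up a value in a dictionary by key, but the key I get back is wrapped in single quotes (it is a string), so the lookup fails.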
from pyspark import SparkContext, SparkConf
import collections,shutil,os
conf = SparkConf().setMaster("local").setAppName("Word_count")
sc=SparkContext(conf=conf)
rdd=sc.textFile("/home/karan/dummy files/patient.csv")
rdd2=sc.textFile("/home/karan/dummy files/doctors.csv")
def nameOfDoc():
    names={}
    with open("/home/karan/dummy files/doctors.csv") as l:
        for x in l:
            nameExt=x.split('\t')
            names[int(nameExt[0])]=nameExt[1]
    return names
docName=sc.broadcast(nameOfDoc())
docId=rdd.map(lambda x:x.split(",")).\
    map(lambda x:(x[3],1)).\
    reduceByKey(lambda x,y:x+y).\
    map(lambda x:(x[1],x[0])).\
    sortByKey(ascending=False).\
    map(lambda x:(x[1],x[0]))
rs=docId.collect()
if os.path.exists("/home/karan/output2"):
    shutil.rmtree("/home/karan/output2")
for x in rs:
    print(docName.value[x[0]],end=" -> ")
    print(x[1])
sc.parallelize(rs).saveAsTextFile("output2")
My code gives me this error:
File "/home/karan/hospitalsDemo.py", line 28, in <module>
    print(docName.value[x[0]],end=" -> ")
KeyError: '2'
Answer 0 (score: 1)
for x in rs:
    print(docName.value[x[0]],end=" -> ")
    print(x[1])
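I think Green Cloak Guy is right that the string needs to be converted to an integer: the dictionary is built with int keys, but x[0] comes out of split(",") as a string, hence KeyError: '2'. Since you use x for both x[0] and x[1], only the key x[0] should be converted, and to guard against a key that cannot be converted to an integer, I think you should do this: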
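for x in rs:
    # x is a (key, count) tuple; convert only the key, not the whole tuple
    try:
        xkey = int(x[0])
    except ValueError:
        # fall back to the original string if it is not a valid integer
        xkey = x[0]
    print(docName.value[xkey],end=" -> ")
    print(x[1])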
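To see the mismatch without Spark, here is a minimal plain-Python sketch. The names doc_names, rs, and lookup_name, and the sample data, are made up for illustration: they only stand in for docName.value and the collected result in the question. It assumes the dictionary keys are ints and the collected keys are strings, which is what the error message suggests.

```python
# Hypothetical stand-ins for docName.value and rs from the question.
doc_names = {1: "Dr. A", 2: "Dr. B"}   # int keys, as built with int(nameExt[0])
rs = [("2", 5), ("1", 3), ("x9", 1)]   # str keys, as produced by split(",")


def lookup_name(names, raw_key, default="<unknown>"):
    """Return the name for raw_key, trying an int conversion first."""
    try:
        key = int(raw_key)
    except (TypeError, ValueError):
        # Not a valid integer: fall back to the key as given.
        key = raw_key
    # .get avoids a KeyError when the key is missing altogether.
    return names.get(key, default)


# '2' and 2 are different dictionary keys.
print("2" in doc_names, 2 in doc_names)  # False True

lines = [f"{lookup_name(doc_names, key)} -> {count}" for key, count in rs]
for line in lines:
    print(line)
```

Using .get with a default is a design choice for this sketch: it keeps the loop running when a patient row refers to a doctor id that is not in the file. If a missing id should be treated as an error instead, index the dictionary directly after the conversion.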