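I want to compute probabilities from the data in a file. The approach I have tried first finds the co-occurrence count of two values and then divides it by the total count of one of them. The dataset looks like this: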
director Actor genere
Scorseses DeNiro crime
Coppola DeNiro crime
Hitchcock Stewart Thriller
Hitchcock Grant Thriller
Koster Grant Comedy
Koster Stewart Comedy
The code is:
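import collections
import csv
from collections import defaultdict
from itertools import product

def find_prob(c, d):
    # Read the CSV into a dict mapping column name -> list of values
    columns = defaultdict(list)
    with open('function.csv') as f:
        reader = csv.DictReader(f)
        for row in reader:
            for (k, v) in row.items():
                columns[k].append(v)
    print(columns)
    # Look for d in the first two rows of each column, then count its occurrences
    a = 0
    ab = [0, 1]
    for k in columns:
        for i in ab:
            if columns[k][i] == d:
                words_count = collections.Counter(columns[k])
                a = words_count[d]
    # Count co-occurrences of (first column, second column) values
    with open(r"function.txt") as f:
        rdr = csv.reader(f)
        c1 = collections.Counter((x, y) for a_, b_, _ in rdr for x, y in product(a_.split(","), b_.split(",")))
    xy = c1[c, d]
    print(xy)
    print(a)
    # Co-occurrence count divided by the count of d (0.0 if d was not found)
    prob = xy / a if a else 0.0
    print(prob)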
Is there a more general way to find the co-occurrence of values?
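One possible direction, as a minimal sketch rather than a definitive solution: count every (column, value) item and every pair of items that appear in the same row, then divide the pair count by the count of the conditioning item. The helper names `cooccurrence_counts` and `conditional_prob` are hypothetical, and the sketch assumes the file is comma-separated with a header row (the inline `SAMPLE` string stands in for the file on disk).

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Assumption: the data is comma-separated with a header row.
SAMPLE = """director,Actor,genere
Scorseses,DeNiro,crime
Coppola,DeNiro,crime
Hitchcock,Stewart,Thriller
Hitchcock,Grant,Thriller
Koster,Grant,Comedy
Koster,Stewart,Comedy
"""

def cooccurrence_counts(rows):
    """Count single (column, value) items and unordered pairs of items per row."""
    singles = Counter()
    pairs = Counter()
    for row in rows:
        items = list(row.items())  # [(column, value), ...] for this row
        singles.update(items)
        for a, b in combinations(items, 2):
            pairs[frozenset((a, b))] += 1
    return singles, pairs

def conditional_prob(singles, pairs, target, given):
    """Estimate P(target | given) as count(target, given) / count(given)."""
    total = singles[given]
    if total == 0:
        return 0.0  # the conditioning value never occurs
    return pairs[frozenset((target, given))] / total

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
singles, pairs = cooccurrence_counts(rows)

# P(Actor = DeNiro | genere = crime)
print(conditional_prob(singles, pairs, ("Actor", "DeNiro"), ("genere", "crime")))
# P(Actor = Stewart | director = Hitchcock)
print(conditional_prob(singles, pairs, ("Actor", "Stewart"), ("director", "Hitchcock")))
```

Keying on (column, value) tuples keeps values from different columns apart even if the same string appears in more than one column, and the counts are built in a single pass, so any pair of columns can be queried afterwards without re-reading the file.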