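I want to compute the probabilities of the ratings ('A', 'B', 'C') in the rating column, along with the conditional probabilities of type given rating.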
  company    model rating   type
0    ford  mustang      A  coupe
1   chevy   camaro      B  coupe
2    ford   fiesta      C  sedan
3    ford    focus      A  sedan
4    ford   taurus      B  sedan
5  toyota    camry      B  sedan
Desired output:
Prob(rating=A) = 0.333333
Prob(rating=B) = 0.500000
Prob(rating=C) = 0.166667
Prob(type=coupe|rating=A) = 0.500000
Prob(type=sedan|rating=A) = 0.500000
Prob(type=coupe|rating=B) = 0.333333
Prob(type=sedan|rating=B) = 0.666667
Prob(type=coupe|rating=C) = 0.000000
Prob(type=sedan|rating=C) = 1.000000
Any help is appreciated, thanks!
Answer 0: (score: 8)
You can use .groupby() and the built-in .div():
rating_probs = df.groupby('rating').size().div(len(df))
rating
A 0.333333
B 0.500000
C 0.166667
And the conditional probabilities:
df.groupby(['type', 'rating']).size().div(len(df)).div(rating_probs, axis=0, level='rating')
coupe A 0.500000
B 0.333333
sedan A 0.500000
B 0.666667
C 1.000000
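The second expression divides the joint probability Prob(type, rating) by the marginal Prob(rating), aligned on the rating level. A minimal self-contained sketch of the same steps follows; the DataFrame is rebuilt from the question's table, and the intermediate names joint and cond are only illustrative:

```python
import pandas as pd

# Data rebuilt from the table in the question
df = pd.DataFrame({
    'company': ['ford', 'chevy', 'ford', 'ford', 'ford', 'toyota'],
    'model': ['mustang', 'camaro', 'fiesta', 'focus', 'taurus', 'camry'],
    'rating': ['A', 'B', 'C', 'A', 'B', 'B'],
    'type': ['coupe', 'coupe', 'sedan', 'sedan', 'sedan', 'sedan'],
})

# Marginal Prob(rating): group sizes divided by the number of rows
rating_probs = df.groupby('rating').size().div(len(df))

# Joint Prob(type, rating): sizes of each (type, rating) group over all rows
joint = df.groupby(['type', 'rating']).size().div(len(df))

# Conditional Prob(type | rating) = joint / marginal, matched on the 'rating' level
cond = joint.div(rating_probs, axis=0, level='rating')

print(cond)
```

Pairs that never occur in the data, such as (coupe, C), do not appear in the result at all, which is why the output above has no 0.000000 row.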
Answer 1: (score: 3)
You need to add reindex so that missing pairs are filled with 0 values.
Another solution, thanks to Zero:
mux = pd.MultiIndex.from_product([df['rating'].unique(), df['type'].unique()])
s = (df.groupby(['rating', 'type']).count() / df.groupby('rating').count())['model']
s = s.reindex(mux, fill_value=0)
print (s)
A coupe 0.500000
sedan 0.500000
B coupe 0.333333
sedan 0.666667
C coupe 0.000000
sedan 1.000000
Name: model, dtype: float64
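Zero-filling the missing pairs can also be done without building the MultiIndex by hand. The sketch below is one possible alternative and is not necessarily the solution this answer credits to Zero; it rebuilds the DataFrame from the question's table and uses unstack with fill_value followed by stack:

```python
import pandas as pd

# Data rebuilt from the table in the question
df = pd.DataFrame({
    'company': ['ford', 'chevy', 'ford', 'ford', 'ford', 'toyota'],
    'model': ['mustang', 'camaro', 'fiesta', 'focus', 'taurus', 'camry'],
    'rating': ['A', 'B', 'C', 'A', 'B', 'B'],
    'type': ['coupe', 'coupe', 'sedan', 'sedan', 'sedan', 'sedan'],
})

# Count rows per (rating, type) pair, then divide by the total per rating
counts = df.groupby(['rating', 'type']).size()
s = counts.div(df.groupby('rating').size(), level='rating')

# unstack moves 'type' into columns and fills absent pairs with 0;
# stack folds the columns back into a (rating, type) MultiIndex
s_full = s.unstack(fill_value=0).stack()

print(s_full)
```

The result has one entry for every rating and type combination, including the (C, coupe) pair that never occurs in the data.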
Answer 2: (score: 2)
You can use groupby:
In [2]: df = pd.DataFrame({'company': ['ford', 'chevy', 'ford', 'ford', 'ford', 'toyota'],
'model': ['mustang', 'camaro', 'fiesta', 'focus', 'taurus', 'camry'],
'rating': ['A', 'B', 'C', 'A', 'B', 'B'],
'type': ['coupe', 'coupe', 'sedan', 'sedan', 'sedan', 'sedan']})
In [3]: df.groupby('rating').count()['model'] / len(df)
Out[3]:
rating
A 0.333333
B 0.500000
C 0.166667
Name: model, dtype: float64
In [4]: (df.groupby(['rating', 'type']).count() / df.groupby('rating').count())['model']
Out[4]:
rating type
A coupe 0.500000
sedan 0.500000
B coupe 0.333333
sedan 0.666667
C sedan 1.000000
Name: model, dtype: float64
Answer 3: (score: 0)
First, convert the data to a pandas DataFrame so that you can take advantage of pandas' groupby method.
import pandas as pd
collection = {"company": ["ford", "chevy", "ford", "ford", "ford", "toyota"],
"model": ["mustang", "camaro", "fiesta", "focus", "taurus", "camry"],
"rating": ["A", "B", "C", "A", "B", "B"],
"type": ["coupe", "coupe", "sedan", "sedan", "sedan", "sedan"]}
df = pd.DataFrame(collection)
Then, group by the event (i.e., the rating).
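The answer does not show the grouping code itself. One way the grouping step might look is sketched below; this is an assumption about what was intended, not the answerer's code, and the names p_rating and p_type_given_rating are illustrative:

```python
import pandas as pd

collection = {"company": ["ford", "chevy", "ford", "ford", "ford", "toyota"],
              "model": ["mustang", "camaro", "fiesta", "focus", "taurus", "camry"],
              "rating": ["A", "B", "C", "A", "B", "B"],
              "type": ["coupe", "coupe", "sedan", "sedan", "sedan", "sedan"]}
df = pd.DataFrame(collection)

# Prob(rating): relative frequency of each rating over all rows
p_rating = df["rating"].value_counts(normalize=True).sort_index()

# Prob(type | rating): within each rating group, relative frequency of each type
p_type_given_rating = df.groupby("rating")["type"].value_counts(normalize=True)

print(p_rating)
print(p_type_given_rating)
```

Normalizing inside each group gives the conditional probabilities directly, so no separate division by the marginal is needed.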
Answer 4: (score: 0)
pd.crosstab(df.type, df.rating, margins=True, normalize="index")
rating A B C
type
coupe 0.500000 0.5 0.000000
sedan 0.250000 0.5 0.250000
All 0.333333 0.5 0.166667
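Here the All row gives the probabilities of A, B, and C (the coupe and sedan rows give Prob(rating|type)). Now for the conditional probabilities of type given rating: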
pd.crosstab(df.type, df.rating, margins=True, normalize="columns")
rating A B C All
type
coupe 0.5 0.333333 0.0 0.333333
sedan 0.5 0.666667 1.0 0.666667
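Here the conditional probabilities are in the table. For example, the probability that the type is coupe given that the rating is A is in the coupe row and the A column: Prob(type=coupe|rating=A) = 0.5.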
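Individual entries can be read from the column-normalized table with .loc. A small sketch, with the DataFrame rebuilt from the question's table and margins left out so that only the conditional probabilities remain:

```python
import pandas as pd

# Data rebuilt from the table in the question
df = pd.DataFrame({
    'company': ['ford', 'chevy', 'ford', 'ford', 'ford', 'toyota'],
    'model': ['mustang', 'camaro', 'fiesta', 'focus', 'taurus', 'camry'],
    'rating': ['A', 'B', 'C', 'A', 'B', 'B'],
    'type': ['coupe', 'coupe', 'sedan', 'sedan', 'sedan', 'sedan'],
})

# Each column sums to 1, so each cell is Prob(type | rating)
cond = pd.crosstab(df['type'], df['rating'], normalize='columns')

# Row label is the type, column label is the rating
p_coupe_given_a = cond.loc['coupe', 'A']
print(f"Prob(type=coupe|rating=A) = {p_coupe_given_a:.6f}")
```

Because crosstab fills absent combinations with zero counts, Prob(type=coupe|rating=C) shows up as 0.0 without any reindexing.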