Mapping/plotting the distance between values in a 1-D array/Series

Time: 2016-05-09 17:23:00

Tags: python pandas matplotlib scipy scikit-learn

I have a pandas Series like this:

(['StartGame', 'TutorialEnded',  'FBConnect',
  'StartGame', 'Sale', 'FBConnect', 'InviteSent',
  'StartGame', 'Finish_1', 'Sale', 'Bought',
  'Finish_22',  'FBConnect', 'Finish_2',
  'TutorialEnded', 'Finish_18', ...])

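I want to plot the distance between values containing the string Finish and occurrences of the value Sale, to see whether the two are correlated, and also to check how occurrences of other values relate to Sale. In other words, can I use the occurrence of any value in the Series to predict that a Sale will occur nearby? Even a simple scatter plot along a line, with a different color assigned to each value so I can get a feel for the pattern, would help, but I don't know how to do that.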

1 answer:

Answer 0: (score: 1)

Setup

import pandas as pd

df = pd.DataFrame(['StartGame', 'TutorialEnded',  'FBConnect',
  'StartGame', 'Sale', 'FBConnect', 'InviteSent',
  'StartGame', 'Finish_1', 'Sale', 'Bought',
  'Finish_22',  'FBConnect', 'Finish_2',
  'TutorialEnded', 'Finish_18'], columns=['Value'])
df.index.name = 'position'
df.reset_index(inplace=True)

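Helper functions

import re
import numpy as np

def isFinish(x):
    """Returns True if Value contains 'Finish', False otherwise."""
    return bool(re.search(r'Finish', x['Value']))

def isSale(x):
    """Returns True if Value contains 'Sale', False otherwise."""
    return bool(re.search(r'Sale', x['Value']))

df['Finish'] = df.apply(isFinish, axis=1)
df['Sale'] = df.apply(isSale, axis=1)
# Running count of Finish rows seen so far.
df['FinishCount'] = df.Finish.cumsum()

def cumargmax(x):
    """Get the position of the latest Finish row at or before this row."""
    if x['FinishCount'] == 0:
        return np.nan
    else:
        # idxmax returns the first label holding the maximum running count,
        # which is the most recent Finish row. (.ix was removed from pandas.)
        return df.FinishCount.loc[:x['position']].idxmax()

# Number of rows since the latest Finish (NaN before the first Finish).
df['Distance'] = df.position - df.apply(cumargmax, axis=1)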
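
The row-wise apply above rescans the column for every row, which can get slow on long series. As a rough sketch of the same idea in vectorized form, the position of each matching row can be kept, carried forward with ffill, and subtracted from the current position. The helper name distance_since_last is made up for this illustration and is not part of the original answer; it takes any substring pattern, so the same function can be pointed at values other than Finish.

```python
import numpy as np
import pandas as pd

values = ['StartGame', 'TutorialEnded', 'FBConnect',
          'StartGame', 'Sale', 'FBConnect', 'InviteSent',
          'StartGame', 'Finish_1', 'Sale', 'Bought',
          'Finish_22', 'FBConnect', 'Finish_2',
          'TutorialEnded', 'Finish_18']
events = pd.Series(values, name='Value')


def distance_since_last(values, pattern):
    """Rows since the most recent value containing `pattern`.

    Hypothetical helper for illustration; returns NaN before the first match.
    """
    position = pd.Series(np.arange(len(values)), index=values.index)
    # Keep the position only on matching rows, then carry it forward.
    last_match = position.where(values.str.contains(pattern)).ffill()
    return position - last_match


is_sale = events.str.contains('Sale')
finish_distance = distance_since_last(events, 'Finish')
fbconnect_distance = distance_since_last(events, 'FBConnect')

# Distances observed at the Sale rows only.
print(finish_distance[is_sale])
print(fbconnect_distance[is_sale])
```

Comparing these per-pattern distances at the Sale rows is one simple way to start looking at which events tend to precede a sale; with only two sales in the sample data it says nothing conclusive.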

Demonstration

print(df)

    position          Value Finish   Sale  FinishCount  Distance
0          0      StartGame  False  False            0       NaN
1          1  TutorialEnded  False  False            0       NaN
2          2      FBConnect  False  False            0       NaN
3          3      StartGame  False  False            0       NaN
4          4           Sale  False   True            0       NaN
5          5      FBConnect  False  False            0       NaN
6          6     InviteSent  False  False            0       NaN
7          7      StartGame  False  False            0       NaN
8          8       Finish_1   True  False            1       0.0
9          9           Sale  False   True            1       1.0
10        10         Bought  False  False            1       2.0
11        11      Finish_22   True  False            2       0.0
12        12      FBConnect  False  False            2       1.0
13        13       Finish_2   True  False            3       0.0
14        14  TutorialEnded  False  False            3       1.0
15        15      Finish_18   True  False            4       0.0

Or just the subset of rows where there is a sale

print(df[df.Sale])

   position Value Finish  Sale  FinishCount  Distance
4         4  Sale  False  True            0       NaN
9         9  Sale  False  True            1       1.0
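
The question also asks for a scatter plot along a line with one color per value, which the answer above does not cover. The following is a minimal matplotlib sketch of that, under a few assumptions: all Finish_N values are collapsed into a single Finish category, the Agg backend is used so it runs without a display, and events.png is only a placeholder output filename.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

values = ['StartGame', 'TutorialEnded', 'FBConnect',
          'StartGame', 'Sale', 'FBConnect', 'InviteSent',
          'StartGame', 'Finish_1', 'Sale', 'Bought',
          'Finish_22', 'FBConnect', 'Finish_2',
          'TutorialEnded', 'Finish_18']
events = pd.Series(values, name='Value')

# Assumption: treat Finish_1, Finish_22, ... as one 'Finish' category.
kind = events.str.replace(r'^Finish_\d+$', 'Finish', regex=True)

fig, ax = plt.subplots(figsize=(8, 2))
for name, group in kind.groupby(kind):
    # One scatter call per category, so each gets its own color and legend entry.
    ax.scatter(group.index, [0] * len(group), label=name, s=60)

# Mark every Sale with a dashed vertical line to make nearby events easy to spot.
for pos in kind.index[kind == 'Sale']:
    ax.axvline(pos, linestyle='--', linewidth=1, color='grey')

ax.set_yticks([])
ax.set_xlabel('position')
ax.legend(ncol=4, loc='upper center', bbox_to_anchor=(0.5, -0.4))
fig.savefig('events.png', bbox_inches='tight')  # placeholder filename
```

For interactive use, plt.show() can replace the savefig call.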