Question

I have a dataframe with 2 columns (time and pressure).

timestep value
    0    393
    1    389
    2    402
    3    408
    4    413
    5    463
    6    471
    7    488
    8    422
    9    404
    10   370

I first need to find the frequency of each pressure value and rank them (df['freq_rank']), which works fine. But when I try to mask the dataframe by comparing the column against the most frequent value and then take the interval difference, I get NaN results.

import numpy as np
import pandas as pd

df = pd.read_csv('copy.csv')
dataset = np.loadtxt(df, delimiter=";")
df.columns = ["Timestamp", "Pressure"]

## Timestep as int
df = pd.DataFrame({'timestep':np.arange(3284), 'value': df.Pressure})

## Rank of the frequency of each value in the df
vcs = {v: i for i, v in enumerate(df.value.value_counts().index)}
df['freq_rank'] = df.value.apply(vcs.get)
print(df.freq_rank)


>>Output:
>>0    131
>>1    235
>>2     99
>>3     99
>>4    101
>>5    101
>>6    131
>>7     79
>>8     79



## Find most frequent value
count = df['value'].value_counts().index[0]  # value_counts() sorts by count, descending

## Mask the DF by comparing the column against count value & find interval diff.
x = df.loc[df['value'] == count, 'timestep'].diff()
print(x)

>>Output:
>>50        1.0
>>112      62.0
>>215     103.0
>>265      50.0
>>276      11.0
>>277       1.0
>>278       1.0
>>318      40.0
>>366      48.0
>>367       1.0
>>368       1.0
>>372       4.0

df['freq'] = df.value.apply(x.get)
print(df.freq)

>>Output:
>>0    NaN
>>1    NaN
>>2    NaN
>>3    NaN
>>4    NaN
>>5    NaN
>>6    NaN
>>7    NaN
>>8    NaN

I don't understand why print(x) returns the right output and print(df['freq']) returns NaN.

Answer 1

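I think your problem is with the last statement, df['freq'] = df.value.apply(x.get). It passes each pressure value to x.get as a key, but x is indexed by the row labels of df, not by pressure values, so the lookups generally miss and return None.

If you just want to copy x into the new column df['freq'], you can assign it directly: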

df['freq'] = x

Then print(df.freq) will show the same values as your print(x) statement at the matching index labels, and NaN everywhere else.

Update: Your problem is with the indices. The sample df only has index values 0-10, whereas your x has 50, 112, 215... When assigning to df, only values whose index label already exists in df are added.
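
To make the difference concrete, here is a minimal sketch on made-up data (not the asker's copy.csv; the repeated value 402 and the name most_common are assumptions for illustration). It contrasts looking up pressure values in x with assigning x directly, which aligns on the index:

```python
import pandas as pd

# Hypothetical toy data: the value 402 repeats at timesteps 2, 5 and 9.
df = pd.DataFrame({
    "timestep": range(10),
    "value": [393, 389, 402, 408, 413, 402, 471, 488, 422, 402],
})

# Most frequent value, then the gaps between its occurrences.
most_common = df["value"].value_counts().idxmax()
x = df.loc[df["value"] == most_common, "timestep"].diff()
# x keeps the row labels of the matching rows (2, 5, 9) as its index.

# apply(x.get) uses each pressure value (393, 389, ...) as a key into x.
# None of those keys are row labels in x, so every lookup returns None.
looked_up = df["value"].apply(x.get)

# Direct assignment aligns x with df on the index instead.
df["freq"] = x
print(df)
```

With this toy data, only the rows whose labels appear in x receive a value in df["freq"]; every other row is NaN, and looked_up is missing everywhere.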
