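Transposing a list of outliers and blanks into a new table

Time: 2019-01-24 03:05:10

Tags: python python-3.x pandas

I want to write a program that runs through multiple columns of data and builds a new DataFrame from the outliers and blank values it finds. Currently I have the following code, which replaces those values with 'Outlier' and 'No Data', but I am struggling to turn the result into a new DataFrame.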

Desired output: (image not included)

import pandas as pd 
from pandas import ExcelWriter

# Remove Initial Data Quality
outl = ['.',0,' ']

# Pull in Data
path = r"C:\Users\robert.carmody\desktop\Python\PyTest\PyTGPS.xlsx"
sheet = 'Raw Data'
df = pd.read_excel(path,sheet_name=sheet)
data = pd.read_excel(path,sheet_name=sheet)

j = 0
while j < len(df.keys()):           #run through total number of columns
    list(df.iloc[:,j])              #create a list of all values within the 'j' column
    if type(list(df.iloc[:,j])[0]) == float:
        q1 = df.iloc[:,j].quantile(q=.25)
        med = df.iloc[:,j].quantile(q=.50)
        q3 = df.iloc[:,j].quantile(q=.75)
        iqr = q3 - q1
        ub = q3 + 1.5*iqr
        lb = q1 - 1.5*iqr
        mylist = []                     #outlier list is defined
        for i in df.iloc[:,j]:          #identify outliers and add to the list
            if i > ub or i < lb:
                mylist.append(float(i))
            else:
                i
        if mylist == []:
            mylist = ['Outlier']
        else:
            mylist
    else:
        mylist = ['Outlier']
    data.iloc[:,j].replace(mylist,'Outlier',inplace=True)
    j = j + 1

data = data.fillna('No Data')

#Excel
path2 = r"C:\Users\robert.carmody\desktop\Python\PyTest\PyTGPS.xlsx"
writer = ExcelWriter(path2)
df.to_excel(writer,'Raw Data')
data.to_excel(writer,'Adjusted Data')
writer.save()

1 answer:

Answer 0 (score: 0)

Suppose your data looks like this and, for simplicity, the upper bound is 2 and the lower bound is 0:

df = pd.DataFrame({'group':'A B C D E F'.split(' '), 'Q1':[1,1,5,2,2,2], 'Q2':[1,5,5,2,2,2],'Q3':[2,2,None,2,2,2]})
df.set_index('group', inplace=True)

That is:

       Q1  Q2   Q3
group
A       1   1  2.0
B       1   5  2.0
C       5   5  NaN
D       2   2  2.0
E       2   2  2.0
F       2   2  2.0

Then the following may give you what you want:

newData = []
for quest in df.columns:            # run through the columns
    q1 = df[quest].quantile(q=.25)
    q3 = df[quest].quantile(q=.75)
    iqr = q3 - q1
    # ub = q3 + 1.5*iqr             # IQR-based upper bound (as in the question)
    ub = 2                          # fixed upper bound for this example
    # lb = q1 - 1.5*iqr             # IQR-based lower bound (as in the question)
    lb = 0                          # fixed lower bound for this example

    for group in df.index:
        i = df.loc[group, quest]

        if i > ub or i < lb:        # outlier: record group, column, label and value
            newData.append([group, quest, 'Outlier', i])
        elif pd.isna(i):            # blank: NaN fails both comparisons above
            newData.append([group, quest, 'None', None])

This builds a two-dimensional list, which can easily be converted into a DataFrame with

pd.DataFrame(newData)
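
As a rough sketch of how this could look with the IQR bounds from the question switched back on, the loop can be wrapped in a function that returns a DataFrame with named columns. The function name `flag_cells`, the parameter `k`, and the column names `group`, `column`, `flag` and `value` are assumptions made for illustration and do not come from the question; the 'No Data' label is borrowed from the question's code. Non-numeric columns are only checked for blanks here.

```python
import pandas as pd

# Sample data from the answer above.
df = pd.DataFrame({
    'group': 'A B C D E F'.split(' '),
    'Q1': [1, 1, 5, 2, 2, 2],
    'Q2': [1, 5, 5, 2, 2, 2],
    'Q3': [2, 2, None, 2, 2, 2],
}).set_index('group')


def flag_cells(frame, k=1.5):
    """Return one row per blank or outlier cell (hypothetical helper).

    Outliers are values outside [q1 - k*iqr, q3 + k*iqr], computed per column.
    """
    records = []
    for col in frame.columns:
        values = frame[col]

        # Blank cells are reported for every column.
        for idx in values[values.isna()].index:
            records.append((idx, col, 'No Data', None))

        # Outlier detection only makes sense for numeric columns.
        if not pd.api.types.is_numeric_dtype(values):
            continue

        q1 = values.quantile(0.25)
        q3 = values.quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr

        # NaN fails both comparisons, so blanks are not counted twice.
        outliers = values[(values < lower) | (values > upper)]
        for idx, val in outliers.items():
            records.append((idx, col, 'Outlier', val))

    return pd.DataFrame(records, columns=['group', 'column', 'flag', 'value'])


result = flag_cells(df)
print(result)
```

With the sample data this should flag fewer cells than the fixed bounds of 0 and 2 used above, because the IQR bounds are wider for `Q2`. The resulting frame can then be passed to `to_excel` like any other DataFrame.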