pandas ValueError: could not convert string to float: 'p-'

Time: 2018-01-07 02:49:18

Tags: python pandas geography

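I have a dataframe with more than 10^6 rows, and I am simply converting lat (degrees and minutes) to lat (decimal degrees only). However, some rows in my frame contain the string "p-", which kills my loop early. I tried a few things (below).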

Code:

import pandas as pd
import numpy as np
import glob
import matplotlib.pyplot as plt

path = r'/home/engr/Documents/SchoolHR/Data/SFSU-Boat/SBE45m/2015/'

allfiles_list = glob.glob(path + "/15*.hex")
allfiles_list = sorted(allfiles_list)
col = ["temp", "conduct", "salinity", "lat", "lon", "hms", "dmy"]
big_frame = pd.DataFrame()

for name in allfiles_list:
    df = pd.read_csv(name, skiprows=12, encoding="latin1", names=col, na_values=0, na_filter=False, engine="c")
    big_frame = big_frame.append(df)

# TODO surgery on columns to convert to float for use on big_frame

# regex \D to remove any non-digit characters -- hms & dmy
big_frame["hms"].replace(regex=True,inplace=True,to_replace=r'\D',value=r'')
big_frame["dmy"].replace(regex=True,inplace=True,to_replace=r'\D',value=r'')
big_frame["temp"].replace(regex=True,inplace=True,to_replace='(\D.\=)',value='')
big_frame["conduct"].replace(regex=True,inplace=True,to_replace='(\D.\=)',value='')
big_frame["salinity"].replace(regex=True,inplace=True,to_replace='(\D.\=)',value='')
big_frame["lat"].replace(regex=True,inplace=True,to_replace='[lonat=]',value='')
big_frame["lon"].replace(regex=True,inplace=True,to_replace='[lonat=]',value='')

for index, row in big_frame.iterrows():
    if row.lat[-1] == 'N':
        D = float(row.lat[1:3])
        M = float(row.lat[4:10])
        DD = D + float(M/60)
        row.lat = DD
    if row.lon[-1] == 'W':
        D1 = float(row.lon[1:4])
        M1 = float(row.lon[5:12])
        DD1 = D1 + float(M1/60)
        row.lon = -DD1

The code returns this error:

ValueError: could not convert string to float: 'p-'

I tried to modify the code by doing the following and then running the loop over the dataframe:

big_frame['lon'] = big_frame.lon.str.replace('p-?' , '')
big_frame['lat'] = big_frame.lat.str.replace('p-?' , '')
big_frame["lat"].replace(regex=True,inplace=True,to_replace='[)]',value='')
big_frame["lon"].replace(regex=True,inplace=True,to_replace='[)]',value='')

But I just got this:

IndexError: string index out of range

Sample data set below:

1 Answer:

Answer 0: (score: 0)

You can drop the offending rows with something like:

big_frame = big_frame[big_frame['col_name'].apply(lambda x: x.isdigit())]

The operation should then not fail.
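As a complement to dropping rows, here is a minimal sketch of a converter that returns NaN for unparseable values instead of raising. The exact layout of the lat/lon strings is not shown in the question, so the pattern below (degrees, whitespace, decimal minutes, hemisphere letter) is an assumption. The names `dm_to_decimal` and `_DM` and the sample values are hypothetical.

```python
import math
import re

# Assumed layout (not confirmed by the question): degrees, whitespace,
# decimal minutes, then a hemisphere letter, e.g. " 37 48.600 N".
_DM = re.compile(r"^\s*(\d{1,3})\s+(\d+(?:\.\d+)?)\s*([NSEW])\s*$")


def dm_to_decimal(value):
    """Return signed decimal degrees, or NaN if the value cannot be parsed."""
    match = _DM.match(value) if isinstance(value, str) else None
    if match is None:
        # Covers stray tokens such as 'p-', empty strings, and non-strings.
        return math.nan
    degrees, minutes, hemisphere = match.groups()
    decimal = float(degrees) + float(minutes) / 60.0
    # South and west are negative by convention.
    return -decimal if hemisphere in "SW" else decimal


# Hypothetical sample values, including the problematic 'p-' token.
samples = [" 37 48.600 N", " 122 27.000 W", "p-", ""]
converted = [dm_to_decimal(s) for s in samples]
```

With pandas, this could be applied as `big_frame['lat'] = big_frame['lat'].map(dm_to_decimal)` (and likewise for `lon`), followed by `big_frame.dropna(subset=['lat', 'lon'])` to discard the rows that did not parse. Validating the whole string up front avoids both the ValueError from `float('p-')` and the IndexError from indexing an empty string.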