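Taking the logarithm of a column

Time: 2017-11-10 18:45:14

Tags: python pandas numpy dataframe

I am very new to programming (in Python), and I want to create a new variable that is the logarithm of a column (from an imported Excel file). I have tried different solutions from this site, but I keep getting errors. My latest error is AttributeError: 'str' object has no attribute 'log'. I have already removed all values that are not numbers, but I still don't know how to convert the values from strings to integers (if that is the problem, because int(neighborhood) does not work).

Here is my current code: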

import pandas as pd
import numpy as np

df=pd.read_excel("kwb-2016_del_col_del_row.xls")
df = df[df.m_woz != "."] # drop rows with values "."
neighborhood=df[df.recs=="Neighborhood"]
neighborhood=neighborhood["m_woz"]
print(neighborhood)

np.log(neighborhood)

This is the error I get:

AttributeError                            Traceback (most recent call last)
<ipython-input-66-46698de51811> in <module>()
     12 print(neighborhood)
     13 
---> 14 np.log(neighborhood)


AttributeError: 'str' object has no attribute 'log'

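Can anyone help me?

1 answer:

Answer 0: (score: 0)

Maybe you didn't remove the data you thought you did? Try printing the data types to see what they are. In a DataFrame, your column may be filled with objects rather than numbers.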

print(df.dtypes)
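
To see why np.log fails on such a column, here is a minimal sketch on a made-up Series (not the asker's Excel file). It assumes the numbers were read in as strings, which gives the column dtype object. Depending on the NumPy version, the exception may be an AttributeError or a TypeError, so the sketch catches both.

```python
import numpy as np
import pandas as pd

# Hypothetical data: numbers that ended up stored as strings
s = pd.Series(["42", "45.6"], dtype=object, name="m_woz")
print(s.dtype)  # object

try:
    np.log(s)  # NumPy cannot take the log of str elements
    failed = False
except (AttributeError, TypeError) as exc:
    failed = True
    print(type(exc).__name__, exc)

# Once the values are real floats, the same call works
logs = np.log(s.astype(float))
print(logs)
```

If print(df.dtypes) reports object for the column you want to transform, a conversion step like the astype(float) call here is probably what is missing.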

Also, you may want to look at these two pages:

Select row from a DataFrame based on the type of the object(i.e. str)

Pandas: convert dtype 'object' to int

Here is an example that I built and ran interactively; it computes the logarithm correctly (do not type the >>>):

>>> raw_data = {'m_woz': [42, 'def', 1.23, 45.6, '.xyz'],
    'recs': ['Neighborhood', 'Neighborhood', 
    'unknown', 'Neighborhood', 'whatever']}
>>> df = pd.DataFrame(raw_data, columns = ['m_woz', 'recs'])
>>> print(df.dtypes)
m_woz    object
recs     object
dtype: object

Note that the dtype is object, not float, int, or str.

Continuing, here is what df and neighborhood look like:

>>> df
  m_woz          recs
0    42  Neighborhood
1   def  Neighborhood
2  1.23       unknown
3  45.6  Neighborhood
4  .xyz      whatever

>>> neighborhood=df[df.recs=="Neighborhood"]
>>> neighborhood
  m_woz          recs
0    42  Neighborhood
1   def  Neighborhood
3  45.6  Neighborhood

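Here is the trick: this line selects all rows of neighborhood whose m_woz value is an int or a float (if you copy/paste this, take care to fix the indentation):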

>>> df_num_strings = neighborhood[neighborhood['m_woz'].
        apply(lambda x: type(x) in (int, float))]

>>> df_num_strings
  m_woz          recs
0    42  Neighborhood
3  45.6  Neighborhood
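
The type check above matches exact Python types only, so NumPy scalars such as np.int64 would be filtered out along with the strings. Below is a minimal sketch of a slightly broader check. The helper is_real_number is hypothetical (it is not part of pandas or of the original answer), and the data is made up.

```python
import numbers

import numpy as np
import pandas as pd


def is_real_number(x):
    """True for int/float-like scalars (including NumPy ones); False for str and bool."""
    return isinstance(x, numbers.Real) and not isinstance(x, bool)


# Made-up stand-in for the neighborhood rows
neighborhood = pd.DataFrame(
    {"m_woz": [42, "def", 45.6], "recs": ["Neighborhood"] * 3}
)

# Keep only the rows whose m_woz value is a real number
df_num = neighborhood[neighborhood["m_woz"].apply(is_real_number)]
print(df_num)
```

For a column that holds only plain Python int and float values, both checks should select the same rows.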

Almost there: convert the numbers to strings, and then to floats:
>>> df_float = df_num_strings['m_woz'].astype(str).astype(float)
>>> df_float
0    42.0
3    45.6
Name: m_woz, dtype: float64

Finally, compute the logarithm:

>>> np.log(df_float)
0    3.737670
3    3.819908
Name: m_woz, dtype: float64
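
As an alternative to filtering by type, pandas offers pd.to_numeric, whose errors="coerce" option replaces values that cannot be parsed with NaN. The following is a sketch of that different technique, not the approach used in the answer above, on made-up data matching the df shown earlier.

```python
import numpy as np
import pandas as pd

# Made-up data matching the df shown earlier
raw_data = {
    "m_woz": [42, "def", 1.23, 45.6, ".xyz"],
    "recs": ["Neighborhood", "Neighborhood", "unknown", "Neighborhood", "whatever"],
}
df = pd.DataFrame(raw_data, columns=["m_woz", "recs"])

# Select the m_woz values of the Neighborhood rows
neighborhood = df.loc[df["recs"] == "Neighborhood", "m_woz"]

# errors="coerce" turns anything that cannot be parsed as a number into NaN
numeric = pd.to_numeric(neighborhood, errors="coerce").dropna()

log_m_woz = np.log(numeric)
print(log_m_woz)
```

Unlike the type check, this also keeps numbers that are stored as strings, such as "42". Zero or negative values would still need separate handling before taking the logarithm.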