Unknown character problem

Time: 2015-06-19 11:47:29

Tags: python pandas

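I have read a data frame into Python that has a column name containing the euro symbol, "price_€". Python shows the column as price_ with the symbol mangled, and it will not let me use € when I refer to the column: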

File "<ipython-input-53-d7f8249147e7>", line 1
df[price_€] = df[0].str.replace(r'[€,]', '').astype('float')
         ^
SyntaxError: invalid syntax

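Any ideas on how to remove it from the column name so that I can start referencing the column?

3 answers:

Answer 0: (score: 2)

You cannot use the euro sign in a variable name: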

Identifiers (also referred to as names) are described by the following lexical definitions:

identifier ::=  (letter|"_") (letter | digit | "_")*
letter     ::=  lowercase | uppercase
lowercase  ::=  "a"..."z"
uppercase  ::=  "A"..."Z"
digit      ::=  "0"..."9"
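
The grammar quoted above is the Python 2 one. Python 3 also accepts many non-ASCII letters in identifiers, but a currency symbol such as € is still rejected. A minimal sketch to check this, using the built-in str.isidentifier(); the candidate names are made up for illustration:

```python
# Check a few candidate names against Python 3's identifier rules.
# The names are illustrative only; "price_€" is the one from the question.
candidates = ["price", "price_é", "price_€"]

valid = {name: name.isidentifier() for name in candidates}

for name, ok in valid.items():
    print(name, "->", "valid identifier" if ok else "not an identifier")
```

Either way, a column label is just data, so it can contain any character as long as it is passed as a string.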

You need to use a string:

df["price_€"] ...

For me, pandas actually has no problem with the euro sign:

import pandas as pd

df = pd.DataFrame([[1, 2]], columns=["£", "€"])

print(df["€"])
print(df["£"])
0    2
Name: €, dtype: int64
0    1
Name: £, dtype: int64

The file is cp1252-encoded, so you need to specify the encoding:

import pandas as pd

df = pd.read_csv("PPR-2015.csv", header=0, encoding="cp1252")

print(df.columns)
Index([u'Date of Sale (dd/mm/yyyy)', u'Address', u'Postal Code', u'County', 
u'Price (€)', u'Not Full Market Price', u'VAT Exclusive', u'Description of Property', u'Property Size Description'], dtype='object')

print(df[u'Price (€)'])
0     €138,000.00
1     €270,000.00
2      €67,000.00
3     €900,000.00
4     €176,000.00
5     €155,000.00
6     €100,000.00
7     €120,000.00
8     €470,000.00
9     €140,000.00
10    €592,000.00
11     €85,000.00
12    €422,500.00
13    €225,000.00
14     €55,000.00
...
17433    €262,000.00
17434    €155,000.00
17435    €750,000.00
17436     €96,291.69
17437    €112,000.00
17438    €350,000.00
17439    €190,000.00
17440     €25,000.00
17441    €100,000.00
17442     €75,000.00
17443     €46,000.00
17444    €175,000.00
17445     €48,500.00
17446    €150,000.00
17447    €400,000.00
Name: Price (€), Length: 17448, dtype: object

Then convert the column to float:

df[u'Price (€)'] = df[u'Price (€)'].str.replace(ur'[€,]', '').astype('float')

print(df['Price (€)'.decode("utf-8")])

Output:

0     138000
1     270000
2      67000
3     900000
4     176000
5     155000
6     100000
7     120000
8     470000
9     140000
10    592000
11     85000
12    422500
13    225000
14     55000
...
17433    262000.00
17434    155000.00
17435    750000.00
17436     96291.69
17437    112000.00
17438    350000.00
17439    190000.00
17440     25000.00
17441    100000.00
17442     75000.00
17443     46000.00
17444    175000.00
17445     48500.00
17446    150000.00
17447    400000.00
Name: Price (€), Length: 17448, dtype: float64
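
The snippet above is Python 2 code: the ur'' prefix is a syntax error on Python 3, and str has no .decode there. A minimal Python 3 sketch of the same conversion, assuming a pandas version whose str.replace accepts the regex keyword; the two-row frame is made up and stands in for the CSV:

```python
import pandas as pd

# Hypothetical stand-in for the CSV data: prices stored as text.
prices = pd.DataFrame({"Price (€)": ["€138,000.00", "€96,291.69"]})

# Strip the euro sign and thousands separators, then cast to float.
# regex=True is passed explicitly because recent pandas versions treat
# the pattern as a literal string by default.
prices["Price (€)"] = (
    prices["Price (€)"]
    .str.replace(r"[€,]", "", regex=True)
    .astype(float)
)

print(prices["Price (€)"])
```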

Answer 1: (score: 1)

You can use a lambda filter on the string, like this:

import string
s = "some\x00string. with\x15 funny characters"
filter(lambda x: x in string.printable, s)

Output:

'somestring. with funny characters'
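
On Python 3, filter returns an iterator rather than a string, so the result has to be joined back together. A small sketch of the same idea; strip_non_printable is a hypothetical helper name. Note that string.printable only contains ASCII characters, so this also drops the € from a column name:

```python
import string


def strip_non_printable(text):
    """Keep only the characters listed in string.printable (ASCII only)."""
    allowed = set(string.printable)
    return "".join(ch for ch in text if ch in allowed)


cleaned = strip_non_printable("some\x00string. with\x15 funny characters")
column = strip_non_printable("price_€")

print(repr(cleaned))
print(repr(column))
```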

Answer 2: (score: 0)

You should use rename to change the column name:

In [189]:
df = pd.DataFrame(columns = ['price_€'])
df

Out[189]:
Empty DataFrame
Columns: [price_€]
Index: []

In [191]:
df.rename(columns = {'price_€':'price'},inplace=True)
df

Out[191]:
Empty DataFrame
Columns: [price]
Index: []
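
If several columns contain the symbol, the labels can also be rewritten in one pass instead of listing each one in rename. A sketch under assumptions: the two column names and the "EUR" replacement are illustrative choices, not something the question specifies:

```python
import pandas as pd

# Hypothetical frame with two labels that contain the euro sign.
df = pd.DataFrame(columns=["price_€", "Price (€)"])

# Replace the symbol in every column label; regex=False makes the
# pattern a plain literal string.
df.columns = df.columns.str.replace("€", "EUR", regex=False)

print(list(df.columns))
```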

Also, df[price_€] is not a valid way to select a column. You need to pass a string, so df['price_€'] is the correct form.

And then there is this:

df[0].str.replace(r'[€,]', '').astype('float')

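It is not clear what you were trying to do here. df[0] will raise a KeyError as well, because columns are selected by label and no column here is labelled 0; you need to pass the column name as a string.

I also don't understand why you are casting this column to float; you didn't explain that part.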