I have read a data frame into Python that has a column name containing the euro symbol, "price_€". Python sees the column as price_ . It won't let me use € or
File "<ipython-input-53-d7f8249147e7>", line 1
df[price_€] = df[0].str.replace(r'[€,]', '').astype('float')
^
SyntaxError: invalid syntax
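Any ideas on how to remove it from the column name so that I can start referencing it?
Answer 0 (score: 2)
You cannot use the euro symbol in a variable name: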
Identifiers (also referred to as names) are described by the following lexical definitions:
identifier ::= (letter|"_") (letter | digit | "_")*
letter ::= lowercase | uppercase
lowercase ::= "a"..."z"
uppercase ::= "A"..."Z"
digit ::= "0"..."9"
You need to use a string:
df["price_€"] ...
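A quick way to see the difference, as a minimal sketch assuming Python 3 and a made-up one-column frame (the values are illustrative, not from the question): `str.isidentifier()` reports whether a string could be used as a bare name, while the quoted form works as a column key regardless.

```python
import pandas as pd

# Hypothetical one-column frame; the values are illustrative only.
df = pd.DataFrame({"price_€": ["€1,000.00", "€250.50"]})

# "€" is a currency symbol, not a letter, so this is not a valid identifier.
print("price_€".isidentifier())   # False
print("price".isidentifier())     # True

# A quoted string is just a key, so it may contain any characters.
col = df["price_€"]
print(col.tolist())
```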
For me, pandas actually has no problem with the euro symbol:
import pandas as pd
df = pd.DataFrame([[1, 2]], columns=["£", "€"])
print(df["€"])
print(df["£"])
0 2
Name: €, dtype: int64
0 1
Name: £, dtype: int64
The file is cp1252-encoded, so you need to specify the encoding:
import pandas as pd
import codecs
df = pd.read_csv("PPR-2015.csv",header=0,encoding="cp1252")
print(df.columns)
Index([u'Date of Sale (dd/mm/yyyy)', u'Address', u'Postal Code', u'County',
u'Price (€)', u'Not Full Market Price', u'VAT Exclusive', u'Description of Property', u'Property Size Description'], dtype='object')
print(df[u'Price (€)'])
0 €138,000.00
1 €270,000.00
2 €67,000.00
3 €900,000.00
4 €176,000.00
5 €155,000.00
6 €100,000.00
7 €120,000.00
8 €470,000.00
9 €140,000.00
10 €592,000.00
11 €85,000.00
12 €422,500.00
13 €225,000.00
14 €55,000.00
...
17433 €262,000.00
17434 €155,000.00
17435 €750,000.00
17436 €96,291.69
17437 €112,000.00
17438 €350,000.00
17439 €190,000.00
17440 €25,000.00
17441 €100,000.00
17442 €75,000.00
17443 €46,000.00
17444 €175,000.00
17445 €48,500.00
17446 €150,000.00
17447 €400,000.00
Name: Price (€), Length: 17448, dtype: object
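To illustrate why the encoding matters here, a small standard-library sketch (Python 3 assumed; the header string is taken from the output above): the euro sign is one byte in cp1252 and three bytes in UTF-8, so bytes written in one encoding do not decode cleanly in the other.

```python
# The euro sign is a single byte (0x80) in cp1252 but three bytes in UTF-8.
header = "Price (€)"
raw_cp1252 = header.encode("cp1252")
print(raw_cp1252)                  # b'Price (\x80)'
print(header.encode("utf-8"))      # b'Price (\xe2\x82\xac)'

# Decoding cp1252 bytes as UTF-8 fails: 0x80 cannot start a UTF-8 sequence.
try:
    raw_cp1252.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)                  # False

# Decoding with the matching codec recovers the original header.
print(raw_cp1252.decode("cp1252") == header)   # True
```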
Then convert to float:
df[u'Price (€)'] = df[u'Price (€)'].str.replace(ur'[€,]', '').astype('float')
print(df['Price (€)'.decode("utf-8")])
Output:
0 138000
1 270000
2 67000
3 900000
4 176000
5 155000
6 100000
7 120000
8 470000
9 140000
10 592000
11 85000
12 422500
13 225000
14 55000
...
17433 262000.00
17434 155000.00
17435 750000.00
17436 96291.69
17437 112000.00
17438 350000.00
17439 190000.00
17440 25000.00
17441 100000.00
17442 75000.00
17443 46000.00
17444 175000.00
17445 48500.00
17446 150000.00
17447 400000.00
Name: Price (€), Length: 17448, dtype: float64
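The snippet above uses Python 2 syntax (`ur'...'` and `.decode`). A rough Python 3 equivalent might look like the sketch below; the three sample values are copied from the output above rather than read from the file, and `regex=True` is passed explicitly because newer pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

# Illustrative values copied from the output above; not the full file.
prices = pd.Series(["€138,000.00", "€270,000.00", "€96,291.69"], name="Price (€)")

# Strip the euro sign and thousands separators, then cast to float.
cleaned = prices.str.replace(r"[€,]", "", regex=True).astype(float)
print(cleaned.tolist())   # [138000.0, 270000.0, 96291.69]
```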
Answer 1 (score: 1)
You can use a lambda filter on the string, like this:
import string
s = "some\x00string. with\x15 funny characters"
filter(lambda x: x in string.printable, s)
Output:
'somestring. with funny characters'
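Note that the output shown is what Python 2 returns; in Python 3, `filter()` yields a lazy iterator rather than a string. A sketch of the same idea for Python 3, joining the kept characters back together:

```python
import string

s = "some\x00string. with\x15 funny characters"

# Keep only characters listed in string.printable and rebuild the string.
cleaned = "".join(ch for ch in s if ch in string.printable)
print(cleaned)   # somestring. with funny characters

# string.printable is ASCII-only, so "€" is dropped as well.
stripped_name = "".join(ch for ch in "price_€" if ch in string.printable)
print(stripped_name)   # price_
```

Because this removes every non-ASCII character, it would also strip accented letters from column names, which may or may not be what you want.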
Answer 2 (score: 0)
You should use rename to rename the column:
In [189]:
df = pd.DataFrame(columns = ['price_€'])
df
Out[189]:
Empty DataFrame
Columns: [price_€]
Index: []
In [191]:
df.rename(columns = {'price_€':'price'},inplace=True)
df
Out[191]:
Empty DataFrame
Columns: [price]
Index: []
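If several columns carry awkward characters, `rename` also accepts a function. The sketch below assumes a hypothetical helper, `to_identifier`, that replaces anything other than ASCII letters, digits, and underscores; the column names are only examples.

```python
import re

import pandas as pd

df = pd.DataFrame(columns=["price_€", "Price (€)"])  # example column names

def to_identifier(name):
    """Hypothetical helper: keep only ASCII letters, digits, and underscores."""
    # Collapse each run of other characters into "_" and trim the ends.
    return re.sub(r"[^0-9A-Za-z_]+", "_", name).strip("_")

# rename returns a new frame when inplace is not used.
df = df.rename(columns=to_identifier)
print(list(df.columns))   # ['price', 'Price']
```

Two different original names could map to the same cleaned name, so it is worth checking the result for duplicates.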
Also, df[price_€] is an invalid way to select a column. You need to pass a string, so df['price_€'] is the correct form.
Also:
df[0].str.replace(r'[€,]', '').astype('float')
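It is unclear what you are trying to do here. df[0] will raise a KeyError, because to index a column you need to pass its name as a string.
I also don't understand why you are casting this column to float; you didn't explain that part.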
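A short sketch of the KeyError point, assuming a frame whose only column is labelled with a string (the data is made up): label lookup with `0` fails, whereas positional access goes through `iloc`.

```python
import pandas as pd

df = pd.DataFrame({"price_€": ["€1,000.00"]})  # illustrative data

# df[0] looks for a column labelled 0, which does not exist here.
try:
    df[0]
    raised = False
except KeyError:
    raised = True
print(raised)   # True

# Positional access to the first column uses iloc instead.
first_col = df.iloc[:, 0]
print(first_col.name)   # price_€
```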