Change all city names to one common name.
   City      b  c
0  New york  1  1
1  New York  2  2
2  N.Y.      3  3
3  NY        4  4
These all refer to New York City, but Python treats them as separate values, so I changed them all to a single name.
df["City"] = df["City"].replace({"N.Y.":"New york", "New York": "New york", "NY": "New york"})
After that, I need to check whether I have covered every variant of New York, so I wrote a function:
def universal_ok(universal_name):
    count = 0
    for c in df.City:
        if c == universal_name:
            count += 1
    # This only works when the column contains a single city
    if count == len(df.City):
        return "Yes all names are formatted correctly"
    else:
        return f"there are {len(df.City) - count} names that need to be changed"

universal_ok("New york")
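The same check can be written without an explicit loop. The following is only a sketch: the helper name `count_mismatches` is made up for illustration, and the sample frame is rebuilt from the table above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "City": ["New york", "New York", "N.Y.", "NY"],
    "b": [1, 2, 3, 4],
    "c": [1, 2, 3, 4],
})

df["City"] = df["City"].replace(
    {"N.Y.": "New york", "New York": "New york", "NY": "New york"}
)

def count_mismatches(series, universal_name):
    """Return how many entries differ from universal_name (hypothetical helper)."""
    # The comparison yields a boolean Series; summing it counts the True values.
    return int((series != universal_name).sum())

remaining = count_mismatches(df["City"], "New york")
```

Like the loop version, this assumes the column is supposed to hold a single city.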
The problem is that the column contains more than one city:
   City      b  c
0  New york  1  1
1  New York  2  2
2  N.Y.      3  3
3  NY        4  4
4  Toronto   3  2
5  TO        3  2
6  toronto   3  2
Is it possible to change every city to its common name?
Answer 0 (score: 1)
City      b  c
New york  1  1
New York  2  2
N.Y.      3  3
NY        4  4
Toronto   3  2
TO        3  2
toronto   3  2
First lowercase every name with pandas.Series.str.lower:
df.City = df.City.str.lower()
City      b  c
new york  1  1
new york  2  2
n.y.      3  3
ny        4  4
toronto   3  2
to        3  2
toronto   3  2
List the distinct spellings that remain with pandas.Series.unique:
df.City.unique()
array(['new york', 'n.y.', 'ny', 'toronto', 'to'], dtype=object)
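Build a dictionary from each spelling to its common name and apply it with pandas.Series.map:
cities_tup = (('New York', ['ny', 'n.y.', 'new york']),
              ('Toronto', ['toronto', 'to']))
cities_map = {alias: name for name, aliases in cities_tup for alias in aliases}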
{'ny': 'New York',
'n.y.': 'New York',
'new york': 'New York',
'toronto': 'Toronto',
'to': 'Toronto'}
df.City = df.City.map(cities_map)
City      b  c
New York  1  1
New York  2  2
New York  3  3
New York  4  4
Toronto   3  2
Toronto   3  2
Toronto   3  2
Confirm the result with pandas.Series.value_counts:
df.City.value_counts()
New York 4
Toronto 3
Name: City, dtype: int64
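One thing to watch for: `Series.map` with a dict returns NaN for any value that is not a key, so a spelling missing from the mapping is silently blanked out. A possible way to surface such cases is sketched below; the extra entry "Tor." is invented here purely to trigger the situation.

```python
import pandas as pd

# Spellings from the question plus one hypothetical unmapped variant ("Tor.").
cities = pd.Series(["New york", "N.Y.", "NY", "Toronto", "TO", "toronto", "Tor."])

cities_map = {
    "ny": "New York",
    "n.y.": "New York",
    "new york": "New York",
    "toronto": "Toronto",
    "to": "Toronto",
}

lowered = cities.str.lower()
mapped = lowered.map(cities_map)

# Values that had no entry in the dictionary come back as NaN.
unmapped = sorted(lowered[mapped.isna()].unique())

# Keep the original spelling wherever no mapping exists, instead of NaN.
cleaned = mapped.fillna(cities)
```

Here `unmapped` lists the spellings that still need an entry in `cities_map`, which answers the "did I cover every variant?" question for any number of cities.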
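Answer 1 (score: 0)
You need a dedicated column holding a unique city ID; otherwise you cannot tell Paris, France from Paris, Texas, and you cannot group Istanbul with Constantinople.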
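A rough sketch of what such an ID column could look like, assuming two lookup tables that are invented here for illustration (`aliases`, `canonical`, and the `city_id` values are not from the question):

```python
import pandas as pd

# Hypothetical lookup table: one row per known spelling, keyed to a stable ID.
aliases = pd.DataFrame({
    "alias": ["new york", "n.y.", "ny", "toronto", "to",
              "istanbul", "constantinople"],
    "city_id": [1, 1, 1, 2, 2, 3, 3],
})

# Hypothetical table of one canonical name per ID.
canonical = pd.DataFrame({
    "city_id": [1, 2, 3],
    "name": ["New York", "Toronto", "Istanbul"],
})

df = pd.DataFrame({"City": ["NY", "Toronto", "Constantinople", "Istanbul"]})
df["alias"] = df["City"].str.lower()

# Left joins keep every input row and preserve its order.
result = (
    df.merge(aliases, on="alias", how="left")
      .merge(canonical, on="city_id", how="left")
)
```

Telling the two Parises apart would additionally need something like a country or state column in the data, since the name alone cannot carry that information.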