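I'm trying to write a function for the Geography class of a CSV data parser program that converts the cities listed in a pandas Series (called Cities) into city IDs.
The Series looks something like df['Cities'] = 'New York'.
For example, New York's city_id is 1212. I built a dictionary from two Series containing the cities and their city IDs, so city_dict looks something like city_dict = {'New York': '1212'}.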
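To make the setup concrete, here is a minimal sketch of the data with made-up values. Only the New York / 1212 pair comes from my real data; the second city and its ID are placeholders, and the column names `City` and `city_id` are the ones used in my lookup file.

```python
import pandas as pd

# Made-up sample data; only "New York" -> "1212" reflects the real files.
df = pd.DataFrame({"Cities": ["New York", "Boston", "New York"]})
df2 = pd.DataFrame({"City": ["New York", "Boston"],
                    "city_id": ["1212", "3434"]})

# Dictionary built from the two lookup Series: city name -> city ID (both strings).
city_dict = dict(zip(df2["City"], df2["city_id"]))
print(city_dict)
```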
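Next, I try to match the keys of city_dict against the cities in Cities and replace each city with its city_id. So if the key "New York" in city_dict matches the string "New York" in Cities, I want to replace "New York" in Cities with "1212" (the values in city_dict are also strings, just numeric ones).
How can I replace the data in Cities (the city names) with the values in city_dict (the cities' city_ids)?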
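I thought I could simply use an "if ... in" statement to check whether the keys of city_dict match the data in Cities (the pandas Series). That didn't work, because dict_keys is not hashable. Also, replace() (as in replacing the data in the Series with the values of city_dict) wouldn't work, because dictionaries in Python don't seem to work with replace().
This is the basis of my function: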
```python
def coding(file, file2):
    list1 = []
    list2 = []
    df = pd.read_csv(file, names=columns, header=None, encoding='utf-8')
    df2 = pd.read_csv(file2, names=codes_columns, header=None, encoding='utf-8')
    dfCity = df['Cities']
    df2id = df2['city_id']
    df2city = df2['City']
    df2id = (df2id)
    df2city = (df2city)  # same as above
    city_dict = dict(zip(df2city, df2id))
    keys = city_dict.keys()
    keyss = str(list(keys))
    strnewdict = str(newdict)
    strkeys = str(keys)
    strdf2id = str(df2id)
    strvalue = str(city_dict.values())
    if dfCity.str.contains('UNKNOWN').any():
        dfCity = dfCity.replace('UNKNOWN', '')
    for _ in range(1):
        if dfCity.str.contains(keyss).any():
            dfCity.replace(keyss, df2id)
```
I also thought of wrapping city_dict in str() and then using
if Cities.str.contains(keyss).any():
(keyss being the stringified keys of city_dict), but that wasn't useful either.
The ultimate goal is to replace the cities with their city IDs, append them to a list, and overwrite the dfCity Series in a new file.
Answer 0 (score: 0)
From a dictionary to pandas:
import pandas as pd
df = pd.DataFrame.from_dict(YOURDICT, orient='index')
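To replace the city names themselves, a minimal sketch along the following lines may work, assuming `city_dict` maps city names to ID strings as described in the question. `Series.map` looks each value up in the dictionary and yields NaN for names with no entry, so `fillna` is used here to keep the original text for unmatched rows. The sample values, the helper name `cities_to_ids`, and the output file name are hypothetical.

```python
import pandas as pd

def cities_to_ids(cities, city_dict):
    """Return a copy of `cities` with known city names replaced by their IDs."""
    # Blank out the 'UNKNOWN' placeholder, as the question's code does.
    cleaned = cities.replace("UNKNOWN", "")
    # Look up each name in the dictionary; names without an entry become NaN.
    mapped = cleaned.map(city_dict)
    # Keep the original value wherever no ID was found.
    return mapped.fillna(cleaned)

# Hypothetical sample data standing in for the two CSV files.
city_dict = {"New York": "1212", "Boston": "3434"}
df = pd.DataFrame({"Cities": ["New York", "UNKNOWN", "Boston", "Springfield"]})

df["Cities"] = cities_to_ids(df["Cities"], city_dict)
id_list = df["Cities"].tolist()
print(id_list)

# To write the result to a new file (the file name is a placeholder):
# df.to_csv("cities_with_ids.csv", index=False)
```

`Series.replace` also accepts a dictionary, so `df["Cities"].replace(city_dict)` should give a similar result. `map` is used in this sketch because it makes unmatched names easy to spot as NaN before they are filled back in.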