I have data that looks like this:
import datetime
data = [(datetime.datetime(2021, 2, 6, 18, 48, 18, 97962), u'London', u'New York', u'UPLOAD_LOW'), (datetime.datetime(2021, 2, 6, 18, 48, 18, 97962), u'Berlin', u'Tokyo', u'DOWNLOAD_HIGH'), (datetime.datetime(2021, 2, 6, 18, 47, 8, 209495), u'Paris', u'Toronto', u'DROP_LOW')]
This is how it looks when loaded into pandas:
date source destination issue
0 2021-02-06 18:48:18.097962 London New York UPLOAD_LOW
1 2021-02-06 18:48:18.097962 Berlin Tokyo DOWNLOAD_HIGH
2 2021-02-06 18:47:08.209495 Paris Toronto DROP_LOW
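Now I want to map the values in the issue column using a dictionary. pandas.Series.map would normally do the job, but the problem is that the dictionary keys contain only part of each column value.
Here is my dictionary: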
issue_short_form_map = {
    "UPLOAD": "UP",
    "MEMORY": "MEM",
    "DOWNLOAD": "DN"
}
Now say I want to map the issue column with the dictionary above. Normally this is what I would do:
import pandas as pd
df = pd.DataFrame(data)
df.columns = ["date", "source", "destination", "issue"]
# map the values in issue column to the dictionary. Anything that doesn't match, keep the original
df["issue"] = df["issue"].map(issue_short_form_map).fillna(df["issue"])
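But the values in the issue column do not match the keys directly. Only a part of each value matches: the first piece after splitting on _.
Is there a way to split the column values on _ and map the result with the dictionary? Anything that does not match should be left as it is.
My final output should look like this: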
date source destination issue
0 2021-02-06 18:48:18.097962 London New York UP_LOW
1 2021-02-06 18:48:18.097962 Berlin Tokyo DN_HIGH
2 2021-02-06 18:47:08.209495 Paris Toronto DROP_LOW
Answer 0 (score: 4)
You can first split the issue column, convert only the first part using the mapping, and then add the remaining part back:
splits = df['issue'].str.split('_')
short_issue = splits.str[0].map(issue_short_form_map).fillna(splits.str[0])
df['issue'] = short_issue + '_' + splits.str[1]
df
# date source destination issue
#0 2021-02-06 18:48:18.097962 London New York UP_LOW
#1 2021-02-06 18:48:18.097962 Berlin Tokyo DN_HIGH
#2 2021-02-06 18:47:08.209495 Paris Toronto DROP_LOW
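Note that splits.str[1] keeps only the second piece, so the code above assumes every value contains exactly one underscore. As a rough sketch of a variant that tolerates other shapes, Series.str.replace with a callable replacement can rewrite just the part before the first underscore. The sample values and the shorten helper below are hypothetical and not from the question.

```python
import pandas as pd

issue_short_form_map = {"UPLOAD": "UP", "MEMORY": "MEM", "DOWNLOAD": "DN"}

# Hypothetical values: one with extra underscores, one with no underscore at all
issues = pd.Series(["UPLOAD_LOW", "DOWNLOAD_VERY_HIGH", "DROP_LOW", "MEMORY"])

def shorten(match):
    # match.group(0) is the text before the first underscore
    prefix = match.group(0)
    # Keep the original prefix when it is not a key in the mapping
    return issue_short_form_map.get(prefix, prefix)

# ^[^_]+ matches only the leading run of non-underscore characters
shortened = issues.str.replace(r"^[^_]+", shorten, regex=True)
print(shortened.tolist())
# ['UP_LOW', 'DN_VERY_HIGH', 'DROP_LOW', 'MEM']
```

With the DataFrame from the question, the same call would presumably be df['issue'] = df['issue'].str.replace(r"^[^_]+", shorten, regex=True).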
Answer 1 (score: 2)
# Keys whose occurrences should be shortened
list_to_change = list(issue_short_form_map)

def check_replace(x):
    """
    Replace the first key from list_to_change found in x with its short form.
    Return x unchanged if no key is found.
    """
    for element_to_check in list_to_change:
        if element_to_check in x:
            return x.replace(element_to_check, issue_short_form_map[element_to_check])
    return x

df["issue"] = df["issue"].map(check_replace)
print(df)
date source destination issue
0 2021-02-06 18:48:18.097962 London New York UP_LOW
1 2021-02-06 18:48:18.097962 Berlin Tokyo DN_HIGH
2 2021-02-06 18:47:08.209495 Paris Toronto DROP_LOW
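One thing to be aware of: check_replace looks for a key anywhere in the string, so a value such as REUPLOAD_LOW (a made-up example) would also be rewritten. If only the part before the first underscore should be looked up, a minimal sketch could combine str.partition with dict.get. The name shorten_prefix is a hypothetical helper, not something from the question.

```python
issue_short_form_map = {"UPLOAD": "UP", "MEMORY": "MEM", "DOWNLOAD": "DN"}

def shorten_prefix(value, mapping=issue_short_form_map):
    # partition splits on the first "_" only; sep and tail are "" when there is none
    head, sep, tail = value.partition("_")
    # Fall back to the original prefix when it is not a key in the mapping
    return mapping.get(head, head) + sep + tail

# Hypothetical inputs; the second one is not from the question
print(shorten_prefix("UPLOAD_LOW"))    # UP_LOW
print(shorten_prefix("REUPLOAD_LOW"))  # REUPLOAD_LOW (prefix is not a key)
print(shorten_prefix("DROP_LOW"))      # DROP_LOW
```

With the DataFrame from the question this would be applied as df["issue"] = df["issue"].map(shorten_prefix).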
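Answer 2 (score: 0)
The dictionary method get accepts a default value to return when no match is found. This handles the DROP case, which has no matching key/value pair.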
from datetime import datetime
import pandas as pd

data = [(datetime(2021, 2, 6, 18, 48, 18, 97962), u'London', u'New York', u'UPLOAD_LOW'), (datetime(2021, 2, 6, 18, 48, 18, 97962), u'Berlin', u'Tokyo', u'DOWNLOAD_HIGH'), (datetime(2021, 2, 6, 18, 47, 8, 209495), u'Paris', u'Toronto', u'DROP_LOW')]
# reset_index() adds the extra "index" column visible in the output below
df = pd.DataFrame(data, columns=['date', 'source', 'destination', 'issue']).reset_index().fillna(0)
#print(df)
issue_short_form_map = {
    "UPLOAD": "UP",
    "MEMORY": "MEM",
    "DOWNLOAD": "DN"
}
# Split each issue on "_" into a list of parts
mylist = df['issue'].apply(lambda row: row.split("_"))
# Look up the first part, falling back to the part itself when it is not a key
mylist = [issue_short_form_map.get(x[0], x[0]) + "_" + str(x[1]) for x in mylist]
df['issue'] = mylist
print(df)
output:
index date source destination issue
0 0 2021-02-06 18:48:18.097962 London New York UP_LOW
1 1 2021-02-06 18:48:18.097962 Berlin Tokyo DN_HIGH
2 2 2021-02-06 18:47:08.209495 Paris Toronto DROP_LOW