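I have 2 CSV files. The first is a data file and the other is a mapping file. The mapping file has 4 columns: Device_Name, GDN, Device_Type and Device_OS.
The same columns are present in the data file.
In the data file, the Device_Name column is populated and the other three columns are blank. All four columns are populated in the mapping file. I want my Python code to open both files and, for each Device_Name in the data file, map its GDN, Device_Type and Device_OS values from the mapping file.
I know how to use a dict when only 2 columns are present (1 needs to be mapped), but I don't know how to do it when 3 columns need to be mapped.
Here is the code I tried for mapping Device_Type: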
import csv

x = dict([])
with open("Pricing Mapping_2013-04-22.csv", "rb") as in_file1:
    file_map = csv.reader(in_file1, delimiter=',')
    for row in file_map:
        typemap = [row[0],row[2]]
        x.append(typemap)  # dict has no append() method: this line raises AttributeError
with open("Pricing_Updated_Cleaned.csv", "rb") as in_file2, open("Data Scraper_GDN.csv", "wb") as out_file:
    writer = csv.writer(out_file, delimiter=',')
    for row in csv.reader(in_file2, delimiter=','):
        try:
            row[27] = x[row[11]]
        except KeyError:
            row[27] = ""
        writer.writerow(row)
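This returns an AttributeError.
After some research, I realized I need to create a nested dict, but I don't know how to do that. Please help me solve this, or point me in the right direction.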
Answer 0 (score: 248)
A nested dict is a dictionary within a dictionary. A very simple thing.
>>> d = {}
>>> d['dict1'] = {}
>>> d['dict1']['innerkey'] = 'value'
>>> d
{'dict1': {'innerkey': 'value'}}
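Reading values back out works the same way, with one index per level. A small sketch; the device names and values below are made-up illustration data, not taken from the question's files:

```python
# Hypothetical mapping: outer key is a device name, inner dict holds its attributes
mapping = {
    "router1": {"GDN": "G100", "Device_Type": "Router", "Device_OS": "IOS"},
}

# Index once per level: outer key first, then inner key
os_name = mapping["router1"]["Device_OS"]
print(os_name)  # IOS

# Chained dict.get with an empty-dict default avoids a KeyError
# when the outer key is missing
missing = mapping.get("switch9", {}).get("Device_OS")
print(missing)  # None
```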
You can also use defaultdict from the collections package to help create nested dictionaries.
>>> import collections
>>> d = collections.defaultdict(dict)
>>> d['dict1']['innerkey'] = 'value'
>>> d # currently a defaultdict type
defaultdict(<type 'dict'>, {'dict1': {'innerkey': 'value'}})
>>> dict(d) # but is exactly like a normal dictionary.
{'dict1': {'innerkey': 'value'}}
You can populate it however you like.
In your code, I'd suggest something like the following:
d = {}  # can use defaultdict(dict) instead
for row in file_map:
    # derive the row key from something, e.g. the first column (Device_Name)
    row_key = row[0]
    # when using defaultdict, we can skip the next step that creates a dictionary for row_key
    d[row_key] = {}
    for idx, col in enumerate(row):
        d[row_key][idx] = col
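If the file has a header row, the column names make more readable inner keys than positional indices. A minimal sketch, assuming the first column is the row key; the in-memory CSV text is made-up sample data standing in for the real mapping file:

```python
import csv
import io

# Made-up mapping data standing in for the real mapping file
sample = io.StringIO(
    "Device_Name,GDN,Device_Type,Device_OS\n"
    "router1,G100,Router,IOS\n"
    "switch1,G200,Switch,NX-OS\n"
)

reader = csv.reader(sample)
header = next(reader)  # first line holds the column names

d = {}
for row in reader:
    row_key = row[0]  # assumption: the first column identifies the row
    # pair the remaining column names with this row's values
    d[row_key] = dict(zip(header[1:], row[1:]))

print(d["switch1"]["Device_Type"])  # Switch
```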
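Based on your comment:
Maybe the code above confuses the issue. My problem in short: I have 2 files, a.csv and b.csv. a.csv has 4 columns i j k l, and b.csv has the same columns. i is the key column of these CSVs. Columns j k l are empty in a.csv but populated in b.csv. I want to map the values of columns j k l from b.csv into a.csv, using 'i' as the key column.
My suggestion would be something like this (without using defaultdict):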
a_file = "path/to/a.csv"
b_file = "path/to/b.csv"
# read from file a.csv
with open(a_file) as f:
    # skip headers
    next(f)
    # get first column as keys; a set is built eagerly (before the file closes)
    # and makes the membership test below fast
    keys = {line.split(',')[0] for line in f}
# create empty dictionary:
d = {}
# read from file b.csv
with open(b_file) as f:
    # gather headers except first key header (strip the trailing newline)
    headers = next(f).strip().split(',')[1:]
    # iterate lines
    for line in f:
        # gather the columns
        cols = line.strip().split(',')
        # check to make sure this key should be mapped.
        if cols[0] not in keys:
            continue
        # add key to dict
        d[cols[0]] = dict(
            # inner keys are the header names, values are columns
            (headers[idx], v) for idx, v in enumerate(cols[1:]))
Note that there is a csv module for parsing CSV files.
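Applied to the original question, the csv module can handle both halves: building a nested dict from the mapping file, then filling the three blank columns of each data row. A sketch, assuming both files have a header row with the column names from the question and that Device_Name is the key; FIELDS, load_mapping and fill_rows are hypothetical names, and the in-memory files stand in for the real paths:

```python
import csv
import io

FIELDS = ["GDN", "Device_Type", "Device_OS"]  # columns to copy from the mapping

def load_mapping(f):
    """Return {Device_Name: {column: value}} built from a mapping CSV."""
    return {
        row["Device_Name"]: {field: row[field] for field in FIELDS}
        for row in csv.DictReader(f)
    }

def fill_rows(data_f, out_f, mapping):
    """Copy data_f to out_f, filling FIELDS from mapping by Device_Name."""
    reader = csv.DictReader(data_f)
    writer = csv.DictWriter(out_f, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # devices missing from the mapping keep their blank columns
        row.update(mapping.get(row["Device_Name"], {}))
        writer.writerow(row)

# Made-up sample data in place of the real mapping and data files
mapping_file = io.StringIO(
    "Device_Name,GDN,Device_Type,Device_OS\n"
    "router1,G100,Router,IOS\n"
)
data_file = io.StringIO(
    "Device_Name,GDN,Device_Type,Device_OS\n"
    "router1,,,\n"
    "unknown,,,\n"
)
out = io.StringIO()
fill_rows(data_file, out, load_mapping(mapping_file))
print(out.getvalue())
```

Because the writer reuses the reader's field names, any extra columns in the data file are passed through unchanged. With real files, open them with newline='' as the csv documentation recommends.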
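Answer 1 (score: 56)
Update: for nested dictionaries of arbitrary depth, see this answer.
Use defaultdict from collections.
High performance: it avoids a separate "if key not in dict" check before every insertion, which adds up when the data set is large.
Low maintenance: it makes the code more readable and easy to extend.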
from collections import defaultdict
target_dict = defaultdict(dict)
target_dict[key1][key2] = val
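To show what defaultdict(dict) saves, here is the same grouping written both with it and with a plain dict using dict.setdefault; the (key1, key2, val) triples are made-up sample data:

```python
from collections import defaultdict

# Made-up (outer key, inner key, value) triples
triples = [("router1", "GDN", "G100"), ("router1", "Device_OS", "IOS"),
           ("switch1", "GDN", "G200")]

# defaultdict creates the inner dict the first time an outer key is used
target_dict = defaultdict(dict)
for key1, key2, val in triples:
    target_dict[key1][key2] = val

# Equivalent with a plain dict: setdefault inserts {} only if key1 is absent
plain = {}
for key1, key2, val in triples:
    plain.setdefault(key1, {})[key2] = val

print(dict(target_dict) == plain)  # True
```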
Answer 2 (score: 18)
For arbitrary levels of nesting:
In [2]: def nested_dict():
   ...:     return collections.defaultdict(nested_dict)
...:
In [3]: a = nested_dict()
In [4]: a
Out[4]: defaultdict(<function __main__.nested_dict>, {})
In [5]: a['a']['b']['c'] = 1
In [6]: a
Out[6]:
defaultdict(<function __main__.nested_dict>,
{'a': defaultdict(<function __main__.nested_dict>,
{'b': defaultdict(<function __main__.nested_dict>,
{'c': 1})})})
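Because every level is a defaultdict, the printed result is noisy, and dict(a) only converts the outer level. A small recursive helper (the name to_plain is hypothetical) turns every level back into a regular dict:

```python
import collections

def nested_dict():
    # each missing key creates another nested_dict, to any depth
    return collections.defaultdict(nested_dict)

def to_plain(d):
    """Recursively convert nested defaultdicts into plain dicts."""
    if isinstance(d, dict):  # defaultdict is a subclass of dict
        return {k: to_plain(v) for k, v in d.items()}
    return d

a = nested_dict()
a['a']['b']['c'] = 1
print(to_plain(a))  # {'a': {'b': {'c': 1}}}
```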
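Answer 3 (score: 0)
It's important to remember that when using defaultdict and similar nested-dict modules (such as nested_dict), looking up a key that doesn't exist can inadvertently create a new key entry in the dict and cause a lot of havoc. Here is a Python 3 example with nested_dict.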
import nested_dict as nd
nest = nd.nested_dict()
nest['outer1']['inner1'] = 'v11'
nest['outer1']['inner2'] = 'v12'
print('original nested dict: \n', nest)
try:
    nest['outer1']['wrong_key1']
except KeyError as e:
    print('exception missing key', e)
print('nested dict after lookup with missing key. no exception raised:\n', nest)

# instead convert back to normal dict
nest_d = nest.to_dict(nest)
try:
    print('converted to normal dict. Trying to lookup Wrong_key2')
    nest_d['outer1']['wrong_key2']
except KeyError as e:
    print('exception missing key', e)
else:
    print(' no exception raised:\n')

# or use dict.keys to check if key in nested dict.
print('checking with dict.keys')
print(list(nest['outer1'].keys()))
if 'wrong_key3' in list(nest.keys()):
    print('found wrong_key3')
else:
    print(' did not find wrong_key3')
The output is:
original nested dict:
{"outer1": {"inner2": "v12", "inner1": "v11"}}
nested dict after lookup with missing key. no exception raised:
{"outer1": {"wrong_key1": {}, "inner2": "v12", "inner1": "v11"}}
converted to normal dict. Trying to lookup Wrong_key2
exception missing key 'wrong_key2'
checking with dict.keys
['wrong_key1', 'inner2', 'inner1']
did not find wrong_key3
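The same pitfall occurs with collections.defaultdict from the standard library, no third-party package needed. A short sketch showing that plain indexing inserts a key, while membership tests and .get() read without inserting:

```python
from collections import defaultdict

nest = defaultdict(dict)
nest['outer1']['inner1'] = 'v11'

# Plain indexing on a missing outer key silently inserts an empty dict
_ = nest['typo_key']
print('typo_key' in nest)  # True

# Membership tests and .get() do not trigger the default factory
print('other_key' in nest)    # False
print(nest.get('other_key'))  # None
print('other_key' in nest)    # False
```

Note that only the outer level autovivifies here: the inner values are plain dicts, so nest['outer1']['missing'] still raises KeyError.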
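Answer 4 (score: 0)
If you want to create a nested dictionary from a list of keys forming a path (of any length), and apply a function to the item that may exist at the end of that path, this handy little recursive function is quite useful: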
def ensure_path(data, path, default=None, default_func=lambda x: x):
    """
    Function:
    - Ensures a path exists within a nested dictionary
    Requires:
    - `data`:
        - Type: dict
        - What: A dictionary to check if the path exists
    - `path`:
        - Type: list of strs
        - What: The path to check
    Optional:
    - `default`:
        - Type: any
        - What: The default item to add to a path that does not yet exist
        - Default: None
    - `default_func`:
        - Type: function
        - What: A single input function that takes in the current path item (or default) and adjusts it
        - Default: `lambda x: x` # Returns the value in the dict or the default value if none was present
    """
    if len(path) > 1:
        if path[0] not in data:
            data[path[0]] = {}
        data[path[0]] = ensure_path(data=data[path[0]], path=path[1:], default=default, default_func=default_func)
    else:
        if path[0] not in data:
            data[path[0]] = default
        data[path[0]] = default_func(data[path[0]])
    return data
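For comparison, the path-creating part can also be done without recursion by walking the path with dict.setdefault. A sketch with a hypothetical helper set_path, which only creates the path and assigns a value at its end (it does not apply a default_func the way ensure_path does):

```python
def set_path(data, path, value):
    """Create dicts along path[:-1] as needed, then set the last key."""
    node = data
    for key in path[:-1]:
        # setdefault returns the existing inner dict or inserts a new one
        node = node.setdefault(key, {})
    node[path[-1]] = value
    return data

d = {}
set_path(d, ['a', 'b', 'c'], 1)
set_path(d, ['a', 'x'], 2)
print(d)  # {'a': {'b': {'c': 1}, 'x': 2}}
```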
Answer 5 (score: 0)
This approach takes a nested list and adds its data to empty dictionaries:
ls = [['a','a1','a2','a3'],['b','b1','b2','b3'],['c','c1','c2','c3'],
['d','d1','d2','d3']]
This creates four empty dictionaries inside data_dict:
data_dict = {f'dict{i}': {} for i in range(4)}
for i in range(4):
    upd_dict = {'val': ls[i][0], 'val1': ls[i][1], 'val2': ls[i][2], 'val3': ls[i][3]}
    data_dict[f'dict{i}'].update(upd_dict)
print(data_dict)
Output
{'dict0': {'val': 'a', 'val1': 'a1', 'val2': 'a2', 'val3': 'a3'}, 'dict1': {'val': 'b', 'val1': 'b1', 'val2': 'b2', 'val3': 'b3'}, 'dict2': {'val': 'c', 'val1': 'c1', 'val2': 'c2', 'val3': 'c3'}, 'dict3': {'val': 'd', 'val1': 'd1', 'val2': 'd2', 'val3': 'd3'}}
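The same structure can be built in one expression by zipping the answer's inner key names with each sublist, which also avoids hard-coding range(4). A sketch that repeats the ls list so the block runs on its own:

```python
ls = [['a', 'a1', 'a2', 'a3'], ['b', 'b1', 'b2', 'b3'],
      ['c', 'c1', 'c2', 'c3'], ['d', 'd1', 'd2', 'd3']]
inner_keys = ['val', 'val1', 'val2', 'val3']

# enumerate supplies the index for the outer key; zip pairs keys with values
data_dict = {f'dict{i}': dict(zip(inner_keys, row)) for i, row in enumerate(ls)}
print(data_dict['dict2'])  # {'val': 'c', 'val1': 'c1', 'val2': 'c2', 'val3': 'c3'}
```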
Answer 6 (score: 0)
# in Jupyter
import sys
!conda install -c conda-forge --yes --prefix {sys.prefix} nested_dict
import nested_dict as nd
d = nd.nested_dict()
d can now be used to store nested key-value pairs.
Answer 7 (score: 0)
travel_log = {
"France" : {"cities_visited" : ["paris", "lille", "dijon"], "total_visits" : 10},
"india" : {"cities_visited" : ["Mumbai", "delhi", "surat",], "total_visits" : 12}
}
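Values in such a dict are read and updated by indexing each level in turn. A brief sketch using the travel_log above (repeated so the block runs on its own); the added city "lyon" is made-up data:

```python
travel_log = {
    "France": {"cities_visited": ["paris", "lille", "dijon"], "total_visits": 10},
    "india": {"cities_visited": ["Mumbai", "delhi", "surat"], "total_visits": 12},
}

# Update values two levels down: country first, then field
travel_log["France"]["cities_visited"].append("lyon")
travel_log["France"]["total_visits"] += 1

# Iterate the outer dict, then read inner fields by name
for country, info in travel_log.items():
    print(country, info["total_visits"], len(info["cities_visited"]))
```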