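Quick logic question.
If I have a CSV file where each row holds a dictionary value (with columns ["Location"], ["Movie Title"], ["Date"]), what is the best way for me to combine the title and date values of the rows that share the same location value?
Data excerpt: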
Location Movie Title Date
Edgebrook Park, Chicago A League of Their Own 7-Jun
Edgebrook Park, Chicago It's a Mad, Mad, Mad, Mad World 9-Jun
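For every row with the same location (like the ones in this example ^), I'd like to output something like this, so that no location is repeated.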
Edgebrook Park, Chicago A League of Their Own 7-Jun It's a Mad, Mad, Mad, Mad World 9-Jun
What's the best way to do this?
UPDATE: I had to change my data slightly, so now my columns look like:
Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
"Gage Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Gage Park, Chicago, IL, USA",41.7954363,-87.6962257
"Jefferson Memorial Park, Chicago ",Jun-12 Monsters University ,"Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA",41.76083920000001,-87.6294353
"Commercial Club Playground, Chicago ",Jun-12 Despicable Me 2,"Chicago, IL, USA",41.8781136,-87.6297982
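and so on. I've seen a lot of OrderedDict or defaultdict suggestions here, but what is the best way to now extend or append only the 'MovieDate' column, rather than the whole rest of the row, as the value for the 'Location' column key?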
Answer 0 (score: 2)
Not sure what you plan to do with the columns, but this will group the elements by location:
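import csv
from collections import OrderedDict

od = OrderedDict()
# newline="" is what the csv module recommends for files it reads and writes
with open("in.csv", newline="") as f, open("new.csv", "w", newline="") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        # first column is the location, everything else gets appended to its group
        loc, *rest = row
        od.setdefault(loc, []).extend(rest)
    wr.writerow(header)
    for loc, vals in od.items():
        wr.writerow([loc] + vals)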
Presumed input:
Location Movie Title Date
"Edgebrook Park, Chicago","A League of Their Own",7-Jun
"Edgebrook Park, Chicago","It's a Mad, Mad, Mad, Mad World", 9-Jun
Output:
Location Movie Title Date
"Edgebrook Park, Chicago",A League of Their Own,7-Jun,"It's a Mad, Mad, Mad, Mad World",9-Jun
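I'm presuming your CSV file is actually well formed and the columns really are comma separated; if not, it gets more complicated.
If your format is actually as posted, you will have to split the lines yourself: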
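import re
from collections import OrderedDict

od = OrderedDict()
with open("in.csv") as f, open("new.csv", "w") as out:
    header = next(f)
    for line in f:
        # split once, on the first run of two or more whitespace characters
        loc, rest = re.split(r"\s{2,}", line.rstrip(), maxsplit=1)
        # append, not extend: rest is a string, and extend would add it character by character
        od.setdefault(loc, []).append(rest)
    out.write(header)
    for loc, vals in od.items():
        # one output line per location
        out.write("{} {}\n".format(loc, " ".join(vals)))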
Input:
Location Movie Title Date
Edgebrook Park, Chicago A League of Their Own 7-Jun
Edgebrook Park, Chicago It's a Mad, Mad, Mad, Mad World 9-Jun
Output:
Location Movie Title Date
Edgebrook Park, Chicago A League of Their Own 7-Jun It's a Mad, Mad, Mad, Mad World 9-Jun
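To make the splitting step concrete, here is a minimal sketch of what `re.split` does with one line. The sample line is hypothetical: it assumes the real columns are separated by runs of two or more spaces, which the excerpt in the question does not actually show.

```python
import re

# Hypothetical line: assumes columns are separated by at least two spaces.
line = "Edgebrook Park, Chicago    A League of Their Own    7-Jun\n"

# maxsplit=1 cuts only at the first wide gap, so the title and date stay together.
loc, rest = re.split(r"\s{2,}", line.rstrip(), maxsplit=1)

# Splitting the remainder once more separates the title from the date.
title, date = re.split(r"\s{2,}", rest)

print(loc)
print(title)
print(date)
```

Single spaces inside a field (as in "Edgebrook Park, Chicago") do not match `\s{2,}`, which is why the location survives in one piece. If the real file separates columns with single spaces only, this approach cannot tell fields apart.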
If your format is a bit messed up, I would take this opportunity to convert it into a format that is easier to parse.
For Python 2:
import csv
from collections import OrderedDict

od = OrderedDict()
# Python 2: the csv module expects files opened in binary mode
with open("in.csv", "rb") as f, open("new.csv", "wb") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        # no extended unpacking (loc, *rest) in Python 2, so slice instead
        loc, rest = row[0], row[1:]
        od.setdefault(loc, []).extend(rest)
    wr.writerow(header)
    for loc, vals in od.items():
        wr.writerow([loc] + vals)
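For the updated layout in the question (Location, MovieDate, Formatted_Address, Lat, Lng), one possible sketch is to keep the address and coordinates from the first row seen for a location and append only the MovieDate values. This is an illustration under assumptions, not a definitive answer: the `sample` text and the `grouped` structure are made up for the example, and the second sample row is a hypothetical second showing at the same park.

```python
import csv
import io
from collections import OrderedDict

# Hypothetical sample in the updated column layout (second row is invented).
sample = '''Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
"Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
'''

grouped = OrderedDict()
for row in csv.DictReader(io.StringIO(sample)):
    loc = row["Location"].strip()  # the sample locations carry a trailing space
    # The first row for a location supplies the address and coordinates.
    entry = grouped.setdefault(loc, {
        "MovieDates": [],
        "Formatted_Address": row["Formatted_Address"],
        "Lat": row["Lat"],
        "Lng": row["Lng"],
    })
    # Later rows only contribute their MovieDate value.
    entry["MovieDates"].append(row["MovieDate"].strip())

for loc, entry in grouped.items():
    print(loc, entry["MovieDates"])
```

With a real file, `io.StringIO(sample)` would be replaced by a file opened with `newline=""`. This assumes every row for a location repeats the same address and coordinates; if they can differ, only the first one is kept.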
Answer 1 (score: 1)
from collections import defaultdict

# rows containing your data
rows = ...

byLocation = defaultdict(list)
for row in rows:
    # group the remaining columns of each row under its location
    byLocation[row[0]].append(row[1:])
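As a rough usage sketch of the snippet above, assuming `rows` is a list of lists such as `csv.reader` would yield after the header is skipped (the two rows below are copied from the question's excerpt; `flat` is a name introduced here for illustration):

```python
from collections import defaultdict

# Rows as csv.reader would yield them, header already skipped (assumed shape).
rows = [
    ["Edgebrook Park, Chicago", "A League of Their Own", "7-Jun"],
    ["Edgebrook Park, Chicago", "It's a Mad, Mad, Mad, Mad World", "9-Jun"],
]

byLocation = defaultdict(list)
for row in rows:
    # append keeps each showing as its own [title, date] pair
    byLocation[row[0]].append(row[1:])

for loc, showings in byLocation.items():
    # flatten the pairs to get one combined line per location
    flat = [field for showing in showings for field in showing]
    print(" ".join([loc] + flat))
```

Because `append` is used rather than `extend`, each location maps to a list of `[title, date]` pairs, which keeps the showings separable; flattening is only needed when printing a single line.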
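Answer 2 (score: 0)
This can be solved easily with the OrderedDefaultdict from another answer of mine (shown below). Outputting the values associated with each theater location is just as easy.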
import collections
import csv

class OrderedDefaultdict(collections.OrderedDict):
    def __init__(self, *args, **kwargs):
        if not args:
            self.default_factory = None
        else:
            if not (args[0] is None or callable(args[0])):
                raise TypeError('first argument must be callable or None')
            self.default_factory = args[0]
            args = args[1:]
        super(OrderedDefaultdict, self).__init__(*args, **kwargs)

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = default = self.default_factory()
        return default

    def __reduce__(self):  # optional, for pickle support
        args = (self.default_factory,) if self.default_factory else ()
        # iter(self.items()) works on Python 2 and 3 (iteritems() is Python 2 only)
        return self.__class__, args, None, None, iter(self.items())

movies = OrderedDefaultdict(list)
# Python 3 form; on Python 2 use open('movies.csv', 'rb') instead
with open('movies.csv', newline='') as f:
    csv_reader = csv.DictReader(f, delimiter='\t')  # assumes tab-separated columns
    for row in csv_reader:
        movies[row['Location']].append(' '.join([row['Movie Title'], row['Date']]))

import json  # just to display dictionary created
print(json.dumps(movies, indent=4))
Output:
{
"Edgebrook Park, Chicago": [
"A League of Their Own 7-Jun",
"It's a Mad, Mad, Mad, Mad World 9-Jun"
]
}
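On Python 3.7 and later, built-in dicts (and therefore `defaultdict`) preserve insertion order, so a plain `defaultdict(list)` may be enough when order matters. The sketch below shows that alternative together with one way to turn the grouped values into output lines. The `pairs` data is illustrative: the Gage Park entry is invented to demonstrate ordering, and `lines` is a name introduced here.

```python
from collections import defaultdict

# Illustrative (location, "title date") pairs; the Gage Park entry is made up.
pairs = [
    ("Edgebrook Park, Chicago", "A League of Their Own 7-Jun"),
    ("Gage Park, Chicago", "Monsters University 12-Jun"),
    ("Edgebrook Park, Chicago", "It's a Mad, Mad, Mad, Mad World 9-Jun"),
]

movies = defaultdict(list)
for loc, showing in pairs:
    movies[loc].append(showing)

# One line per location, in first-seen order (guaranteed from Python 3.7 on).
lines = ["{} {}".format(loc, " ".join(showings)) for loc, showings in movies.items()]
for text in lines:
    print(text)
```

On older Python versions the ordering guarantee does not hold, which is where `OrderedDict` or the `OrderedDefaultdict` class above is still needed.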
Answer 3 (score: -1)
Try the following code:
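from collections import defaultdict
import csv

ret = defaultdict(list)  # the factory must be callable: list, not []
with open("in.csv", newline="") as f:
    fread = csv.reader(f)
    next(fread)  # skip the header row so it is not grouped as a location
    for r in fread:
        ret[r[0]].append("{}, {} ".format(r[1], r[2]))
res = ["{} {}".format(k, "".join(ret[k])) for k in ret]
print(res)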