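I'm trying to build a nested dictionary from two sqlite3 columns in a database I created of the anime I've watched (several hundred entries). The two columns are "DateWatched", the dates on which I watched a particular anime (e.g. Jun 6-Jun 8), and "Year", the year in which I watched it.
Here is a small sample of the data in the two columns: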
DateWatched | Year
---------------------------------+----------------
Dec 18-Dec 23 | 2013
Dec 25-Jan 10 | 2013 and 2014
Feb 2014 and Jan 1-Jan 3 2016 | 2014 and 2016 #Some anime get another season years later so any date after an "and" is another season
Mar 10th | 2014
Mar 13th | 2014
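That's the basic structure of my two columns. What I want is to store this in a dictionary or list that records how many anime I watched in each month (January through December) of each year.
I think I want it to look something like this (based on my example):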
Final = {'2013':{'Dec':2},
'2014':{'Jan':1, 'Feb':1,'Mar':2},
'2016':{'Jan':1}}
I've worked out how to build a list from each column separately:
MonthColumn = [i[0] for i in c.execute("SELECT DateWatched FROM Anime").fetchall()] #'Anime' is just an arbitrary name for the table
x = [item.replace('-',' ') for item in [y for x in MonthColumn for y in re.split(' and ', x)]] #Gets rid of '-' in each row and splits into two strings any place with an 'and'
v = [' '.join(OrderedDict((w,w) for w in item.split()).keys()) for item in x] # Removes duplicate words ("Dec 18-Dec 23" becomes "Dec 18 23")
j = [y for j in v for y in j.split()] #Splits into separate strings ("Dec 18 23" becomes "Dec", "18", "23")
Month = [item for item in j if item.isalpha()] #Final list and removes any string with numbers (So "Dec","18","23" becomes "Dec")
YearColumn = [i[0] for i in c.execute("SELECT Year FROM Anime").fetchall()]
Year = [item for Year in YearColumn for item in re.split(' and ', Year)] #Final list and removes any "and" and splits the string into 2 (So "2013 and 2014" becomes "2013","2014")
#So in the example columns I gave above, my final lists become
Month = ['Dec','Dec','Jan','Feb','Jan','Mar','Mar']
Year = ['2013','2013','2014','2014','2016','2014','2014']
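My biggest problem, and where I need the most help, is working out how to turn the two lists into a nested dictionary (or something similar) and use it in Matplotlib to draw a bar chart with the years on the x-axis (12 bars per year) and, on the y-axis, the number of anime watched in each month of each year.
Thanks for any help, and apologies if I've left anything out or missed something (first time posting).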
Answer 0 (score: 0)
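I'd suggest a slightly different parsing approach that handles the day ranges, which have to be taken into account to get the dictionary you want for the visualization. That dictionary can then be used to draw a cleaner chart: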
import re, sqlite3
import itertools, collections
data = list(sqlite3.connect('db_tablename.db').cursor().execute("SELECT DateWatched, Year FROM tablename"))
new_parsed = [[list(filter(lambda x:x != 'and', re.findall(r'[a-zA-Z]+', a))), re.findall(r'\d+', b)] for a, b in data]
new_results = [i for b in [list(zip(*i)) for i in new_parsed] for i in b]
groups = {a:collections.Counter([c for c, _ in b]) for a, b in itertools.groupby(sorted(new_results, key=lambda x:x[-1]), key=lambda x:x[-1])}
This gives {'2013': Counter({'Dec': 2}), '2014': Counter({'Mar': 2, 'Jan': 1, 'Feb': 1}), '2016': Counter({'Jan': 1})}.
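The one-liners above are dense, so here is a more spelled-out sketch of the same idea: pair each month with the year in the same position, then count per year. The names `MONTHS`, `parse_row` and `count_pairs` are illustrative only, and the rows are the sample rows from the question rather than a real database read. Unlike the code above, this version keeps only known month abbreviations, which is an assumption about how the DateWatched column is written.

```python
import re
from collections import Counter, defaultdict

# Assumption: months always appear as three-letter English abbreviations.
MONTHS = {'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'}

def parse_row(date_watched, year):
    """Pair the i-th month in DateWatched with the i-th year in Year."""
    months = [w for w in re.findall(r'[A-Za-z]+', date_watched) if w in MONTHS]
    years = re.findall(r'\d{4}', year)
    # zip stops at the shorter list, so 'Dec 18-Dec 23' | '2013' counts once.
    return list(zip(months, years))

def count_pairs(pairs):
    """Turn (month, year) pairs into {year: {month: count}}."""
    counts = defaultdict(Counter)
    for month, year in pairs:
        counts[year][month] += 1
    return {year: dict(per_month) for year, per_month in counts.items()}

# Sample rows copied from the question
rows = [
    ('Dec 18-Dec 23', '2013'),
    ('Dec 25-Jan 10', '2013 and 2014'),
    ('Feb 2014 and Jan 1-Jan 3 2016', '2014 and 2016'),
    ('Mar 10th', '2014'),
    ('Mar 13th', '2014'),
]
final = count_pairs(pair for row in rows for pair in parse_row(*row))

# The two lists the question already builds can be counted the same way.
Month = ['Dec', 'Dec', 'Jan', 'Feb', 'Jan', 'Mar', 'Mar']
Year = ['2013', '2013', '2014', '2014', '2016', '2014', '2014']
from_lists = count_pairs(zip(Month, Year))
```

Both `final` and `from_lists` hold plain nested dicts in the shape the question asks for. The positional pairing only works while every row lists its months and years in the same order, so rows that break that pattern would need separate handling.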
To plot it:
import matplotlib.pyplot as plt
months = ['Dec', 'Jan', 'Feb', 'Mar']
new_months = {a:[[i, b.get(i, 0)] for i in months] for a, b in groups.items()}
years = sorted(new_months, key=int)
for i, month in enumerate(months):
    # Count for this month in each year, in chronological year order
    _current = [new_months[y][i][-1] for y in years]
    # Height already stacked below this month: the sum of the earlier months
    _previous = [sum(c[-1] for c in new_months[y][:i]) for y in years]
    plt.bar(range(len(years)), _current, label=month, bottom=_previous)
plt.xticks(range(len(years)), years)
plt.legend(loc='upper left')
plt.show()
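The plotting code above stacks the months on top of each other, whereas the question asks for 12 separate bars per year. A minimal sketch of that grouped layout follows, assuming counts in the same {year: {month: count}} shape as `groups`. The helpers `grouped_bar_layout` and `plot_grouped` are hypothetical names, and the 0.8 group width is an arbitrary choice.

```python
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

def grouped_bar_layout(groups, months=MONTHS, group_width=0.8):
    """Return (years, bar_width, series); series maps each month to
    (x positions, heights), with one entry per year."""
    years = sorted(groups, key=int)
    bar_width = group_width / len(months)
    series = {}
    for k, month in enumerate(months):
        # Shift each month sideways so the 12 bars are centred on the year's tick.
        offset = (k - (len(months) - 1) / 2) * bar_width
        xs = [pos + offset for pos in range(len(years))]
        heights = [groups[year].get(month, 0) for year in years]
        series[month] = (xs, heights)
    return years, bar_width, series

def plot_grouped(groups):
    # Imported here so the layout helper above has no third-party dependency.
    import matplotlib.pyplot as plt
    years, bar_width, series = grouped_bar_layout(groups)
    for month, (xs, heights) in series.items():
        plt.bar(xs, heights, width=bar_width, label=month)
    plt.xticks(range(len(years)), years)
    plt.xlabel('Year')
    plt.ylabel('Anime watched')
    plt.legend(loc='upper left', ncol=2)
    plt.show()

# Same counts as the result shown earlier, written out as plain dicts
groups = {'2013': {'Dec': 2},
          '2014': {'Jan': 1, 'Feb': 1, 'Mar': 2},
          '2016': {'Jan': 1}}
years, bar_width, series = grouped_bar_layout(groups)
```

Calling `plot_grouped(groups)` draws the chart. Months with no entries get zero-height bars, so every year keeps the same 12 slots and the bars line up across years.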