In my Python code I query MongoDB, but the database name contains a hyphen ("-"). How can I access it?
from pymongo import MongoClient
import sys
client = MongoClient()
db = client.customer-care
cursor = db.interactions.find()
for document in cursor:
    sys.stdout=open("test.txt","w")
    print(document)
    sys.stdout.close()
But this code gives me the following error:
Traceback (most recent call last):
File "test.py", line 6, in <module>
db = client.customer-care
NameError: name 'care' is not defined
Answer 0 (score: 2)
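Python parses client.customer-care as "client.customer minus care", which is why the traceback reports NameError: name 'care' is not defined. Database (and collection) names that are not valid Python identifiers have to be accessed with bracket notation or get_database(). A sketch of the original script with that change (and writing to the file directly instead of reassigning sys.stdout):

from pymongo import MongoClient

client = MongoClient()
db = client['customer-care']          # or: client.get_database('customer-care')

with open("test.txt", "w") as out:
    for document in db.interactions.find():
        print(document, file=out)     # Python 3 style, matching the question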
Answer 1 (score: 1)
Here is an example of working with MongoDB:
try:
    from pymongo import MongoClient

    conn = MongoClient("some host", port)
    db = conn.get_database('rest-mongo-practice')
    db.authenticate('data', 'data123')
    brooklyn_rests = db.get_collection('restaurants').find({"borough": "Brooklyn"}).limit(10)
    for rest in brooklyn_rests:
        print(rest)
    conn.close()
except IOError:
    print("Error: cannot connect to rest-mongo-practice")
Answer 2 (score: 0)
MongoDB Queries
Structure:
db.collection.find(query, projection)
● MongoDB is queried using JavaScript.
● The find() method is equivalent to the SQL SELECT.
● The find() method receives multiple arguments, comparable to SQL SELECT and WHERE clauses.
Simple find() examples:
● db.restaurants.find()
● db.restaurants.find({"restaurant_id" : "30075445" })
● db.restaurants.find({"_id":
ObjectId("593fdedd378594b41f771794")})
● db.restaurants.find({"cuisine": {$in: ["Bakery","Chinese"]}})
MongoDB provides a few dozen operators. Some example query operators:
● Comparison Operators $eq, $gt, $gte, $lt, $lte, $ne, $in, $nin
● Logical Operators $and, $or, $not, $nor
● Element Operators $type, $exists
● Evaluation Operators $where, $text
Find the restaurant named "Mrs. Maxwell'S Bakery":
db.restaurants.find({'name' : "Mrs. Maxwell'S Bakery"})
Find all the students that pass the test:
db.students.find( {"grade": { $gt : 60 }})
Find all the bakeries in Brooklyn:
db.restaurants.find( {$and: [
{"cuisine" : "Bakery"} ,
{"borough" : "Brooklyn"}
]
})
Example: Find all restaurants in zip code 10462:
db.restaurants.find({"address.zipcode" : "10462"})
Example: Find all restaurants that got a C grade:
db.restaurants.find({"grades.grade" : "C"})
* Note that 'grades' is an array
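Since the question is about pymongo, note that these same filters translate directly to Python: queries are ordinary dicts passed to find(). A minimal sketch (the database name 'test' is only a placeholder):

from pymongo import MongoClient

client = MongoClient()
db = client['test']   # placeholder database name; use your own, e.g. client['customer-care']

# The shell filters above become plain Python dictionaries:
bakery_or_chinese = db.restaurants.find({"cuisine": {"$in": ["Bakery", "Chinese"]}})
brooklyn_bakeries = db.restaurants.find({"cuisine": "Bakery", "borough": "Brooklyn"})
for doc in brooklyn_bakeries:
    print(doc)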
The Projection parameter is used to specify the fields in the
documents of the result set (equivalent to the SQL SELECT clause).
● To specify fields, set their value to 1 in the projection
object.
● If a projection object is not specified, all fields are
returned.
● The _id field is returned by default unless set to 0
db.collection.find(query, projection)
Example: Find all the names of the restaurants in zip code 10462
db.restaurants.find({"address.zipcode" : "10462"}, {"name" : 1})
Example: The same query, returning only the names (suppressing _id)
db.restaurants.find({"address.zipcode" : "10462"}, {"_id" : 0, "name" : 1})
In MongoDB, the returned value of a query is a database Cursor.
● A cursor is a pointer to the result set, stored in the database's memory.
● The cursor object provides methods for handling the result set.
● MongoDB cursors are iterable.
● Some cursor functions
○ count(), limit(), sort(), max(), min(), next(), hasNext(), skip(), toArray(),
forEach(function)
Example: limit the result of the previous query
db.restaurants.find({"address.zipcode":"10462"},{"_id":0,"name":1}).limit(3)
The sort() cursor function accepts an object that specifies the sort fields and
orientation.
Example: Sort the result of the previous query by name
db.restaurants.find({"address.zipcode":"10462"},{"_id":0,"name":1}).sort({"name":1})
Example: Sort the result of the previous query by name in descending order
db.restaurants.find({"address.zipcode":"10462"},{"_id":0,"name":1}).sort({"name":-1})
Example: Sort all restaurants by zip code and name
db.restaurants.find({},{"_id":0,"name":1,"address.zipcode": 1})
.sort({"address.zipcode": 1, "name": 1 })
The count() cursor function returns the number of records in the result set.
Example: How many restaurants are in the restaurants collection?
db.restaurants.find().count()
Example: How many restaurants are in Brooklyn?
db.restaurants.find({"borough" : "Brooklyn"}).count()
Reminder I: MongoDB shell is based on Javascript.
Reminder II: The cursor object is iterable.
This means that we can do:
var cursor = db.restaurants.find(); // pointer to the result set
while(cursor.hasNext()){ // Iterate over the result set
    var document = cursor.next(); // Treat the documents as JS objects
    var restName = document['name'];
    print(restName);
}
The MongoDB cursor object also supports the javascript forEach function.
● In javascript, forEach is used to iterate over an array and apply a function
on each element.
● Javascript Example:
var arr = [1, 2, 3, 4]; //just an array of numbers
var printDouble = function(num){ // save the function as a variable
    print(num * 2);
};
arr.forEach(printDouble); // apply printDouble for each element in arr
MongoDB shell Example:
db.restaurants.find().forEach(function(document){
    print(document.name + "," +
          document.cuisine + "," +
          document.borough);
});
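In pymongo there is no need for hasNext() or forEach: cursors are ordinary Python iterables, so the loops above become a plain for loop. A sketch, reusing the db handle from the pymongo sketch earlier:

cursor = db.restaurants.find({}, {"_id": 0, "name": 1, "cuisine": 1, "borough": 1})
for document in cursor:          # each document is a Python dict
    print(document.get("name"), document.get("cuisine"), document.get("borough"))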
MongoDB has 2 approaches to aggregations
● Aggregation Pipeline - using the built in aggregate() function (and others)
● MapReduce - defining two functions, map() and reduce(), usually to handle
larger data sets.
Syntax:
db.myCollection.aggregate([{<stage1>}, {<stage2>}...])
Example:
db.orders.aggregate([
{$match : {"status" : "A"} },
{$group : {_id : "$cust_id", total : {$sum : "$amount"} } }
])
● $match - Filters documents by specified condition(s) to be passed to the next pipeline stage.
● $group - Groups documents by some expression and outputs one document per group.
● $limit - Limits the number of documents passed to the next stage in the pipeline.
● $skip - Skips over N documents before passing the documents to the next pipeline stage.
● $sort - Sorts all input documents and returns them to the pipeline in sorted order.
● $unwind - Deconstructs an array field from the documents. Outputs a document per element.
● $count - Outputs a document containing a count of the number of documents input to the stage.
● $project - Passes the documents with specified or computed fields to the next pipeline stage.
db.articles.aggregate( [ {$match: { author : "dave" }} ] );
db.collection.aggregate( [ { $group: { _id: <expression>,
<field1>: { <accumulator1> : <expression1> }, ... }
}] );
● The _id field is mandatory. If _id is null, all documents will be aggregated
into a single output document.
● The <accumulator> operator can be one of several operators, such as $sum,
$avg, $first, $last, $max, $min, $push, $addToSet.
db.sales.aggregate( [ { $group : { _id : "$item" } } ] );
db.sales.aggregate( [{$group: { _id: "$item" , total_sold: {$sum: "$quantity" }}}]);
db.sales.aggregate([{$match: {"date": {$gt: new Date("2014-04-01")}}},
{$group: { _id: "$item", total_sold: {$sum: "$quantity"}}}]);
db.sales.aggregate([{ $sort : { price : -1}},
{ $limit : 3 },
{ $skip : 1 }]);
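From pymongo the pipeline has the same structure, passed to aggregate() as a list of stage dicts. A sketch of the $match/$group example above, reusing the db handle from the pymongo sketch earlier (assuming dates are stored as BSON dates, so a datetime is used in the comparison):

from datetime import datetime

pipeline = [
    {"$match": {"date": {"$gt": datetime(2014, 4, 1)}}},
    {"$group": {"_id": "$item", "total_sold": {"$sum": "$quantity"}}},
]
for doc in db.sales.aggregate(pipeline):
    print(doc)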
MongoDB also provides the count() and distinct() functions.
db.collection.count()
db.collection.distinct(field)
db.orders.distinct(“cust_id”)
db.orders.count({<query>}) is equivalent to
db.orders.find({<query>}).count()
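The pymongo equivalents are count_documents() on the collection (the cursor-level count() is deprecated in recent driver versions) and distinct(). A sketch, reusing the db handle from the pymongo sketch earlier:

total_orders = db.orders.count_documents({})                     # all documents
brooklyn_count = db.restaurants.count_documents({"borough": "Brooklyn"})
customer_ids = db.orders.distinct("cust_id")                     # list of distinct values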
MongoDB handles CRUD operations on documents using insert(), find(),
update() and remove().
* Collections can be implicitly created by inserting a document.
CRUD Insert
Example:
db.mycol.insert({
_id: ObjectId("7df78ad8902c4a30e1f51234"), // an ObjectId is a 24-character hex string
title: 'MongoDB Overview',
description: 'MongoDB is no sql database',
by: 'tutorials point',
url: 'http://www.tutorialspoint.com',
tags: ['mongodb', 'database', 'NoSQL'],
likes: 100
})
* if an _id is not set, MongoDB will generate an ObjectId.
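In pymongo the equivalent call is insert_one() (insert_many() for a list of documents); the generated _id is available on the result object. A sketch, reusing the db handle from the pymongo sketch earlier:

result = db.mycol.insert_one({
    "title": "MongoDB Overview",
    "description": "MongoDB is no sql database",
    "tags": ["mongodb", "database", "NoSQL"],
    "likes": 100,
})
print(result.inserted_id)   # auto-generated ObjectId, since no _id was supplied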
CRUD Update
Syntax:
db.collection.update(<query>, <update>, <options>)
Some Update operators:
● $set / $unset Sets/Removes the value of a field in a document.
● $inc / $mul Increments/Multiplies the value of the field by the specified amount.
● $push / $pull Adds/Removes an item to/from an array.
db.tutorials.update({'title':'MongoDB Overview'},
{$set:{'title':'New MongoDB Tutorial'}})
db.sales.update({'_id': 1},
{$inc:{'price': 5, 'quantity' : 1 }})
Update options:
db.collection.update(
<query>,
<update>,
{
upsert: <boolean>, // create a document if none is found
multi: <boolean>, // update all documents that match the query
writeConcern: <document>, // set the level of write concern
collation: <document> //specify language-specific rules for string comparison
}
)
Example: db.sales.update({'item': 'abc'}, {$inc: {'price': 5}}, {multi: true})
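pymongo splits update() into update_one() and update_many() (the latter replaces multi: true), and upsert is a keyword argument. A sketch of the two examples above, reusing the db handle from the pymongo sketch earlier:

db.tutorials.update_one({"title": "MongoDB Overview"},
                        {"$set": {"title": "New MongoDB Tutorial"}})

db.sales.update_many({"item": "abc"},
                     {"$inc": {"price": 5}},
                     upsert=False)   # set upsert=True to insert when nothing matches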
CRUD remove
Syntax:
db.collection.remove({<query>})
● By default, remove will delete all the documents that match the query.
● If a query object is not specified, all the documents will be deleted
(similar to SQL TRUNCATE).
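The pymongo counterparts are delete_one() and delete_many(); passing an empty filter to delete_many() removes every document, like remove() with no query. A sketch, reusing the db handle from the pymongo sketch earlier:

db.restaurants.delete_one({"restaurant_id": "40363630"})   # first matching document only
db.restaurants.delete_many({"borough": "Brooklyn"})        # every matching document
db.mycol.delete_many({})                                   # empty filter: deletes everything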
var result = db.users.findOne({"name":"Tom Benzamin"},{"address_ids": 1})
var addresses = db.address.find({"_id":{"$in": result["address_ids"]}})
Answer 3 (score: -2)
"""1) read all the comments from the api and write them into a temporary file.
Read that file's content and concert it into a list of dictionaries.
insert those dictionaries into a new mongodb collection."""
import requests
from pymongo import MongoClient
r = requests.get("http://jsonplaceholder.typicode.com/posts")
comments = r.json()
client = MongoClient("ec2-52-91-68-76.compute-1.amazonaws.com:27017")
db = client['elevation']
for comment in comments:
    db.my_comments.insert(comment)
client.close()
2) Read all Invoices from the Chinook database. Pivot the records by creating a dictionary
with countries as keys and their invoice counts as values.
Print each country and the number of invoices it has.
import sqlite3
conn = sqlite3.connect("Chinook_Sqlite.sqlite")
c = conn.cursor()
c.execute("""select BillingCountry from invoice""")
all_rows = c.fetchall()
country_dict = {}
for row_tuple in all_rows:
    country = row_tuple[0]
    if country_dict.get(country) is None:
        country_dict[country] = 0
    country_dict[country] += 1
for key, val in country_dict.items():
    print key, " - ", val
3) Find all the Donuts restaurants from Manhattan in the MongoDB collection. Print their names, restaurant_id and address (street and number) into a CSV file named donuts_in_manhatten.csv
from pymongo import MongoClient
client = MongoClient("___")
db = client['______']
cursor = db.restaurants.find({'cuisine': 'Donuts', 'borough':'Manhattan'})
rests = []
for rest in cursor:
    rests.append(rest)
client.close()
csv_str = "name, restaurant_id, street, number\n"
for rest in rests:
    csv_str += rest['name'] + ',' + rest['restaurant_id'] + ',' + rest['address']['street'] + ',' + rest['address']['building'] + "\n"
hello_file = open("donuts_in_manhatten.csv", 'w+')
hello_file.write(csv_str)
hello_file.close()
print the second element of ['a', 1, "foo", ['a', 'b', 3]]
my_list = ['a', 1, "foo", ['a', 'b', 3]]
print my_list[2]
print the second element of the fourth element of ['a', 1, "foo", ['a', 'b', 3]]
print my_list[3][1]
print the sum of all the numbers under 1000 that can be evenly divided by 13
my_sum = 0
for i in range(1000):
    if i % 13 == 0:
        my_sum += i
print my_sum
print the second highest number of [1, 4, 3, 21, 12, 8]
my_list = [1, 4, 3, 21, 12, 8]
my_list.sort()
print my_list[-2]
my_list.reverse()
print my_list[1]
print the age of people who were born in the following years [1984, 1982, 1990, 1985, 1992, 1991]
years_of_birth = [1984, 1982, 1990, 1985, 1992, 1991]
ages = []
for year in years_of_birth:
    ages.append(2017 - year)
print ages
write an infinite loop that adds elements into a list and see what happens :)
my_list = []
i = 0
while True:
    my_list.append(i)
    i += 1
Define two functions named 'foo' and 'bar'. make 'foo' call the 'bar' function and make 'bar' call 'foo'. Call the 'foo' function and see what happens
def foo():
    bar()

def bar():
    foo()

foo()
now wrap that code with error handling and print a custom error message
def foo():
    bar()

def bar():
    foo()

try:
    foo()
except:
    print "caught a stack overflow error"
print "continuing flow"
convert the following numeric grades into letters and print them for each student. A for > 90, B for 80-90, C for
70-80, D for 60-70 and F for < 60. The grades are Jhon = 87, Alice = 98, Bob = 55, Charlie = 45, Dave = 72
# my_dict = {'Jhon': 87, 'Alice': 98, 'Bob': 65, 'Charlie': 45, 'Dave': 72}
#
# for key, val in my_dict.items():
#     if val >= 90:
#         print key + ": A"
#     elif val >= 80:
#         print key + ": B"
#     elif val >= 70:
#         print key + ": C"
#     elif val >= 60:
#         print key + ": D"
#     else:
#         print key + ": F"
read the file named rest.json and print the borough
# from json import loads
# rest_file = open("rest.json", 'r')
# rest_file_content = rest_file.read()
# rest_file_json = loads(rest_file_content)
# print rest_file_json['borough']
# rest_file.close()
Write 'hello file!' into a file named hello.txt
# hello_file = open("hello.txt", 'w+')
# hello_file.write("hello file!")
# hello_file.close()
send a GET http request to http://jsonplaceholder.typicode.com/posts and print its content:
# import requests
# r = requests.get("http://jsonplaceholder.typicode.com/posts")
# print r.text
send the same request and convert the content into a json. print the id of the last object in the response:
# import requests
# r = requests.get("http://jsonplaceholder.typicode.com/posts")
# response = r.json()
# response_as_list = list(response)
# print response_as_list[-1]['id']
connect to the chinook sqlite db. List all genres and their track count:
# import sqlite3
# conn = sqlite3.connect("Chinook_Sqlite.sqlite")
# c = conn.cursor()
# c.execute("""select genre.genreId, count(track.trackId)
# from track, genre
# where track.genreId = genre.genreId
# group by genre.genreId""")
# all_rows = c.fetchall()
# for row in all_rows:
#     genre_id = row[0]
#     track_count = row[1]
#     print genre_id, "-", track_count
#### with a dictionary example ####
#
# import sqlite3
# def dict_factory(cursor, row):
#     d = {}
#     for idx, col in enumerate(cursor.description):
#         d[col[0]] = row[idx]
#     return d
#
# conn = sqlite3.connect("Chinook_Sqlite.sqlite")
# conn.row_factory = dict_factory
# c = conn.cursor()
# c.execute("""select genre.genreId, count(track.trackId) as total
# from track, genre
# where track.genreId = genre.genreId
# group by genre.genreId""")
# all_rows = c.fetchall()
# for row_dict in all_rows:
#     print 'GenreId - ', row_dict['GenreId'], " total: ", row_dict['total']
/* Find queries */
// find all restaurants in the Brooklyn borough
db.restaurants.find( {'borough' : 'Brooklyn'})
// find the restaurant with a restaurant_id of "40357217"
db.restaurants.find( {restaurant_id : "40357217"})
// find all Donuts shops in the Brooklyn borough
db.restaurants.find( {$and : [ {'cuisine' : 'Donuts'}, {'borough' : 'Brooklyn'}]})
// find all restaurants on 11 Avenue
db.restaurants.find( {'address.street' : '11 Avenue'})
// find all restaurants with a restaurant_id greater than "50000000"
db.restaurants.find( {restaurant_id : { $gt : "50000000" }})
// find all restaurants in zipcodes smaller than "10500"
db.getCollection('restaurants').find({ 'address.zipcode' : {$lt : "10500"}})
// find all restaurants in Manhattan from the following cuisines: Italian, American, Seafood, Continental, Bakery.
db.restaurants.find( {borough : "Manhattan", cuisine : {$in : ['Italian', 'American', 'Seafood', 'Continental', 'Bakery']}} )
/* Simple Aggregations */
// Count all restaurants in Manhattan
db.restaurants.count( {borough : "Manhattan"} )
// or equivalently:
db.restaurants.find( {borough : "Manhattan"} ).count()
// find the first restaurant on West 57 Street
db.restaurants.find( {"address.street" : "West 57 Street"} ).sort({"address.building" : 1}).limit(1)
// find a distinct list of cuisines
db.restaurants.distinct("cuisine")
/*
*
* Aggregation Pipeline
*
*/
// match all restaurants in Queens
db.restaurants.aggregate({$match : {borough : 'Queens'}})
// group and count all restaurants in Queens by cuisine
db.restaurants.aggregate({$match : {borough : 'Queens'}}, {$group : {_id : "$cuisine", total : { $sum : 1} }})
// group all restaurants by zipcode and its list of cuisines
// result document example:
// {
// "_id" : "10123",
// "cuisines" : [
// "Juice, Smoothies, Fruit Salads"
// ]
// }
db.restaurants.aggregate({$group : {_id : "$address.zipcode", cuisines : { $addToSet : "$cuisine"} }})
// find the top 10 popular cuisines in zipcode "11369"
db.restaurants.aggregate({$match : {"address.zipcode" : "11369"}},
{$group : {_id : "$cuisine", total : { $sum : 1} }},
{$sort : {total : -1}},
{$limit : 10} )
// list all grades and a list of restaurants in Manhattan that got that grade. (hint: use the $unwind stage)
db.restaurants.aggregate( {$match : {borough : "Manhattan"}},
{$unwind : '$grades'},
{$group : {_id : "$grades.grade", cuisines : { $addToSet : "$cuisine"} }})
/*
*
* Crud operations
*
*/
// delete the restaurant with the restaurant_id of "40363630"
db.restaurants.remove({restaurant_id : "40363630"})
// update the name of "Wild Asia" restaurant to be "Wild Wild Asia"
db.restaurants.update({"name" : "Wild Asia"}, {$set : {"name" : "Wild Wild Asia"}})
// set the cuisine of *all* the restaurants in Brooklyn to be "Brooklyn Style"
db.restaurants.update({borough : "Brooklyn"}, {$set : {cuisine: "Brooklyn Style"} }, {multi : true})
-- add a new artist named 'Mashina', their album 'Mashina 2' and a track called 'send me an angel' (genre is Rock, media type is MPEG audio file, the song is 4:10 minutes long and it is free; the rest of the fields are null)
insert into artist values('276', 'Mashina');
insert into album values('348', 'Mashina 2', '276');
insert into track values('3504', 'send me an angel', 348, 1, 1, null, (4 * 60 + 10) * 1000, null, 0);
select * from track where trackId = 3504;
-- fire all the IT staff
delete from employee where title = 'IT Staff';
-- delete all tracks and albums by Green Day (if you fail, explain why)
delete
from track
where albumId in (select albumId
from album
where artistId in (select artistId
from artist
where name = 'Green Day'));
delete
from album
where artistId in (select artistId
from artist
where name = 'Green Day');
-- double the quantity in invoice lines of Rock tracks.
update invoiceLine
set quantity = quantity * 2
where trackId in (select trackId
from track, genre
where track.genreId = genre.genreId
and genre.Name = 'Rock')
-- set every customer's company column to 'Acme' if it is not set.
update customer
set company = 'Acme'
where company is null
-- a) create a temporary table of top 10 popular genres (genreId, genreName, total_tracks)
create temporary table top_genres as select genre.genreId, genre.name, count(*) as total_tracks
from track, genre
where track.genreId = genre.genreId
group by genre.genreId, genre.name
order by total_tracks desc
limit 10;
select * from top_genres;
-- b) use that table to query 100 top sold tracks from the top 10 popular genres.
select track.trackId, track.name, sum(invoiceLine.quantity) as total_sold
from invoiceLine, track
where invoiceLine.trackId = track.trackId
and track.genreId in (select genreId from top_genres)
group by track.trackId
order by total_sold desc
limit 100
-- create a view that represents each employee's sales by genre (employeeId, genre, sold)
create view employee_genre_sales as
select employee.employeeId, genre.genreId, sum(invoiceLine.quantity) as sold
from employee, customer, invoice, invoiceLine, track, genre
where employee.employeeId = customer.supportRepId
and invoice.customerId = customer.customerId
and invoice.invoiceId = invoiceLine.invoiceId
and invoiceLine.trackId = track.trackId
and track.genreId = genre.genreId
group by employee.employeeId, genre.genreId;
-- use that view to find how many items each employee sold from their favorite genre
select employeeId, max(sold) as max_sale
from employee_genre_sales
group by employeeId;
-- What is the median time for a user to post since joining the app in 2015
-- step 1: find the first post time of each user
create temp table first_post as
select user_id, min(create_date) as first
from posts
group by user_id;
-- step 2: calc the time diff between user creation and first post (not sure what julianday is? google it :) )
create temp table time_to_first as
select users.id, (julianday(first_post.first) - julianday(users.create_date)) as days_to_first_post
from users, first_post
where users.id = first_post.user_id
and strftime('%Y' , users.create_date) = "2015"
order by days_to_first_post;
-- step 3: calc the median time in days
select round(days_to_first_post)
from time_to_first
order by days_to_first_post
limit 1
offset (select count(*) / 2 from time_to_first)
--
-- What was the DAU/MAU of visitors in May 2016?
-- step 1: calculate the MAU
create temp table MAU as
select count(distinct(user_id))
from user_visits
where time between datetime('2016-05-01') and datetime('2016-05-31')
-- step 2: calculate the DAU
create temp table dau_mau as
select strftime('%d', time) as day, count(distinct(user_id)) * 1.0 / (select * from MAU) as DAU_MAU
from user_visits
where time between datetime('2016-05-01') and datetime('2016-05-31')
group by day
-- step 3: present value as percentage
select day, round(DAU_MAU, 2) * 100 as dau_mau
from dau_mau
--
-- What was the median life span of users that were not seen in the last 6 months and joined in 2015?
-- step 1: users that joined in 2015 and not visited in 6 months
drop table churned_2015;
create temp table churned_2015 as
select *
from users, user_visits
where users.id = user_visits.user_id
and strftime('%Y' , create_date) = "2015"
and users.id not in (select distinct user_id from user_visits where time > date('now', '-6 month'));
-- step 2: what was the lifetime of each user in that segment
create temp table churned_2015_lifetime as
select users.id, (strftime('%s', last_seen) - strftime('%s', create_date)) / (60 * 60 * 24) as lifetime
from users, (select user_id , max(time) as last_seen
from churned_2015
group by user_id) as churned_last_seen
where users.id = churned_last_seen.user_id
order by lifetime;
-- step 3: the median lifetime (skip half the records)
select lifetime as median_lifetime
from churned_2015_lifetime
order by lifetime
limit 1
offset (select count(*) /2 from churned_2015_lifetime);
--
-- What is the likelihood for a user to post a second, third, fourth and fifth post?
-- step 1: users post count
create temp table user_posts_cnt as
select user_id, count(*) total_user_posts
from posts
group by user_id
--
-- step 2: how many users posted more than X times
create temp table users_posts_stopping_point as
select
(select count(distinct(user_id)) from posts) as total_users_posted,
(select count(*) from user_posts_cnt where total_user_posts > 1) as posted_more_than_once,
(select count(*) from user_posts_cnt where total_user_posts > 2) as posted_more_than_twice,
(select count(*) from user_posts_cnt where total_user_posts > 3) as posted_more_than_thrice,
(select count(*) from user_posts_cnt where total_user_posts > 4) as posted_more_than_quad;
-- step 3: calculate the conversion ratio between each next post
select
posted_more_than_once * 1.0 / total_users_posted as second_post_likelihood,
posted_more_than_twice * 1.0 / total_users_posted as third_post_likelihood,
posted_more_than_thrice * 1.0 / total_users_posted as fourth_post_likelihood,
posted_more_than_quad * 1.0 / total_users_posted as fifth_post_likelihood
from users_posts_stopping_point
use soccer;
-- List all players and their properties
SELECT *
FROM Player;
-- List all Players names and birthdays
SELECT player_name, birthday
FROM Player;
-- list all team names and their short names
select team_long_name, team_short_name
from team;
-- list all players that are over 180 cm in height.
SELECT *
FROM Player
WHERE height > 180;
-- how many players are above 180cm in height?
SELECT count(*)
FROM Player
WHERE height > 180;
-- what is the average height of players?
SELECT AVG(height)
FROM Player;
-- what is the average height of players that are under 25 years old?
SELECT AVG(height) as avg_height
FROM Player
WHERE cast(strftime('%Y', birthday) as integer) > cast(strftime('%Y', date('now')) as integer) - 25;
-- what is the average BMI of players?
select AVG(weight * 0.453 / ((height / 100)*(height / 100))) as BMI
from Player;
-- list the youngest 10 players
SELECT player_name, birthday
FROM Player
ORDER BY birthday DESC
LIMIT 10;
-- list all the games
Select * from match;
-- list all the distinct game seasons
select distinct(season)
from match;
-- how many games ended with a goal gap of over 3?
select count(*)
from match
where ABS(home_team_goal - away_team_goal) > 3;
-- what is the maximum number of times two teams played against each other?
select home_team_api_id, away_team_api_id, count(*) as games
from match
group by home_team_api_id, away_team_api_id
order by games desc
limit 1;
-- how many games ended in a draw?
select count(*)
from match
where home_team_goal = away_team_goal;
-- how many games had goals but ended in a draw?
select count(*)
from match
where home_team_goal = away_team_goal
and home_team_goal != 0;
-- what is the ratio between home team goals and away team goals
select sum(home_team_goal) / cast(sum(away_team_goal) as double)
from match ;
-- which month hosts the most games?
select strftime('%m', date) as game_month, count(*) as total
from match
group by game_month
order by total desc
limit 1;
-- which league had the most games in 2008?
select league_id,count(*) as total
from match
where strftime('%Y',date) = '2008'
group by league_id
order by total desc
limit 1;
-- which season had the most goals
select season, sum(home_team_goal) + sum(away_team_goal) as total_goals
from match
group by season
order by total_goals desc
limit 1;
-- which season had the most goals per game?
select season, (sum(home_team_goal) + sum(away_team_goal)) * 1.0 / count(*) as total_goals_per_game
from match
group by season
order by total_goals_per_game desc
limit 1;
-- what was the date of the first game in each season?
select season, min(date)
from match
group by season;
-- list all player names and categorize their height as 'short' (under 170), 'medium' (between 170 and 179) and 'high' (over 180)
select player_name, height,
case when height > 180 then 'high'
when height between 170 and 179 then 'medium'
else 'short' end as height_category
from player;
-- what is the avg number of goals in a game that didn't end in a draw?
select AVG(home_team_goal + away_team_goal)
from match
where home_team_goal != away_team_goal;
-- list all seasons with an avg goal per game rate of over 1, in games that didn't end in a draw
select season, ROUND((SUM(home_team_goal+away_team_goal) *1.0) /Count(id),3) as ratio
from match
where home_team_goal != away_team_goal
group by season
having ratio > 1