CSE-3421
Introduction to Database Systems

York University
Winter 2009
Project #3: SQL Jeopardy
Querying the York River Bookseller Database
  Project

In this project, you will work with an existing database, the York River Bookseller's Database (YRB DB). Actually, each will install his or her own private copy of the database on which to work. For SQL Jeopardy, you are to implement a number of SQL queries over the YRB DB. The project is 10% of the total grade.

There are ten queries that you are to write. Each is worth one point for a total of ten points for Project #3. I am the manager and I provide you with English specifications for the queries that I want answered from the database. You are providing SQL implementations of these queries for YRB DB to do the job.

Each query is given a name for bookkeeping. Also, the answer table for each is provided! You have successfully implemented the SQL query if you get the same result when run in DB2 on YRB DB. (And, of course, you have not cheated by making a simple query on purpose that manages to print the same results!) Otherwise, your query is unsuccessful. Grading per query is all-or-nothing. Either your query produces the right results, or it does not.

The queries below are arranged roughly in order of increasing difficulty. The ones at the beginning are straightforward, and the ones towards the end are difficult. Do not get too discouraged if you are unable to do the final few.

Honesty: For projects, you are permitted to confer with others, seek advice, and (to a reasonable extent) help. However, remember that copying someone else's queries and claiming them as your own work is plagiarism. You must do your own work.

 
  The YRB Database

Two scripts are provided:

The script yrb-create will create the YRB DB schema for you. It will also populate the tables with mock data. The script yrb-drop is provided for convenience. It will drop your copy of YRB DB from your DB2 schema space. If you mess things up, you can always drop YRB DB and then re-create it easily.

To create the YRB DB in your DB2 schema space:

  • % db2 -tf yrb-create

Read the schema definition in yrb-create for YRB DB to understand the design and what the YRB DB is about.

York River Booksellers is an online bookstore. Customers belong to various clubs. Everybody belongs to at least the club basic. Books are available via different offers per club. Thus the price of a book is determined by which offer (thus club) it was bought under.

Consider an order to be all the books a customer bought at the same time (when in purchase). These will be shipped together.

 
  The Queries
1. ancient

Who are the customers who made a purchase before January 1 2002 ('2002-1-1')?

Show customer's name and city and the date (not timestamp!) of purchase. Eliminate duplicates.

Order by name + city + date

Answer table for ancient.

Hint: The attribute when is of type timestamp. This is different than type date. So to use it say in a WHERE condition to compare it with, say, '2002-1-1'. you will need to cast it to type date: cast(when as date). Likewise, '2002-1-1' is a string! So for DB2 to consider it as a date, you must cast it as well: cast('2002-1-1' as date).

2. multiple

Which customers have bought several copies of a book within a purchase?

Show the customer's name, the book's title and year, and how many copies were purchased. Order by name + title + year.

Answer table for multiple.

3. like

Which books have 'like' or 'Like' in the title?

Show the tile, year, and the book's category. Order by title + year + cat.

Answer table for like.

Hint: See the SQL operator like in the textbook (pp 140-141, 3rd ed.).

4. topbuyer

For each club, who is the customer who has spent the most via that club's offers on books, and what is the total that he or she has spent via that club?

Be certain to account for ties. Two or more customers could tie as top buyer for a given club. In such cases, report all of the top buyers.

Order by total sales, from highest to lowest, then by customer name.

Answer table for topbuyer.

5. pairs

Which pairs of customers have bought at least three books in common?

Print three columns: two with the customers' names and one with the number of books in common. Do not return any duplicates. Furthermore, say 'Mark Dogfurry' and 'Zebulon Zilio' have four books in common, only output ('Mark Dogfurry', 'Zebulon Zilio', 4) and not ('Zebulon Zilio', 'Mark Dogfurry', 4)!

If 'Mark Dogfurry' and 'Zebulon Zilio' have each bought the same book three times, this does not count. It has to be at least three different books.

Order by the names. For each tuple, be certain that name first (column one) is alphabetically before name second (column two).

Answer table for pairs.

6. allbooks

Which customers have bought all the books offered within some category / language group, given that category / language group contains more than one book title? E.g., Who has bought all the French romance books?

List by customer's name and the Category and language.

Order by name + category + language.

Answer table for allbooks.

7. orders

All the books a customer orders at the same time (when) are considered to be part of the same "order". Those books are shipped together to the customer and the customer is billed for the entire order.

What is the bill for each order?

Show the customer's name and city, the date and time of the order (not the timestamp!), and the bill. Order by name + city + when.

Answer table for orders.

Hint: Want to shorten one of your decimal column types so your lines do not wrap on output? You can cast it to be smaller. For instance,

cast(col_a as decimal(5,2))

This will print decimals (numbers) with 5 numerals to the left of the "." and 2 to the right.

8. weights

What is the total weight of each order?

Order by weight, descending, and then by customers' names (in cases of ties on weight).

Answer table for weights.

9. billing

As in Query orders, all the books a customer orders at the same time (when) are considered to be part of the same "order". Those books are shipped together to the customer and the customer is billed for the entire order.

What is the bill for each order, with the shipping cost added?

The shipping cost is as follows: The weight of the order is looked up in the 'yrb_shipping' table. If the weight is X grams, the entry just higher than (or equal to) X is found in the shipping table and the associated shipping price is added. For instance, if the order's weight is 1447 grams, the entry '1500 5.00' is found, and so the cost is $5.00.

Show the customer's name and city, the date and time of the order (not the timestamp!), the bill without the shipping charge, and the total bill (with shipping). Order by name + city + when.

Answer table for billing.

Hint: This query is an extension of query orders; it added another column that shows the total price, with shipping. Also note that you can reuse your query weights to help figure out the shipping cost for each purchase.

You may find DB2 SQL's with clause useful for this query. (You probably would find with useful for query percentage too and perhaps others.) It lets you declare "temporary tables" for use within a query. So you can store the results of one query for use in another. See below on this page for an example.

10. droppable

A club is droppable if all the purchases that are in the database could still have been made by the customers, just using the remaining clubs instead.

Note that each "reassigned" club per purchase must be a club for which the purchase's customer is a member. If there are no other legitimate clubs for reassignment for some purchase, the club in question is not droppable.

For each droppable club, how much more money (or less!) would YRB have made if that club had never existed?

Assume that a club "reassignment" for each purchase involving the dropped club replaces it with a best offer (across the remaining, legitimate clubs) for that customer.

Order by club.

Answer table for droppable.

 
  Deliverables

Due by 11:59pm Friday 22 May 2009.

For each query, write your query in a file with the corresponding name as above (ancient, multiple, like, topbuyer, pairs, allbooks, orders, weights, billing, and droppable).

Do not include the answers generated by each of your queries. We will be testing your queries ourselves for grading your project.

When you are finished, use the submit command to turn in your work. We shall publish soon the submit script that you are to use. The command will look something like:

% submit 3421 yrb ancient multiple like topbuyer pairs allbooks orders weights billing droppable

 
  Hints & DB2 Comments

DB2 SQL provides a useful clause called with. It is easiest to show its use by example.

refund

Who has bought a book via some club that he or she could have gotten less expensively using another club to which he or she belongs? List the customer's name, the book's title and year, the quantity bought, the price for which it was bought, and the best price for which it could have been bought.

with
    best (cid, title, year, lowest) as (
        select distinct M.cid, O.title, O.year, min(price)
            from yrb_member M, yrb_purchase P, yrb_offer O
            where M.club = O.club and
                  M.cid = P.cid and
                  P.title = O.title and P.year = O.year
            group by M.cid, O.title, O.year
    )
select C.name, P.title, P.year, qnty, price, lowest
    from yrb_customer C, yrb_purchase P, Best B, yrb_offer O
    where P.cid = B.cid and P.title = B.title and
          P.year = B.year and P.title = O.title and
          P.year = O.year and P.club = O.club and
          C.cid = P.cid and
          O.price > B.lowest
    order by C.name, P.title, P.year;

Answer table for query refund.

One may have more tables in the with clause, if need be:

with
    first (...) as
        (...),
    second (...) as
        (...),
    third (...) as
        (...)
select ...
    ...;

Note that you can use the (temporary) view first inside the definition for second, and the views first and second inside the definition for third, and so forth.

DB2 is picky about nested SELECTS always being named via AS. For example,

select ...
    from (select ...) as A,
         (select ...) as B,
         ...
    ...;

Even if you do not need the alias names A or B later on, give the sub-queries (nested SELECTS) names anyway. Otherwise, DB2 will complain.