Advanced Database Systems
COSC 6421
York University
Fall 2002

Due date: October 17 in class


Introduction

In this assignment, you shall implement a simplified version of the buffer pool manager layer with the LRU replacement strategy. It is simplified in that we will not support concurrency control or recovery.


Available Documentation

Begin by reviewing Chapter 9 (Storing Data: Disks and Files), for an overview of buffer management.


Recommended Internal Design

The buffer pool is a collection of frames (page-sized sequence of main memory bytes) that is managed by the buffer manager. You should assume for this project that there are 10 frames available in the buffer pool. It should be stored as an array bufPool[numbuf] of Page objects.

Additionally, you should maintain an array bufDescr[numbuf] of descriptors, one per frame. Each descriptor is a record with the following fields:

  • page number,
  • pin_count, and
  • dirtybit.

The pin_count field is of type integer, page number is of type integer, and dirtybit is of type boolean. These describe the current page that is stored in that frame. A page is identified by a page number. It is unique with respect to all pages in the database.

A simple hash table should be used to determine out what frame a given disk page occupies. The hash table should be implemented to reside entirely in main memory, using an array of pointers to lists of <page number, frame number> pairs. This array is called the directory. Each list of pairs is called a bucket. Given a page number, you should apply a hash function to locate the directory entry which points to the bucket that (potentially) contains the frame number for this page. The bucket does contain a <page number, frame number> pair for this page, if the page is indeed in the buffer pool. If, however, on searching the bucket you do not find a pair containing the page number, this means that the page is not in the pool.

The hash function should distribute values in the domain of the search field fairly uniformly over the collection of buckets. Say we have HTSIZE number of buckets, numbered 0 through M-1. A hash function h of the form

h(value) = (a*value+b) mod HTSIZE

works well in practice. HTSIZE should be chosen to be a prime number for this.

When a page is requested, the buffer manager should do the following.

  1. Check the buffer pool (using the hash table) to see whether it contains the requested page already.

    If the page is not in the pool, it should be brought in as follows.

    • Choose a frame for replacement, using your replacement policy (LRU).
    • If the frame chosen for replacement is dirty, flush it (that is, write out the page that the frame contains to disk).
    • Read the requested page (again, by calling the DB class) into the frame chosen for replacement. The pin_count and dirtybit for the frame should be initialized to 0 and FALSE, respectively.
    • Delete the entry for the old page from the buffer manager's hash table, and insert an entry for the new page. Also, update the entry for this frame in the bufDescr array to reflect these changes.

  2. Pin the requested page by incrementing the pin_count in the descriptor for this frame.

You can earn one point in extra credit (on top of the ten points total for the project) if you implement both clock and LRU. In this case, show the test results run under both policies. However, only attempt this once you have the LRU policy working already! (This is lots more work, so I do not really recommend the extra effort.) Implementing clock for the project is the easier of the two policies for two reasons: clock seems slightly easier to implement (less code) than LRU, and the original project template code is already set up for clock.


LRU Replacement Policy

This is the policy you are supposed to implement for this project.

To implement the LRU replacement policy, maintain a queue of frame numbers. When a frame is to be chosen for replacement, pick the first frame in this list. A frame number is added to the end of the queue when the pin_count for the frame is decremented to 0. A frame number is removed from the queue when the pin_count becomes non-zero. (This can happen if there is a request for the page currently occupying the frame!)

Use a queue data-structure to keep track of the unpinned frames (that is, the frame available for replacement). Whenever a page is pinned, the corresponding frame in the buffer pool should be removed from the availability queue. Whenever a page is unpinned, the corresponding frame in the buffer pool should be added to the head of the availability queue. When a replacement is requested (to make room for a new page in the buffer pool), the frame at the end of the availability queue should be dequeued. By using a queue, there is no need to time-stamp frames for tracking least recently used.

You should use a second queue as well to track empty frames in the buffer pool (frames with no page associated). At startup, there will be empty frames. Also, whenever a page is freed, it is destroyed in the database (so dropped from the buffer pool and from the disk). On any new frame request, an empty frame should be returned if avaiable, and an unpinned frame (via LRU selection) only if the is no empty frame.


Clock Replacement Policy

You can implement this policy additionally for extra credit.

To implement the clock replacement policy, maintain an array of State of length of the number of buffer frames (NumBuffers). Then state[i] refers to the "state" of buffer frame i. State can be an enumerated type with three values:

  • referenced,
  • available, and
  • pinned.

Whenever the pin count of buffer frame i is greater than zero, then state[i] = pinned should be true. When the pin count for the page in frame i goes to zero, then state[i] should be set to referenced.

Also maintain a pointer, say head, which points to a position in the state array. (This can just be an integer, an index to the array.) This indicates the next frame that is a candidate for use in the buffer pool to bring in a new page. When a request comes for a new page, under the clock policy we proceed to hunt for a suitable frame as follows.

  1. Say head = i. Check frame i. If its state is pinned, we cannot use this frame. There is a page there that cannot be removed. Increment head by one. (Actually, head = head + 1 mod NumBuffers, as we treat the array of buffer frames as circular. Hence the name "clock" for this strategy.) Try again.
  2. If state[i] = referenced, this means that frame i is free. (The page it contains has pin count zero.) However, we do not use it! Instead we set state[i] to available, increment head, and try again.
  3. If state[i] = available, Then we can use this frame. If it holds a page and the page's dirty bit is set to TRUE, flush the page first. (We know the page's pin count is zero.) It is also possible that the frame is currently empty and holds no page. (This only happens early on after the database system has been started and this buffer frame has not been used until now.) In the code, the variable pageNo tells which page resides in the frame buffer. If pageNo = INVALID_PAGE, this means that there is no page there.

Note that onew could define a two-state clock policy just using pinned and available. Having this extra step with referenced is worthwhile, though. It means that a page whose pin count recently went to zero will stay in the buffer pool for some time at least. And a page that was used recently has a high probability of being neaded again soon. It is only after the clock arrives at the page's frame for a second is the page dropped from the buffer pool. While this is not as discerning as LRU, it offers some of the same advantages in this way.

Caution: Be certain to track the number of frames clock checks in looking for a frame free for use. If clock has checked 2*NumBuffers, this means that there were no available frames, even on the second pass through the buffer pool! So there will never be, no matter how much more you look. If you do not check for this, your code enters into an infinite loop now. Instead, if this happens, the code should issue am error stating that there are no free frames in the buffer pool.


Compiling Your Code and Running the Tests

You will receive several files with test data. Each line in the file will contain a triple of integers a b c, where a is the transaction number, b is the number of the page requested by the transaction and c is the flag indicating whether the transaction will write the page (the value of the flag is 1) or not (the value of the flag is 0). Some of the lines will contain just a single integer a. This is to indicate that transaction a is finished.


Error Protocol

Be certain to return an error message when something unexpected happens (you should figure out what it can be).


Deliverables

You should print out in a readeable form the contents of the buffer pool (that is, for each frame include page number it contains, pin_count, and dirtybit) after the test data is processed. If for some reason the processing failed, you should simply return an appropriate error message. Also, provide a printout with your code. Be sure your code is thoroughly tested as you will receive your test data only an hour before the project is due!