|
Advanced Database Systems COSC 6421 York University Fall 2002 |
|
|
In this assignment, you shall implement a simplified version of the buffer pool manager layer with the LRU replacement strategy. It is simplified in that we will not support concurrency control or recovery. |
|
Begin by reviewing Chapter 9 (Storing Data: Disks and Files), for an overview of buffer management. |
|
The buffer pool is a collection of frames (page-sized sequence of main memory bytes) that is managed by the buffer manager. You should assume for this project that there are 10 frames available in the buffer pool. It should be stored as an array bufPool[numbuf] of Page objects. Additionally, you should maintain an array bufDescr[numbuf] of descriptors, one per frame. Each descriptor is a record with the following fields:
The pin_count field is of type integer, page number is of type integer, and dirtybit is of type boolean. These describe the current page that is stored in that frame. A page is identified by a page number. It is unique with respect to all pages in the database. A simple hash table should be used to determine out what frame a given disk page occupies. The hash table should be implemented to reside entirely in main memory, using an array of pointers to lists of <page number, frame number> pairs. This array is called the directory. Each list of pairs is called a bucket. Given a page number, you should apply a hash function to locate the directory entry which points to the bucket that (potentially) contains the frame number for this page. The bucket does contain a <page number, frame number> pair for this page, if the page is indeed in the buffer pool. If, however, on searching the bucket you do not find a pair containing the page number, this means that the page is not in the pool. The hash function should distribute values in the domain of the search field fairly uniformly over the collection of buckets. Say we have HTSIZE number of buckets, numbered 0 through M-1. A hash function h of the form
works well in practice. HTSIZE should be chosen to be a prime number for this. When a page is requested, the buffer manager should do the following.
You can earn one point in extra credit (on top of the ten points total for the project) if you implement both clock and LRU. In this case, show the test results run under both policies. However, only attempt this once you have the LRU policy working already! (This is lots more work, so I do not really recommend the extra effort.) Implementing clock for the project is the easier of the two policies for two reasons: clock seems slightly easier to implement (less code) than LRU, and the original project template code is already set up for clock. LRU Replacement Policy This is the policy you are supposed to implement for this project. To implement the LRU replacement policy, maintain a queue of frame numbers. When a frame is to be chosen for replacement, pick the first frame in this list. A frame number is added to the end of the queue when the pin_count for the frame is decremented to 0. A frame number is removed from the queue when the pin_count becomes non-zero. (This can happen if there is a request for the page currently occupying the frame!) Use a queue data-structure to keep track of the unpinned frames (that is, the frame available for replacement). Whenever a page is pinned, the corresponding frame in the buffer pool should be removed from the availability queue. Whenever a page is unpinned, the corresponding frame in the buffer pool should be added to the head of the availability queue. When a replacement is requested (to make room for a new page in the buffer pool), the frame at the end of the availability queue should be dequeued. By using a queue, there is no need to time-stamp frames for tracking least recently used. You should use a second queue as well to track empty frames in the buffer pool (frames with no page associated). At startup, there will be empty frames. Also, whenever a page is freed, it is destroyed in the database (so dropped from the buffer pool and from the disk). On any new frame request, an empty frame should be returned if avaiable, and an unpinned frame (via LRU selection) only if the is no empty frame. Clock Replacement Policy You can implement this policy additionally for extra credit. To implement the clock replacement policy, maintain an array of State of length of the number of buffer frames (NumBuffers). Then state[i] refers to the "state" of buffer frame i. State can be an enumerated type with three values:
Whenever the pin count of buffer frame i is greater than zero, then state[i] = pinned should be true. When the pin count for the page in frame i goes to zero, then state[i] should be set to referenced. Also maintain a pointer, say head, which points to a position in the state array. (This can just be an integer, an index to the array.) This indicates the next frame that is a candidate for use in the buffer pool to bring in a new page. When a request comes for a new page, under the clock policy we proceed to hunt for a suitable frame as follows.
Note that onew could define a two-state clock policy just using pinned and available. Having this extra step with referenced is worthwhile, though. It means that a page whose pin count recently went to zero will stay in the buffer pool for some time at least. And a page that was used recently has a high probability of being neaded again soon. It is only after the clock arrives at the page's frame for a second is the page dropped from the buffer pool. While this is not as discerning as LRU, it offers some of the same advantages in this way. Caution: Be certain to track the number of frames clock checks in looking for a frame free for use. If clock has checked 2*NumBuffers, this means that there were no available frames, even on the second pass through the buffer pool! So there will never be, no matter how much more you look. If you do not check for this, your code enters into an infinite loop now. Instead, if this happens, the code should issue am error stating that there are no free frames in the buffer pool. |
| You will receive several files with test data. Each line in the file will contain a triple of integers a b c, where a is the transaction number, b is the number of the page requested by the transaction and c is the flag indicating whether the transaction will write the page (the value of the flag is 1) or not (the value of the flag is 0). Some of the lines will contain just a single integer a. This is to indicate that transaction a is finished. |
|
Be certain to return an error message when something unexpected happens (you should figure out what it can be). |
|
You should print out in a readeable form the contents of the buffer pool (that is, for each frame include page number it contains, pin_count, and dirtybit) after the test data is processed. If for some reason the processing failed, you should simply return an appropriate error message. Also, provide a printout with your code. Be sure your code is thoroughly tested as you will receive your test data only an hour before the project is due! |