FAQ for Project #1: BufMgr (Java Minibase) | CSE-4411M | Winter 2013

CSE-4411M
Database Management Systems
York University
Winter 2013

Project #1
Buffer Pool Manager
Frequently Asked Questions (FAQ)

General

Do we implement anything other than the methods listed in the assignment page? Are we permitted to create more classes, such as bufferHashTable class, to support the bufMgr class?

Yes, you will also be building supporting classes within the bufmgr package, like FrameDesc, BufHTEntry, BufHashTbl, and Replacer (implement the LRU strategy). It is up to you what you want to call these and how to organize them.

Indeed, it would be quite awkward to pull this off with only sub-classes within bufMgr, so introducing other classes when appropriate is fine.

The exceptions you are to handle will also require you to have small auxiliary classes (and files to hold them).

What is flushAllPages supposed to do?

The method flushAllPages is called on clean-up by Minibase when things are shut down to flush all dirty pages to disk.

Compiling, Make, The Code Base, etc.

If I want to say import global *, import diskmgr.* in my code, do I need to download these packages somehow?

No. You have effectively "downloaded" these when you copied the directory over. There is a JAR file at

/cs/dept/course/2012-13/W/4411/bufmgr/lib/bufmgrAssign.jar

It contains all the components of Java Minibase (the packages global and diskmgr) that your part, bufmgr, needs to compile.

Add this onto your CLASSPATH, or use the Makefile to do your compilation.

So when I follow the instructions and do, for example,

% cp -rp /cs/dept/course/2012-13/W/4411/bufmgr/src proj

what all am I copying over?

You are copying a directory with three sub-directories:

bufmgr/
diskmgr/
tests/

As you have copied it, these are within a new directory of yours now called proj/.

So far, there is nothing in bufmgr/ but a Makefile. This is the sub-directory where you will be putting the code you write to implement the bufmgr. This is the only sub-directory you need to change really.

The sub-directory diskmgr/ is just there for your convenience. It is the source code for the disk space manager. You do not really need this for your project; seeing the API for the diskmgr/ in the JAVADOC is enough. However, it may be useful to see how things are done within the diskmgr/.

Caution: The source code for the diskmgr/ that you have copied is not what is compiled with you code! The code in the JAR for diskmgr/ (which is on your CLASSPATH) is what is being used. The source for diskmgr/ you have copied is just for reference.

There is a Makefile in diskmgr/ for compiling that code, but you are not using it.

The sub-directory tests/ is important. It is wrapper code (with a main that runs a number of tests on the code (that you are writing). So you do not have to design and build your own testing code. Use what is here. You are welcome to modify this, add new tests, etc. When we test your final submitted code, we will be using the original version of this tester.

When I attempt to compile, the compiler complains it cannot find the package bufmgr. What is wrong?

As your buffer manager code is in its own directory (bufmgr/), it is expected that the code is encapsulated in a java package bufmgr. Be certain to declare that at the beginning of your BufMgr.java file.

package bufmgr;

What files do I need to have in my bufmgr/ directory?

Your main code for the buffer pool will be in a file BufMgr.java. Here is a template file to get you started: BufMgr.java. This will house the class BufMgr. You may have auxiliary classes in the same file. Of course, you are allowed to put these in other files in the same directory, if you prefer.

You will need a file per exception that you need to declare too. This will make for a number of small files.

Remember, all files will be of the package bufmgr/, and you need to declare that at the top of each, as discussed above.

What is the deal with the Makefile? How do I compile my code?

To compile your code, javac needs to see packages global and diskmgr (the parts of the existing Java Minibase code base) that your code employs.

There is a Makefile both in bufmgr/ and tests/ for the UNIX make utility meant to make life easy for compiling the code. Do not know anything about make?! Read the man page on make. It is a useful utility for managing (compiling, linking, etc.) code spread over multiple files (classes). For example, when in the directory bufmgr/, you say

% make

it compiles all the java files whose classes either do not exist yet or whose source (.java) is newer than the object code (.class). So make is efficient in that it only performs that work that is needed.

(Note that the '%' above represents the command-line prompt.)

The make utility reads a file called Makefile in the current directory (if there is one) to find its instructions. The Makefile is just a short ASCII file and you can easily look it over.

Of course, you can do your "compilation" by hand, if you prefer. Most importantly, you will have to set up your CLASSPATH correctly for this. (When doing it via make, the Makefile sets this up.) Look inside the Makefile to see what you need. Namely, you will need something like

% setenv CLASSPATH ../:/cs/dept/course/2012-13/W/4411/bufmgr/lib/bufmgrAssign.jar:

You need to have that JAR on your path. The ../ is important for tests; it puts your bufmgr package on the path.

Now to run, since BMTest is in a package named tests, say

% java tests.BMTest

Java (Hisss!)

May we use things from the java utility package? Such as to do a Hashtable?

Certainly. (Note that a map might be more appropriate.)

Output, Testing, & Grading

May we have a working copy of, say, BufMgr.class so we can see the output of a working version and how it performs? :-)

I know that could be useful. However, no. Sorry. (Java is too easy to reverse compile, for one thing.)

I realize that it is difficult when starting the project. Nothing works yet. Build things step by step and modularly. Think carefully what each method needs to do. It is doable.

How should our output look then when we run bmtest once things are all working?

Here is the output as generated by a working bufmgr package: sample output file.

Note that it says, "Failed as expected," and things like that in places. It is reporting that the BP manager "failed" in the proper way. This is not saying there is a problem. So interpret carefully what the output is saying before panicking.

You can read the BMTest.class file to see what tests are being performed on the BP manager. Indeed, you can comment out some of the tests while you are working. Just be sure to uncomment them by the end, and know your code passes all the tests.

The TestDriver.java file is just a wrapper class the Minibase folks developed that calls and executes the tests, in this case, in BMTest.class.

When grading, will you be testing our BP manager with additional tests not in bmtest?

No. So if your manager is passing bmtest fine, you're in great shape.

That said, I do look at your code when grading. So things need to be implemented in the right way. (For instance, writing a dummy package that simply prints out right looking results would not pass!) However, I will not go looking for bugs in your BP manager beyond what bmtest tests for.

Buffer Pool Specifics

What does the emptyPage mean in the pinPage() method?

Boy you people are nosey! :-) The assignment specifies, "You can assume that emptyPage==false for this assignment."

Anyway, it is used to indicate that we know the pageID has just been created, and we are now pinning the (brandnew) page in the buffer-pool. Because the page is new, there is nothing written on it yet. So the bufmgr does not have to ask the diskmgr to read it from disk. This saves an I/O. Anything to save an I/O!

In the flushPage() method, if the page is pinned, will it still be flushed?

Yes. However, throw a, say, PAGE_PINNED exception when you are done.

In the flushPage() method, if the page is not in buffer pool, do we need throw a exception? Since the client of buffer manager doesn't know whether the page is present or not, this would seem odd.

Indeed. Just do nothing if called to flush a page not in the buffer pool.

Why bufpool[numbuf][page_size]?

This is the buffer pool itself. This is the space in main memory controlled by the buffer pool manager to store (main-memory copies of) the database pages from disk.

The array bufpool[numbuf] is the array of "frames". The buffer pool owns numbuf number of frames in this case. Each entry here is a reference to an (implicit) array of page_size number of bytes. Each such chunk of memory, a contiguous array of page_size number of bytes, is precisely the size of memory needed to store a page.

You probably recall that, in Java, an array is always an array of references. Objects to be placed in the array must be allocated (new) independently of the array allocation.

So, each array of bytes, that is really an array of references to bytes? No. In the case of the primitive datatype byte (which is shorter than a pointer / reference), Java allocates it directly as an array of bytes, an exception to the general array-allocation rule of Java. So each array of bytes really is a contiguous chunk of memory!

I thought you complained that, in Java, we had to fake the buffer pool, that we were not doing real memory management. It seems this trick with the 2d array lets us do it!

Yes and no. Ideally, we want the entire buffer pool to be one entire, contiguous span of main memory. That is because in a real database system with a real diskspace manager, the disk controller card can write into main memory directly, without the CPU being involved. It is passed an instruction, like, write the 20 blocks off disk starting at block address x002f98 and place them in main memory starting at address xd80294. This only works if our frames are contiguous.

We are not guaranteed this here. Since bufpool[numbuf] is an array of references, the arrays of bytes (our frames) can be scattered throughout the heap. Java does not allow us closer control over memory allocation.

However, our arrays of bytes (our frames) are each contiguous, at least. (Otherwise, we really couldn't do anything.)

Why can I not be a good object-oriented citizen and store my frame descriptor information with the "frame" in bufpool[numbuf]?

Again, ideally our frames for page placement would be contiguous in memory. If we made a frame object with the descriptor information and the page-space together, we are guarranteeing that this is not the case.

So our engineering requirements overrule our object-oriented sense in this case.

What is meant by "make the buffer manager aware that the replacement policy is specified by replacerArg" in the constructor?

A string may be passed in when the BufMgr constructor is called that is to indicate the replacement strategy to be used. You don't need to do anything with this. Under different circumstances, this string might be used to pick one of several implemented strategies. You are just implementing LRU, though, so just ignore it.

In the methods newPage and flushPage, there are several references to the DB object. How do I access this object?

Minibase declares a diskspace manager (DB), a buffer pool manager, etc., when "started" up. (Look into the tests code and the diskmgr code to see examples.) Look in the JavaDoc to see specifics on the SystemDefs package. For example, you can access the DB object as in

SystemDefs.JavabaseDB.read_page(pageno, page);

Does bufmgr.freePage really free a page regardless of whether it is pinned, or does it issue an error if the page is pinned and not free it?

As you know, bufmgr.freePage is meant to free a page. This means it issues a DB call to deallocate the page from the database. Why does bufmgr have a role in freeing a page? Well, code calls from higher layers in minibase call bufmgr to do this rather than calling DB (the disk space manager) directly because the buffer manager needs a shot at adjusting the buffer pool when a page is freed from the database. (Essentially in the layered architecture, a layer only gets to see the layer immediately below it, and not further below.)

Now back to the original question: Should bufmgr.freePage free a page regardless or not?

It (bufmgr.freePage) frees a page if the page is unpinned or if it is pinned once (pin_count == 1). If it is pinned multiply (pin_count > 1), it does not free the page but returns a Minibase error code.

So for myself, I could see justifying either approach.

The bufmgr refuses to "free" pinned pages; they must be unpinned first.
Or the bufmgr happily frees singly pinned pages, and callers shouldn't be doing this unless they know what they are doing.

As is, it is the latter (2) that is the case for us.

Now for the other part of that question. Does it issue an error if the page is pinned?

Good question.

If the pin_count of the page is zero, of course not. There is no problem in this case.
If the pin_count is one, no. What!? Well, we assume that the caller process that pinned the page is the one asking for it to be dropped from the database altogether. Since there is no other process relying on that page (the pin_count is only one), no other transaction will be adversely affected by this.
If the pin_count is greater than one, yes. In this case, more than one process / transaction has asked for that page to be in the buffer pool. One of the transactions (or some unrelated transaction accidently) is asking for it to be dropped from the database. This can cause problems for the other transactions that are relying on this page.

I personally find this policy a bit lame. For instance, we never check in the case that pin_count == 1 that it is the same transaction that pinned the page that is now asking for it to be freed. So some other transaction could be accidently trying to free the page. Anyway, this is design choice made in minibase. And the tests for the project rely on this assumption. So...implement it this way.

What steps should be done in bufmgr.freePage? The following.

Call DB to deallocate the page from the database. This should be done if the page is currently not in the buffer pool, or if it is and is just singly pinned or unpinned. This is because calls from higher layers call bufmgr to free (deallocate) pages, regardless.
If the pin_count of the page was greater than one, issue an error, don't kill the page, and return.
Call "replacer.free" (this depends on how you've coded things), given the page was in the buffer pool, so replacer can make the requisite adjustments. (This is only pertinent if you are implementing a separate replacer class.)
Check using the hash whether the page is in the buffer pool. If so, free the frame that holds the buffer copy of the page. Note that it does not matter whether the page in the buffer pool is marked as dirty, since we are destroying the page altogether.

For the getNumUnpinnedBuffers method, what is the definition of an unpinned BUFFER? I am currently assuming it means that it is either a frame that is free (that is, has no page in it) or that the page in the frame has pin_count == 0. Is that correct?

Yes. Free frames have their pincounts set to zero also and should be counted here.

Why does bufmgr.newPage have a second parameter howmany?

This is in case the X-act (an outside caller to the buffer-pool manager) wants to request that more than one new disk page be created. (The default is that howmany = 1.)

So what is BP (the buffer-pool manager) supposed to do? BP will first call the method allocate_page of DB (the disk-space manager) in order to have a new disk page (or disk pages) made. Method allocate_page takes one or two arguments. The first argument is a PageID object, start_page_num. The second, optional argument is an int, runsize. (This second argument defaults to one if missing.)

PageID is a "holder" object. It contains one field, pid, of type int. The pid is for the ID of a disk page. When calling allocate_page, BP does not know the ID of the page to be created! The DB will set start_page_num's pid to the appropriate value when it has allocated a new page. BP must coin the object start_page_num before calling allocate_page, but BP does not set start_page_num's pid. Here start_page_num is a holder object we are using to pass back the page ID.

If the X-act has howmany > 1, BP should call allocate_page with runsize = howmany. DB, if it can, will allocate a physically contiguous sequence of howmany# of pages on disk. (If it cannot, it will throw an exception, which the BP should catch and chain along.) The page ID of the first page will be conveyed back via start_page_num's pid. What are the page ID's of the subsequent pages? They are pid + 1, pid + 2, etc.

The BP's second job is then to pin the first page of the newly created pages. The BP does not pin any of the subsequent pages. (The X-act might pin them later, but this is not the BP's business.)

If the BP cannot pin this first page (say the BP is full and everything is pinned), it should clean up before failing gracefully (throwing the appropriate exception). This means it should call DB to deallocate the pages just allocated before the BP itself exits.

The Replacement Strategy

Are we supposed to implement three replacement strategies, Clock, LRU, and MRU?

No. Just implement LRU. The strategies (e.g., Clock and LRU) are just as described in the textbook. So you already should know what you need to know about them.


Parke Godfrey