CSE3221 (Winter 2008) Assignment Two

 

You have to work individually. You are not allowed to view or exchange documents with your peers. We treat all sorts of cheating very seriously. Do not copy code from anywhere, even as a "template". No late submissions will be accepted. You are required to program in your PRISM account. Your work will be graded based on: i) correctness of programming logic; ii) clarity of your code and your coding style; iii) whether your code follows the specification. It is your responsibility to explain every detail of your code clearly, with appropriate comments, to avoid any possible confusion in marking.

 

Due date: 12:00 noon, Mar 25th (Tuesday).

 

Your task is to write a C program that creates three groups of Pthreads, an IN group, a CONVERT group and an OUT group, to convert all lower-case alphabetic letters [a-z] in an input text file to upper-case letters [A-Z] and output a copy of the input file in which all lower-case letters have been converted to the corresponding upper-case letters.

 

The original main thread is not part of these three groups. The main() function should open the source file, and create/initialize a circular buffer, and create all IN, CONVERT and OUT threads. Then, the main thread waits for all these threads to finish. All IN, CONVERT and OUT threads share the circular buffer. Each buffer slot stores 2 pieces of information: one data byte read from the source file and its offset in the source file. 

 

typedef  struct {

     char  data ;

     off_t offset ;

} BufferItem ;
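
As a hint only, here is a minimal sketch of one way to wrap BufferItem slots into a shared circular buffer. The per-slot state (SlotState) and the names CircularBuffer, cb_init, cb_find and cb_destroy are assumptions made for illustration; the specification does not prescribe them. No locking is shown here: the caller is assumed to hold whatever lock protects the buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

typedef struct {
    char  data;
    off_t offset;
} BufferItem;

/* Hypothetical per-slot state; not part of the specification. */
typedef enum { SLOT_EMPTY, SLOT_FILLED, SLOT_CONVERTED } SlotState;

typedef struct {
    BufferItem *items;  /* the <bufSize> slots */
    SlotState  *state;  /* state of each slot */
    int         size;   /* capacity of the buffer */
} CircularBuffer;

/* Allocate the buffer with every slot empty (calloc zeroes the states,
 * and SLOT_EMPTY is 0). Returns 0 on success, -1 on allocation failure. */
int cb_init(CircularBuffer *cb, int size)
{
    assert(cb != NULL && size >= 1);
    cb->items = calloc((size_t)size, sizeof *cb->items);
    cb->state = calloc((size_t)size, sizeof *cb->state);
    if (cb->items == NULL || cb->state == NULL) {
        free(cb->items);
        free(cb->state);
        return -1;
    }
    cb->size = size;
    return 0;
}

/* Scan circularly, beginning at index `start`, for a slot in state `want`.
 * Returns the slot index, or -1 if no slot is in that state.
 * The caller must hold the lock that protects the buffer. */
int cb_find(const CircularBuffer *cb, SlotState want, int start)
{
    for (int k = 0; k < cb->size; k++) {
        int i = (start + k) % cb->size;
        if (cb->state[i] == want)
            return i;
    }
    return -1;
}

/* Release the memory obtained by cb_init. */
void cb_destroy(CircularBuffer *cb)
{
    free(cb->items);
    free(cb->state);
}
```

With this layout, an IN thread would look for a SLOT_EMPTY slot, a CONVERT thread for a SLOT_FILLED slot, and an OUT thread for a SLOT_CONVERTED slot. A return value of -1 corresponds to the "buffer is full" or "buffer is empty" case, after which the thread sleeps and checks again.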

 

Each IN thread goes to sleep (use nanosleep) for some random time between 0 and 0.01 seconds upon being created. Then, it reads the next single byte from the input file and saves that byte and its offset in the file to the next available empty slot in the circular buffer. Then, this IN thread goes to sleep (use nanosleep) for some random time between 0 and 0.01 seconds and goes back to read the next byte of the file, until the end of the file is reached. If the circular buffer is full, IN threads go to sleep (use nanosleep) for some random time between 0 and 0.01 seconds and then go back to check again.
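
The random sleep used by every thread could be sketched as follows. This is only an illustration: the helper names random_delay_ns and random_sleep, the constant MAX_SLEEP_NS, and the choice of rand_r with a per-thread seed are all assumptions, not requirements of the assignment.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define MAX_SLEEP_NS 10000000L  /* 0.01 seconds expressed in nanoseconds */

/* Pick a delay in [0, MAX_SLEEP_NS] nanoseconds.
 * Each thread is assumed to own its seed, so no locking is needed. */
long random_delay_ns(unsigned int *seed)
{
    assert(seed != NULL);
    return (long)(((double)rand_r(seed) / RAND_MAX) * MAX_SLEEP_NS);
}

/* Sleep for a random time between 0 and 0.01 seconds. */
void random_sleep(unsigned int *seed)
{
    struct timespec req;
    req.tv_sec  = 0;
    req.tv_nsec = random_delay_ns(seed);  /* always well below 1e9 */
    nanosleep(&req, NULL);
}
```

Seeding each thread differently (for example, from its thread number) keeps the threads from all sleeping for the same sequence of delays.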

 

Meanwhile, upon being created, each CONVERT thread sleeps (use nanosleep) for some random time between 0 and 0.01 seconds. It then reads the next byte in the circular buffer and converts it into the corresponding upper-case letter if this byte is a lower-case English letter. If the byte is NOT a lower-case alphabetic letter, this CONVERT thread leaves the byte unchanged. Then the CONVERT thread goes to sleep (use nanosleep) for some random time between 0 and 0.01 seconds and goes back to convert the next byte in the buffer until the entire file is done. If the circular buffer is empty, CONVERT threads go to sleep (use nanosleep) for some random time between 0 and 0.01 seconds and then go back to check again.
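
The conversion rule itself is small. A possible sketch is given below; the function names convert_byte and convert_item are hypothetical, and the arithmetic assumes an ASCII-compatible character set in which [a-z] and [A-Z] are contiguous.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

typedef struct {
    char  data;
    off_t offset;
} BufferItem;

/* Return the upper-case form of c if c is in [a-z]; otherwise return c
 * unchanged. Assumes ASCII, where the two letter ranges are contiguous. */
char convert_byte(char c)
{
    if (c >= 'a' && c <= 'z')
        return (char)(c - 'a' + 'A');
    return c;
}

/* Convert the data byte of one buffer slot in place; the offset is kept. */
void convert_item(BufferItem *item)
{
    assert(item != NULL);
    item->data = convert_byte(item->data);
}
```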

 

Similarly, upon being created, each OUT thread sleeps (use nanosleep) for some random time between 0 and 0.01 seconds. It then reads a converted byte and its offset from the next available nonempty buffer slot, and writes the byte to that offset in the target file. Then, it also goes to sleep (use nanosleep) for some random time between 0 and 0.01 seconds and goes back to copy the next byte until nothing is left. If the circular buffer is empty, the OUT threads go to sleep (use nanosleep) for some random time between 0 and 0.01 seconds and then go back to check again.
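
Because bytes may reach the OUT threads out of order, each byte has to be written at its own offset. One possible sketch uses pwrite, which positions and writes in a single call; an lseek followed by write inside a critical section would be an alternative. The helper names open_target and write_byte_at and the permission bits 0644 are assumptions.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Open (or create) the target file, discarding any previous content.
 * Returns a file descriptor, or -1 on error. */
int open_target(const char *path)
{
    assert(path != NULL);
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
}

/* Write one byte at the given offset of the file open on fd.
 * Returns 0 on success, -1 on error. */
int write_byte_at(int fd, char byte, off_t offset)
{
    assert(fd >= 0 && offset >= 0);
    ssize_t n = pwrite(fd, &byte, 1, offset);
    return (n == 1) ? 0 : -1;
}
```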

 

Along the way, each thread writes some trace information to one of three log files (one log file per thread group), so we can better trace the execution of your program.

 

Since all threads access common data, synchronization will be required. You may wish to look at the man pages for pthread_create, pthread_mutex_init, sem_init and other related pthread APIs. Protect shared data with critical sections, and make your critical sections as small as possible. For example, IN threads should not have one big critical section where they do all of the following together: (a) read from the file; (b) write to the buffer; (c) write to the log file. Instead, they should have 3 critical sections, one for each of (a) through (c). Similarly, CONVERT and OUT threads should have separate critical sections for reading the buffer, writing the copy and writing the log file.
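
To illustrate what a small critical section might look like, the sketch below covers only step (a) for an IN thread: the lock is held just long enough to obtain the current offset and read one byte. The mutex name file_lock and the helpers open_source and read_next_byte are hypothetical; steps (b) and (c) would each use their own lock in the same style.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical lock that guards the shared input file descriptor only. */
pthread_mutex_t file_lock = PTHREAD_MUTEX_INITIALIZER;

/* Open the source file for reading. Returns a descriptor or -1 on error. */
int open_source(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY);
}

/* Critical section (a): read the next unread byte and record its offset.
 * Returns 1 on success, 0 at end of file, -1 on error. */
int read_next_byte(int fd, char *byte, off_t *offset)
{
    assert(byte != NULL && offset != NULL);

    pthread_mutex_lock(&file_lock);
    off_t pos = lseek(fd, 0, SEEK_CUR);  /* offset of the byte about to be read */
    ssize_t n = (pos == (off_t)-1) ? -1 : read(fd, byte, 1);
    pthread_mutex_unlock(&file_lock);

    if (n == 1)
        *offset = pos;
    return (int)n;
}
```

The offset query and the read sit inside the same critical section so that no other IN thread can move the file position between the two calls.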

 

The program should be called convert.c and will be compiled with:

 

cc -Wall -o cnvt convert.c -lpthread

 

It will be invoked as follows:

 

 cnvt <nIN> <nCONVERT> <nOUT> <file_in> <file_out> <bufSize> <IN_Log> <CONVERT_Log> <OUT_Log>

 

 <nIN> is the number of IN threads to create. There should be at least 1.

<nCONVERT> is the number of CONVERT threads to create. There should be at least 1.

 <nOUT> is the number of OUT threads to create. There should be at least 1.

 <file_in> is the pathname of the file to be converted. It should exist and be readable.

 <file_out> is the name to be given to the target file. If a file with that name already exists, it should be overwritten.

<bufSize> is the capacity of the shared circular buffer, measured in BufferItem entries. This should be at least 1.

<IN_Log> is the name of the file to which the IN threads write some trace information. If a file with that name already exists, it should be overwritten.

<CONVERT_Log> is the name of the file to which the CONVERT threads write some trace information. If a file with that name already exists, it should be overwritten.

<OUT_Log> is the name of the file to which the OUT threads write some trace information. If a file with that name already exists, it should be overwritten.
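
The nine arguments above could be checked along the following lines. This is a sketch only: the Config structure and the helpers parse_positive and parse_args are hypothetical names, and checking that the files can actually be opened is left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical container for the nine command-line arguments. */
typedef struct {
    int nIN, nCONVERT, nOUT, bufSize;
    const char *file_in, *file_out;
    const char *in_log, *convert_log, *out_log;
} Config;

/* Parse a decimal integer that must be at least 1.
 * Returns the value, or -1 if the string is not a valid positive int. */
int parse_positive(const char *s)
{
    assert(s != NULL);
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v < 1 || v > INT_MAX)
        return -1;
    return (int)v;
}

/* Fill cfg from argv. Returns 0 on success, or -1 if the argument count
 * is wrong or one of the numeric arguments is not at least 1. */
int parse_args(int argc, char *argv[], Config *cfg)
{
    assert(cfg != NULL);
    if (argc != 10)  /* program name plus nine arguments */
        return -1;

    cfg->nIN         = parse_positive(argv[1]);
    cfg->nCONVERT    = parse_positive(argv[2]);
    cfg->nOUT        = parse_positive(argv[3]);
    cfg->file_in     = argv[4];
    cfg->file_out    = argv[5];
    cfg->bufSize     = parse_positive(argv[6]);
    cfg->in_log      = argv[7];
    cfg->convert_log = argv[8];
    cfg->out_log     = argv[9];

    if (cfg->nIN < 0 || cfg->nCONVERT < 0 || cfg->nOUT < 0 || cfg->bufSize < 0)
        return -1;
    return 0;
}
```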

 

 

The log files
=============

 

The log files have no part in the file conversion, but let us better trace the execution of your program.

 

Each of the IN threads should be given a different number in the range 0 … <nIN>-1. Each of the CONVERT threads should be given a different number in the range 0 … <nCONVERT>-1. Each of the OUT threads should be given a different number in the range 0 … <nOUT>-1. Each thread should know its own number. (This number is different from the thread id.)
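
One common way to let each thread know its own number is to pass a pointer to a per-thread record as the argument of pthread_create. The sketch below assumes hypothetical names (ThreadInfo, spawn_group, report_number); report_number is only a demonstration worker that hands its number back as the thread's return value.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread record. */
typedef struct {
    int       number;  /* 0 .. n-1 within its group; not the thread id */
    pthread_t tid;     /* thread id filled in by pthread_create */
} ThreadInfo;

/* Create n threads running fn. Each thread receives a pointer to its own
 * ThreadInfo, whose number field is set before the thread starts.
 * Returns the array of records (to be joined and freed by the caller),
 * or NULL on failure. */
ThreadInfo *spawn_group(int n, void *(*fn)(void *))
{
    assert(n >= 1 && fn != NULL);
    ThreadInfo *group = malloc((size_t)n * sizeof *group);
    if (group == NULL)
        return NULL;

    for (int i = 0; i < n; i++) {
        group[i].number = i;
        if (pthread_create(&group[i].tid, NULL, fn, &group[i]) != 0) {
            /* Wait for the threads already started, then give up. */
            for (int j = 0; j < i; j++)
                pthread_join(group[j].tid, NULL);
            free(group);
            return NULL;
        }
    }
    return group;
}

/* Demonstration worker: returns its own number through the exit value. */
void *report_number(void *arg)
{
    ThreadInfo *self = arg;
    return (void *)(intptr_t)self->number;
}
```

Giving every thread its own record avoids the classic mistake of passing the address of the loop counter, which may have changed by the time the new thread reads it.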

 

When an IN thread reads the next unread byte from the file, it can obtain the offset using the lseek system call. When the IN thread saves the byte and its offset to the buffer, it writes to a particular index in the buffer.  Each time an IN thread number n reads a byte from offset x in the file and writes it to index i in the buffer, it should write the line n x i to the <IN_Log> file.  More exactly, it should write its thread number, followed by a single blank, followed by the offset in the file, followed by a single blank, followed by the index in the buffer, followed by a newline character '\n'.

 

In a similar way, each CONVERT thread number n converting a byte from file offset x from the buffer slot indexed as i should write n x i to the <CONVERT_Log> file. More exactly, it should write its thread number, followed by a single blank, followed by the byte offset in the file, followed by a single blank, followed by the index in the buffer, followed by a newline character '\n'.

 

Similarly, each OUT thread also writes n x i to the <OUT_Log> file. More exactly, it writes its thread number, followed by a single blank, followed by the offset in the file where it writes its byte, followed by a single blank, followed by the index in the buffer where it read its byte, followed by a newline character '\n'.
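
The "n x i" line format shared by the three log files could be produced as sketched below. The helper names format_log_line and write_log_line, the use of stdio, and the 64-byte line buffer are assumptions; the offset is cast to long long because the width of off_t varies between systems.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>

/* Format "<thread number> <offset> <index>\n" into out.
 * Returns the number of characters produced (excluding the final NUL),
 * as reported by snprintf. */
int format_log_line(char *out, size_t cap, int thread_no, off_t offset, int index)
{
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "%d %lld %d\n", thread_no, (long long)offset, index);
}

/* Append one line to a log file, holding the log's own lock only for the
 * duration of the write. Returns 0 on success, -1 on error. */
int write_log_line(FILE *log, pthread_mutex_t *lock,
                   int thread_no, off_t offset, int index)
{
    char line[64];
    int len = format_log_line(line, sizeof line, thread_no, offset, index);
    if (len < 0 || (size_t)len >= sizeof line)
        return -1;

    pthread_mutex_lock(lock);
    int ok = (fputs(line, log) != EOF);
    pthread_mutex_unlock(lock);
    return ok ? 0 : -1;
}
```

For example, thread number 2 handling file offset 17 through buffer index 5 would produce the line "2 17 5" followed by a newline.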

 

What to submit?

 

Submit the C program using the following command from your PRISM account:

 

submit 3221 a2 convert.c

 

Include the following information (please complete) as a comment at the beginning of your C program:

/*

Family Name:

Given Name:

Section:

Student Number:

CS Login:

*/

 

No hardcopy is needed for this assignment.