Infrastructure:
Databases & State
Parke Godfrey
14 September 2012
CSE-2041
These slides borrow from the following sources.
state
The term state in CS and Math
refers to when a system can react to the same input differently
at different times.
The system can be in different states
at different times.
stateless
We say something is stateless
if it always provides the same output for the same given input.
stateful
For a system to have “state” (to be stateful) means that it must store information / data persistently over time.
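The difference between stateless and stateful can be sketched in a few lines of Java. This is only an illustration; the names StateDemo, doubled, and Accumulator are made up for this example and do not come from the slides.

```java
class StateDemo {
    // Stateless: the output depends only on the input.
    static int doubled(int x) {
        return 2 * x;
    }

    // Stateful: the object stores a running total between calls,
    // so the same input can produce different outputs at different times.
    static class Accumulator {
        private int total = 0;

        int add(int x) {
            total += x;
            return total;
        }
    }

    public static void main(String[] args) {
        System.out.println(doubled(5));   // same result on every call
        System.out.println(doubled(5));

        Accumulator acc = new Accumulator();
        System.out.println(acc.add(5));   // first call
        System.out.println(acc.add(5));   // same input, different output
    }
}
```

Note that the Accumulator only keeps its state for the lifetime of the process; nothing here survives across invocations of the program.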
Well... it does not make sense to ask if a program is stateful, but rather if a process — an instance of the program running — is stateful.
Is a process — an invocation of a program — stateful?
During the lifetime of the process, most often, yes.
“State” is represented by the program's variables, and the values for these in the process.
Is a program stateful across invocations?
Most often, no.
Are operating systems stateful across invocations?
Yes, they are.
The “state” is the file system.
HTTP in itself is stateless.
Then again, what does it mean for a protocol to be called stateless or stateful?
The initial concept of WWW was for HTTP exchange to be stateless.
Why have HTTP be stateless?
The term “Web 1.0” is often used to describe the static Web.
Think HTTP/1.0.
For Web 1.0, the Web was just a delivery service.
The client asks for a “page” by URL; The server returns that page (if it has it).
In the '90s, this is mostly what the Web was.
The term “Web 2.0” is often used to describe the dynamic Web.
Think HTTP/1.1.
For Web 2.0, the Web is interactive.
The client can interact with a Web “page” (a webapp) as a service.
In the '00s, this is what the Web evolved into.
For the interactive Web (“Web 2.0”), we need to support the notion of session.
This session is different from an HTTP “session” (exchange)!
Must maintain state across subsequent HTTP exchanges.
How to define the scope of a session may be hard.
E.g., a shopping session at Amazon.
This topic has a name: e-commerce.
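One common way to keep state across otherwise stateless exchanges is for the server to hand the client a session ID, which the client sends back with each later request. The sketch below models only the server-side bookkeeping, in memory; SessionStore and its methods are hypothetical names for this example, and a real webapp would typically rely on its framework's session support instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class SessionStore {
    // Maps a session ID to that session's data (here, a cart: item -> quantity).
    private final Map<String, Map<String, Integer>> sessions = new HashMap<>();

    // First exchange: create the state and return an ID for the client to resend.
    String open() {
        String id = UUID.randomUUID().toString();
        sessions.put(id, new HashMap<>());
        return id;
    }

    // Later exchanges: the client supplies its ID so the server can find the state.
    void addToCart(String sessionId, String item) {
        Map<String, Integer> cart = sessions.get(sessionId);
        if (cart == null) {
            throw new IllegalArgumentException("unknown session: " + sessionId);
        }
        cart.merge(item, 1, Integer::sum);
    }

    int count(String sessionId, String item) {
        Map<String, Integer> cart = sessions.get(sessionId);
        return cart == null ? 0 : cart.getOrDefault(item, 0);
    }

    public static void main(String[] args) {
        SessionStore store = new SessionStore();
        String alice = store.open();
        String bob = store.open();
        store.addToCart(alice, "book");   // exchange 1
        store.addToCart(alice, "book");   // exchange 2, same session
        System.out.println(store.count(alice, "book"));
        System.out.println(store.count(bob, "book"));
    }
}
```

Each HTTP exchange is still stateless on its own; the state lives on the server, keyed by the ID.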
In some cases, we are interested in inter-session state too.
Often, this is done with accounts and log-ins. E.g., Facebook.
Directed advertising has been the big driver of this.
How to do it?
What to store?
Where to store?
We will learn there are a number of mechanisms, tools, and methods for
keeping information (state), and
managing sessions.
Engineering goals:
For practicality, we may need to keep things as “stateless” as possible.
Use a Map Collection:
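import java.util.Map;
import java.util.TreeMap;

// The student information system (sis): student ID -> Student record.
Map<String, Student> sis = new TreeMap<String, Student>();
Student s = new Student("123456789", …);
sis.put("123456789", s);
…
// Look up a student by the ID read from input.
String id = input.nextLine();
if (sis.containsKey(id))
{
    output.println(sis.get(id).getGpa());
}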
What if there is more information than fits in main memory?
What if more than one server?
How to handle many concurrent requests efficiently?
How to protect against inconsistent actions?
E.g., Sell the remaining concert ticket to two different people.
How to protect against inconsistencies from arising in our “data base”?
E.g., Customer has two primary shipping addresses.
What if the server crashes?
How to protect against losing data?
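To make the concert-ticket question above concrete: within a single process, the check ("is a ticket left?") and the update ("take it") must happen as one indivisible step. The sketch below uses Java's synchronized for that. TicketOffice is a made-up class for this example, and it does not address the multi-server or crash questions, which are where a database system comes in.

```java
class TicketOffice {
    private int remaining;

    TicketOffice(int remaining) {
        this.remaining = remaining;
    }

    // synchronized makes the check and the decrement one indivisible step,
    // so two concurrent callers cannot both see the last ticket as available.
    synchronized boolean sell() {
        if (remaining > 0) {
            remaining--;
            return true;
        }
        return false;
    }

    synchronized int remaining() {
        return remaining;
    }

    public static void main(String[] args) throws InterruptedException {
        TicketOffice office = new TicketOffice(1);
        boolean[] sold = new boolean[2];
        Thread a = new Thread(() -> sold[0] = office.sell());
        Thread b = new Thread(() -> sold[1] = office.sell());
        a.start();
        b.start();
        a.join();
        b.join();
        // Exactly one of the two buyers should get the ticket.
        System.out.println("buyer A: " + sold[0] + ", buyer B: " + sold[1]);
    }
}
```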
These very data storage, maintenance, and retrieval issues exist for lots of large applications.
Database systems have been around since the 1960s.
Relational database management systems (RDBMSs) have been around since the 1980s.
E.g., Oracle, IBM DB2, Microsoft Access, Microsoft SQL Server, Sybase, MySQL, Derby, ...
Key idea is data maintenance as a service.
abstraction:
Remove details relating to data storage and access from apps and put them in one place, the DBMS.
interface:
Provide an interface to the DBMS so apps can use it as a service. The interface (SQL) is independent of how the data is organized in the database.
encapsulation:
This reduces the complexity of all apps, removes redundancy, and allows us to enforce rules centrally rather than per application.
A transaction is a change to, and / or a retrieval from, a database.
Atomicity
Consistency
Isolation
Durability
All the (primitive) actions of a transaction should occur and commit, or none should.
transfer(account X, account Y, amount M) {
    X.balance -= M;
    Y.balance += M;
}
What if X.balance < M?
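One way to read the all-or-nothing requirement: if any step of the transfer cannot be allowed to stand, every step already taken is undone. The in-memory sketch below assumes a rule that no balance may go negative and rolls the transfer back when it would. Bank and its methods are hypothetical names; a real DBMS does this with its own commit and rollback machinery, not with code like this.

```java
import java.util.HashMap;
import java.util.Map;

class Bank {
    private final Map<String, Integer> balances = new HashMap<>();

    void open(String name, int amount) {
        balances.put(name, amount);
    }

    int balance(String name) {
        return balances.get(name);
    }

    // Returns true if the transfer committed, false if it was rolled back.
    boolean transfer(String x, String y, int m) {
        if (m < 0) {
            throw new IllegalArgumentException("amount must be non-negative");
        }
        // Remember the old values so the change can be undone.
        int oldX = balances.get(x);
        int oldY = balances.get(y);

        balances.merge(x, -m, Integer::sum);   // X.balance -= M
        balances.merge(y, m, Integer::sum);    // Y.balance += M

        if (balances.get(x) < 0) {
            // Rollback: restore both accounts, as if nothing had happened.
            balances.put(x, oldX);
            balances.put(y, oldY);
            return false;
        }
        return true;                           // commit
    }

    public static void main(String[] args) {
        Bank bank = new Bank();
        bank.open("X", 100);
        bank.open("Y", 0);
        System.out.println(bank.transfer("X", "Y", 60));   // enough funds
        System.out.println(bank.transfer("X", "Y", 60));   // would overdraw X
        System.out.println(bank.balance("X") + " " + bank.balance("Y"));
    }
}
```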
Integrity rules can be added to a database.
E.g.,
No balance can be negative.
No two students can have the same ID.
A student cannot register for a course without having passed the prerequisites for the course.
This means the database system must deny (rollback) any transaction that would result in inconsistent data in the database.
It must seem like transactions happen one after another, never mixed.
Behind the scenes, they could be processed concurrently.
transfer(parke, john, 50) [X-act A] & transfer(parke, samantha, 75) [X-act B].
The following would result in a mistake.
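The original schedule is not reproduced here. As a stand-in, the sketch below shows one interleaving that goes wrong (a "lost update"), tracking only parke's balance and assuming a starting balance of 200. The starting balance and the class name LostUpdateDemo are assumptions for this example.

```java
class LostUpdateDemo {
    // Serial schedule: X-act A runs completely, then X-act B.
    static int serial(int parke) {
        int a = parke;      // A reads parke's balance
        parke = a - 50;     // A writes
        int b = parke;      // B reads the balance A wrote
        parke = b - 75;     // B writes
        return parke;
    }

    // Interleaved schedule: both transactions read before either writes.
    static int interleaved(int parke) {
        int a = parke;      // A reads parke's balance
        int b = parke;      // B reads the same old balance
        parke = a - 50;     // A writes
        parke = b - 75;     // B overwrites A's write: A's withdrawal is lost
        return parke;
    }

    public static void main(String[] args) {
        System.out.println("serial:      " + serial(200));
        System.out.println("interleaved: " + interleaved(200));
    }
}
```

With a starting balance of 200, the serial schedule leaves parke with 75, while the interleaved one leaves 125: john and samantha are both credited, but parke is only debited for one transfer.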
Once a transaction updating the database commits, that new data (the update) is
visible to all, and
permanent (until, perhaps, another transaction changes it later).
How to “talk” to a database server to
define: create, alter, & drop databases (and their components).
manipulate: insert, update, & delete data.
query: query a database to retrieve information.
Such a “protocol” is complex, almost like a programming language! It is called a “query” language.
SQL is a standardized query language for relational database systems.
Can only “talk” to the database via SQL!
A database is a collection of related tables.
A table is a set of named columns (the schema) and a number of rows (tuples) of information in that schema.
Few columns: one to a couple of hundred.
Many rows: perhaps billions!
A cell is a given column of a given row in the table.
A cell's value must be a simple piece of data: e.g., an integer, a string.
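As a rough mental model only, a table can be pictured as a fixed list of column names plus a growing list of rows, where each row holds one simple value per column. The Table class below is a hypothetical in-memory sketch, and the sample student data is invented; it says nothing about how a real RDBMS stores its data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Table {
    private final List<String> columns;                       // the schema
    private final List<List<Object>> rows = new ArrayList<>(); // the tuples

    Table(String... columns) {
        this.columns = new ArrayList<>(Arrays.asList(columns));
    }

    // A row must supply exactly one value per column of the schema.
    void insert(Object... values) {
        if (values.length != columns.size()) {
            throw new IllegalArgumentException("row does not match schema");
        }
        rows.add(new ArrayList<>(Arrays.asList(values)));
    }

    // A cell is a given column of a given row.
    Object cell(int row, String column) {
        int c = columns.indexOf(column);
        if (c < 0) {
            throw new IllegalArgumentException("no such column: " + column);
        }
        return rows.get(row).get(c);
    }

    int rowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        Table student = new Table("id", "name", "gpa");
        student.insert("123456789", "Ann", 3.5);
        student.insert("987654321", "Raj", 3.9);
        System.out.println(student.cell(1, "name"));
    }
}
```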
One-Tier
All DBMS processing is done at the clients; the data resides on a file server on the LAN. E.g., Microsoft Access.
Two-Tier
DBMS processing and data on a database server that is accessed by clients via ODBC or JDBC.
Three-Tier
DBMS tier, web tier, and a client tier. Clients connect to a webapp, which accesses the DBMS.