Database

Database is a set of named relations (or tables).

Database Management System (DBMS ) provides efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data.
DBMS may be programmed with frameworks. Ruby on Rails, Django etc. Often ran with middleware, file server applications, etc.

The amount of data needing to be stored everyday is growing faster then our computer RAM can store.

Massive - Terabytes.
Persistent - Will outlive the program that made it.
Safe - High integrety. Resistent of hardware, software, and power failure.
Multi-user - Concurrency control (prevent 2 people from withdrawling the same money from the bank).
Reliability - 99.99999% uptime.
Efficiency - Most important. Performance, performance, performance.
Convenient - Physical data indendent. How an application accesses data should remain the same, even if the DBMS is changed. Use high-level abstract languages.

Key People:

DBMS Implementer
Database Designer
Database Application Developer
Database Admin (DBA)

What IS a Database?

CRUD

In information theory, data can be created, data can be read, data can be updated, and data can be deleted.

Create - Read - Update - Delete

C - R - U - D

ACID

Atomicity requires that each transaction be "all or nothing": if one part of the transaction fails, then the entire transaction fails, and the database state is left unchanged.
An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes. To the outside world, a committed transaction appears (by its effects on the database) to be indivisible ("atomic"), and an aborted transaction does not happen.

Consistency property ensures that any transaction will bring the database from one valid state to another.

Isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed sequentially.
Providing isolation is the main goal of concurrency control. Depending on the concurrency control method, the effects of an incomplete transaction might not even be visible to another transaction.

Durability property ensures that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors.
In a relational database, for instance, once a group of SQL statements execute, the results need to be stored permanently (even if the database crashes immediately thereafter). To defend against power loss, transactions (or their effects) must be recorded in a non-volatile memory.

Atomicity - Consistency - Isolation - Durability

A - C - I - D

Database Design Concepts:

Data-Type

Pretty much what kind of data a piece of information. Called the type name.
It is important to understand that relational theory does not dictate what types are to be supported.

Nowadays provisions are expected to be available for user-defined data-types in addition to the built-in ones provided by the system.

Data Definition Language (DDL)

Commands/Syntax used to setup the database schema.
Application use this to interact with the DBMS and executes queries.

Data Manipulation Language (DML)

Programming syntax for using data. High level (abstract) language. Query Language.

Data Model

How the data is structered.

The Relational Model

Relational Model (RM) for database management is an approach to managing data using a structure (Data Model) and language.

It is used by all major commerical database systems.
It is very simple.
Efficent Implementations.

Relation = Table

	Column 0	Column 1	Column 2
Row 0

A table is an accepted visual representation of a relation.

Attribute = Column

	Attribute 0	Attribute 1	Attribute 2
-

Each relation (table) has a set of named attributes (columns).
An attribute name might be "name" or "age".

Each attribute (column) has a type (or domain). Like an image, floating point number, integer, or a string.

Tuple = Row

	Attribute 0	Attribute 1	Attribute 2
Tuple 0	Value	Value	Value
Tuple 1	Value	Value	Value

Each tuple (or rows) has a value for each attribute (column).

A tuple is basically the same thing as a row, except in an SQL DBMS, where the column values in a row are ordered.
(Tuples are NOT ordered. Instead, each attribute value is identified solely by the attribute name and NEVER by its ordinal position within the tuple).

Key

Key	Attribute
Key 0	Value
Key 1	Value

Each relation/tuple can have a unqiue Key, an attribute (column) whose value is unique in each tuple (row). Or set of attributes whose combined values are unique.

NULL

A special value data type for "unknown" or "undefined".
NULL is part of The Relational Model.

When querying a database containing peoples name and age to return tuples (rows) with age (attribute):

lesser then OR greater then OR equal to 21

The tuples (rows) with NULL age (attribute) will NOT be returned.

Relational Database Management System
(RDBMS)

A relational database management system (RDBMS) is a program that lets you create, update, and administer a relational database.

Most relational database management systems use SQL to access the database.

There are more similarities than differences between the different RDBMS, but the SQL syntax may be slightly different depending on which RDBMS you are using.

Types of RDBMS

SQLite

SQLite is a popular open-source SQL database.
It is able to store an entire database in a single file.
One of the biggest advantages this provides is that all of the data can be stored locally without having to connect your database to a server.
SQLite is a popular choice for databases in cellphones, PDAs, MP3 players, set-top boxes, and other electronic gadgets.

MySQL

MySQL is the most popular open-source SQL database.
It is typically used for web application development, and often accessed using PHP.
The main advantages of MySQL are that it is easy to use, inexpensive, reliable (has been around since 1995) and has a large community of developers who can help answer questions.
Some of the disadvantages are that it has been known to suffer from poor performance when scaling, open source development has lagged since Oracle has taken control of MySQL.
and it does not include some advanced features that developers may be used to.

PostgreSQL

PostgreSQL is an open-source SQL database that is not controlled by any corporation. It is typically used for web application development.
PostgreSQL shares many of the same advantages of MySQL. It is easy to use, inexpensive, reliable, and has a large community of developers.
It also provides some additional features such as foreign key support without requiring complex configuration.
The main disadvantage of PostgreSQL is that it is slower in performance than other databases such as MySQL.
It is also less popular than MySQL, which makes it harder to come by hosts or service providers that offer managed PostgreSQL instances.

Oracle DB

Oracle DB is owned by the Oracle corporation, and the code is NOT open-source.
Oracle is used for large applications, particularly in the banking industry.
Most of the world’s top banks run Oracle applications because Oracle offers a powerful combination of technology and comprehensive, pre-integrated business applications, including key functionality built specifically for banks.
The main disadvantage of using Oracle is that it is not free to use like its open-source competitors and can be quite expensive.

SQL Server

SQL Server is owned by Microsoft. Like Oracle DB, the code is also NOT open-source.
SQL Server is mainly used by large enterprise applications.
The major difference between Oracle DB and SQL Server is that SQL Server only supports Windows OS.
Microsoft offers a free entry level version called Express, but can become very expensive as you scale your application.

Steps in creating and using a (relational) database.

1. Design the strucure of relations, and the attributes of those relations. Known as the Schema.
2. Load inital data.
3. Query and modify data. Repeat.

Creating Relations (Tables) in SQL

CREATE TABLE College(name string, state char(2), enrollment integer);

A statement is text that the database recognizes as a valid command. Statements always end in a semi-colon.
Let's break down the components of a statement:

Create Table is a clause.
Clauses perform specific tasks in SQL.
By convention, clauses are written in capital letters. Clauses can also be referred to as commands.

College refers to the name of the table that the command is applied to.

(name string, state char(2), enrollment integer); is a parameter.
A parameter is a list of columns, data-type's, or value's that are passed to a clause as an argument.

Here, the parameter is a list of column names and the associated data-type.
The structure of SQL statements vary.
The number of lines used do not matter.

Inserting Rows (Tuples) into Table

INSERT INTO College(name, state, enrollment) VALUES("Sonoma State University", 'CA', 8668);
INSERT INTO College(name, state, enrollment) VALUES("Oregon State University", 'OR', 28886);

INSERT INTO is a clause. Clauses perform specific tasks in SQL.

College(name, state, enrollment) are the parameters that the following VALUES will inserted into. In that order.

VALUES is a clause. Clauses perform specific tasks in SQL.

("Sonoma State University", 'CA', 8668); is a parameter identifying the values being inserted.

SELECT Statements

SELECT name FROM College;

This SELECT statement returns all data in the name column/attribute of the College table.

SELECT statements are used to fetch data from a database.

1. SELECT is a clause that indicates that the statement is a query. You will use SELECT every time you query data from a database.

2. name specifies the column/attribute to query data from.

3. FROM College specifies the name of the table to query data from.

Multiple columns can be queried at once by separating column names with a comma.

You can also query data from all columns/attributes in a table with SELECT:

SELECT * FROM College;

* is a special wildcard character. It allows you to select every column in a table without having to name each one individually.

SELECT statements always return a new table, called the result set.

SELECT DISTINCT Statements

SELECT DISTINCT name FROM College;

SELECT DISTINCT is used to return unique values in the result set. It filters out all duplicate values.

1. SELECT DISTINCT specifies that the statement is going to be a query that returns unique values in the specified column(s).

2. genre is the name of the column to display in the result set.

3. FROM movies indicates the table name to query from.

Filtering the results of a query is an important skill in SQL. It is easier to see the different possible genres a movie can have after the data has been filtered, than to scan every row in the table.

WHERE Clause

WHERE is a clause that indicates you want to filter the result set to include only rows where the following condition is true.

Operators create a condition that can be evaluated as either true or false. Common operators used with the WHERE clause are:

= equals
!= not equals
> greater than
< less than
>= greater than or equal to
<= less than or equal to
AND combine multiple conditions in a WHERE clause .
OR evaluates each condition separately and if any of the conditions are true then the row is added to the result set.
LIKE is a special operator that can be used in a WHERE clause.
_ means you can substitute any individual character here without breaking the pattern.
BETWEEN filter the result set within a certain range.

LIKE is a special operator used with the WHERE clause to search for a specific pattern in a column

The BETWEEN operator is used to filter the result set within a certain range. The values can be numbers, text or dates.

% is another wildcard character that can be used with LIKE. % is a wildcard character that matches zero or more missing letters in the pattern. You can use % both before and after a pattern.

A% matches all movies with names that begin with "A".
%a matches all movies that end with "a".

ORDER BY Clause

You can sort the results of your query using ORDER BY. Sorting the results often makes the data more useful and easier to analyze.

ORDER BY is a clause that indicates you want to sort the result set by a particular column either alphabetically or numerically.

DESC is a keyword in SQL that is used with ORDER BY to sort the results in descending order (high to low or Z-A).

ASC is a keyword in SQL that is used with ORDER BY to sort the results in ascending order (low to high or A-Z).

LIMIT Clause

Sometimes even filtered results can return thousands of rows in large databases. In these situations it becomes important to cap the number of rows in a result set.

LIMIT is a clause that lets you specify the maximum number of rows the result set will have.

UPDATE Statements

UPDATE College SET enrollment = 8700 WHERE name = "Sonoma State University";

The UPDATE statements edits a row in the table. You can use the UPDATE statements when you want to change existing records.

1. UPDATE is a clause that edits a row in the table.

2. name is the name of the table.

3. SET is a clause that indicates the column to edit. age is the name of the column that is going to be updated. 22 is the new value that is going to be inserted into the age column.

4. WHERE is a clause that indicates which row(s) to update with the new column value. Here the row with a 1 in the id column is the row that will have the age updated to 22.

ALTER TABLE Statements

ALTER TABLE College ADD COLUMN twitter;

1. ALTER TABLE is a clause that lets you make the specified changes.

2. College is the name of the table that is being changed.

3. ADD COLUMN is a clause that lets you add a new column to a table.

twitter is the name of the new column being added, TEXT is the data type for the new column.

4. Here, the rows that existed before the column was added have NULL values for twitter.

DELETE Statements

DELETE FROM College WHERE twitter IS NULL;

The DELETE FROM statement deletes one or more rows from a table. You can use the statement when you want to delete existing records.

DELETE FROM is a clause that lets you delete rows from a table.

College is the name of the table we want to delete rows from.

WHERE is a clause that lets you select which rows you want to delete. Here we want to delete all of the rows where the twitter_handle column IS NULL.

IS NULL is a condition in SQL that returns true when the value is NULL and false otherwise.

Aggregate Functions

Aggregate Functions compute a single result from a set of input values. For instance, when we need the sum or average of a particular column, we can use aggregate functions to quickly compute it for us.

The fastest way to calculate the number of rows in a table is to use the COUNT() function.

COUNT() is a function that takes the name of a column as an argument and counts the number of rows where the column is not NULL.

Aggregate Functions are more useful when they organize data into groups.

GROUP BY is a clause in SQL that is only used with aggregate functions. It is used in collaboration with the SELECT statement to arrange identical data into groups.

It is usually helpful to SELECT the column you pass as an argument to GROUP BY.

SUM() is a function that takes the name of a column as an argument and returns the sum of all the values in that column.

MAX() is a function that takes the name of a column as an argument and returns the largest value in that column.

MIN() is a function that takes the name of a column as an argument and returns the smallest value in that column.

AVG() function works by taking a column name as an argument and returns the average value for that column.

ROUND() is a function that takes a column name and an integer as an argument. It rounds the values in the column to the number of decimal places specified by the integer.

Multiple Tables

Most of the time, data is distributed across multiple tables in the database.

Imagine a database with two tables, artists and albums. An artist can produce many different albums, and an album is produced by an artist.

The data in these tables are related to each other. Through SQL, we can write queries that combine data from multiple tables that are related to one another. This is one of the most powerful features of relational databases.

A PRIMARY KEY serves as a unique identifier for each row or record in a given table. The PRIMARY KEY is literally an id value for a record. We're going to use this value to connect artists to the albums they have produced.

By specifying that the id column is the PRIMARY KEY, SQL makes sure that:

None of the values in this column are NULL.
Value in this column is unique.

A table can not have more than one PRIMARY KEY column.

A FOREIGN KEY is a column that contains the PRIMARY KEY of another table in the database. We use FOREIGN KEY and PRIMARY KEY to connect rows in two different tables. One table's FOREIGN KEY holds the value of another table's PRIMARY KEY. Unlike PRIMARY KEY, FOREIGN KEY do not need to be unique and can be NULL.

One way to query multiple tables is to write a SELECT statement with multiple table names separated by a comma. This is also known as a cross join.

When querying more than one table, column names need to be specified by table_name.column_name.

Joins are used to combine rows from two or more tables. The most common type of join in SQL is an inner join.

INNER JOIN will combine rows from different tables if the join condition is true.

LEFT OUTER JOIN will return every row in the left table, and if the join condition is not met, NULL values are used to fill in the columns from the right table.

AS is a keyword in SQL that allows you to rename a column or table in the result set using an alias.

Table Transformation

While working with databases, we often need to transform data from one format to achieve a desired result. In SQL, this is often called data transformation or table transformation.

Subqueries, sometimes referred to as inner queries or nested queries, are used to transform table data by nesting one query within another query. Subqueries are used to complete an SQL transformation by nesting one query within another query.

A non-correlated subquery is a subquery that can be run independently of the outer query and be used to complete a multi-step transformation.

A correlated subquery is a subquery that cannot be run independently of the outer query. The order of operations in a correlated subquery is as follows:

1. A row is processed in the outer query.

2. Then, for that particular row in the outer query, the subquery is executed.

Unions allow us to utilize information from multiple tables in our queries. In this lesson, we’ll utilize data from an e-commerce store. Let’s explore the available data we’ll be using.

Sometimes, in order to answer certain questions based on data, we need to merge two tables together and then query the merged result. Perhaps we have two tables that contain information about products in an ecommerce store that we would like to combine.

There are two ways of doing this:

- Merge the rows, called a join.

- Merge the columns, called a union.

We'll focus on unions here. Union combines the result of two or more SELECT statements, using the following syntax:

SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;

Each SELECT statement within the UNION must have the same number of columns with similar data types. The columns in each SELECT statement must be in the same order. By default, the UNION operator selects only distinct values.

What if we wanted to allow duplicate values? We can do this by using the ALL keyword with UNION.

Alternative Database Management

Other models are the Hierarchical Model and Network Model. Some systems using these older architectures are still in use today in data centers with high data volume needs, or where existing systems are so complex and abstract, it would be cost-prohibitive to migrate to systems employing the relational model. Also of note are newer Object-Oriented Database.

Similar to the Parent -> Child Model and Organization Chart:

Extensible Markup Language (XML)

XML is a standard for data representation and exchange.

Its format is similar to HTML, where the elements contain data and the tag is the attribute (column) name.

XML is a method for putting structured data in a text file. XML is license-free, platform-independent and well-supported

<?XML VERSION="1.0" STANDALONE="no">
<BOOKSTORE>
<BOOK ISBN="0-13-861337-0" Price="$50">
<TITLE>A First Course in Database Systems</TITLE>
<AUTHORS>
<AUTHOR>
<FIRSTNAME>Jeffrey</FIRSTNAME>
<LASTNAME>Ullman</LASTNAME>
</AUTHOR>
<AUTHOR>
<FIRSTNAME>Jennifer</FIRSTNAME>
<LASTNAME>Widom</LASTNAME>
</AUTHOR>
</AUTHORS>
</BOOK>
<BOOK ISBN="0-13-040264-8" Price="$60">
<TITLE>Database System Implementation</TITLE>
<AUTHORS>
<AUTHOR>
<FIRSTNAME>Hector</FIRSTNAME>
<LASTNAME>Garcia-Molina</LASTNAME>
</AUTHOR>
<AUTHOR>
<FIRSTNAME>Jeffrey</FIRSTNAME>
<LASTNAME>Ullman</LASTNAME>
</AUTHOR>
<AUTHOR>
<FIRSTNAME>Jennifer</FIRSTNAME>
<LASTNAME>Widom</LASTNAME>
</AUTHOR>
</AUTHORS>
<REMARK>
Buy this book bundled with "A First Course", it's a great deal!
</REMARK>
</BOOK>
</BOOKSTORE>

"Well-Formed" XML

"Well-Formed" XML has been returned by the XML Parser. "Well-Formed XML" is any XML document that follows the basic rules.

The XML Parser ensures incoming XML meets structural requirements:

Single root element.
Matched tags, proper nesting. All tags must close. <Tag>...</Tag>
Unique attributes within elements.

"Well-formed" XML means the document is structed.

"Valid" XML

"Valid" XML is "Well-Formed" XML that also adheres to specific criteria.

"Valid" XML means the document is structed AND the attributes, etc, meet the schema. STRUCTURE + SCHEMA

Document Type Descriptor (DTD)

It is possible to define the schema for XML data (XSD), called a Document Type Descriptor (DTD).

A Document Type Descriptor is a grammar that describes the legal nesting of tags and attributes. It is specified at the top of the document, or in a separate file referenced at the top of the document.

<!DOCTYPE Bookstore [
<!ELEMENT BOOKSTORE (BOOK*, AUTHOR*)>
<!ELEMENT BOOK (TITLE, REMARK?)>
<!ATTLIST BOOK ISBN ID Price CDATA Authors IDREFS>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT REMARK (#PCDATA | BOOKREF)+)>
<!ELEMENT BOOKREF>
<!ATTLIST BOOKREF book IDREF>
<!ELEMENT AUTHOR (FIRSTNAME, LASTNAME)>
<!ATTLIST AUTHOR Ident ID>
<!ELEMENT FIRSTNAME (#PCDATA)>
<!ELEMENT LASTNAME (#PCDATA)>
]>

DTD/XSD vs. None

	Relational	XML
Structure	Tables.	Hierarchical.
Schema	Fixed in Advance.	Flexable, "Self-describing".
Queries	Simple and easy languages.	Not so much.
Ordering	None.	Implied.
Implementation	Native.	Add-on.

JavaScript Object Notation (JSON)

JSON is a format for storing and exchanging data.

When exchanging data between a browser and a server, the data can only be text.
It is easy for humans to read and write.
We can also convert any JSON received from the server into JavaScript objects.
It is easy for machines to parse and generate.
JSON is text, and we can convert any JavaScript object into JSON, and send it to the server.
It is based on a subset of the JavaScript Programming Language. Standard ECMA-262 3rd Edition - Dec '99.

If you have data stored in a JavaScript object, you can convert the object into JSON, and send it to a server.

JSON makes it possible to store JavaScript objects as text.

When storing data, the data has to be a certain format. Regardless of where you store it, text is always one of the legal formats.

JSON is language independent!!!

JSON only uses the JavaScript syntax. That is it. JSON format is text only, and a sepetate concept from Javascript. Text can be read and used as a data format by any programming language.

Example - Storing JSON data in local storage.

//Storing data:
var object = { "name": "John", "age": 31, "city": "New York" };
object = JSON.stringify(object);
localStorage.setItem("testJSON", object);

//Retrieving data:
var text = localStorage.getItem("testJSON");
text = JSON.parse(text);
document.getElementById("demo").innerHTML = text.name;