Database is a set of named relations (or tables).
Database Management System (DBMS ) provides efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data.
DBMS may be programmed with frameworks. Ruby on Rails, Django etc. Often ran with middleware, file server applications, etc.
The amount of data needing to be stored everyday is growing faster then our computer RAM can store.
In information theory, data can be created, data can be read, data can be updated, and data can be deleted.
Atomicity requires that each transaction be "all or nothing": if one part of the transaction fails, then the entire transaction fails, and the database state is left unchanged.
An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes. To the outside world, a committed transaction appears (by its effects on the database) to be indivisible ("atomic"), and an aborted transaction does not happen.
Consistency property ensures that any transaction will bring the database from one valid state to another.
Isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed sequentially.
Providing isolation is the main goal of concurrency control. Depending on the concurrency control method, the effects of an incomplete transaction might not even be visible to another transaction.
Durability property ensures that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors.
In a relational database, for instance, once a group of SQL statements execute, the results need to be stored permanently (even if the database crashes immediately thereafter). To defend against power loss, transactions (or their effects) must be recorded in a non-volatile memory.
Pretty much what kind of data a piece of information. Called the type name.
It is important to understand that relational theory does not dictate what types are to be supported.
Nowadays provisions are expected to be available for user-defined data-types in addition to the built-in ones provided by the system.
Relational Model (RM) for database management is an approach to managing data using a structure (Data Model) and language.
|Column 0||Column 1||Column 2|
A table is an accepted visual representation of a relation.
|Attribute 0||Attribute 1||Attribute 2|
Each relation (table) has a set of named attributes (columns).
An attribute name might be "name" or "age".
Each attribute (column) has a type (or domain). Like an image, floating point number, integer, or a string.
|Attribute 0||Attribute 1||Attribute 2|
Each tuple (or rows) has a value for each attribute (column).
A tuple is basically the same thing as a row, except in an SQL DBMS, where the column values in a row are ordered.
(Tuples are NOT ordered. Instead, each attribute value is identified solely by the attribute name and NEVER by its ordinal position within the tuple).
Each relation/tuple can have a unqiue Key, an attribute (column) whose value is unique in each tuple (row). Or set of attributes whose combined values are unique.
When querying a database containing peoples name and age to return tuples (rows) with age (attribute):
The tuples (rows) with NULL age (attribute) will NOT be returned.
A relational database management system (RDBMS) is a program that lets you create, update, and administer a relational database.
Most relational database management systems use SQL to access the database.
There are more similarities than differences between the different RDBMS, but the SQL syntax may be slightly different depending on which RDBMS you are using.
CREATE TABLE College(name string, state char(2), enrollment integer);
A statement is text that the database recognizes as a valid command. Statements always end in a semi-colon.
Let's break down the components of a statement:
Create Table is a clause.
Clauses perform specific tasks in SQL.
By convention, clauses are written in capital letters. Clauses can also be referred to as commands.
College refers to the name of the table that the command is applied to.
(name string, state char(2), enrollment integer); is a parameter.
A parameter is a list of columns, data-type's, or value's that are passed to a clause as an argument.
Here, the parameter is a list of column names and the associated data-type.
The structure of SQL statements vary.
The number of lines used do not matter.
INSERT INTO College(name, state, enrollment) VALUES("Sonoma State University", 'CA', 8668);
INSERT INTO College(name, state, enrollment) VALUES("Oregon State University", 'OR', 28886);
INSERT INTO is a clause. Clauses perform specific tasks in SQL.
College(name, state, enrollment) are the parameters that the following VALUES will inserted into. In that order.
VALUES is a clause. Clauses perform specific tasks in SQL.
("Sonoma State University", 'CA', 8668); is a parameter identifying the values being inserted.
SELECT name FROM College;
This SELECT statement returns all data in the name column/attribute of the College table.
SELECT statements are used to fetch data from a database.
1. SELECT is a clause that indicates that the statement is a query. You will use SELECT every time you query data from a database.
2. name specifies the column/attribute to query data from.
3. FROM College specifies the name of the table to query data from.
Multiple columns can be queried at once by separating column names with a comma.
You can also query data from all columns/attributes in a table with SELECT:
SELECT * FROM College;
* is a special wildcard character. It allows you to select every column in a table without having to name each one individually.
SELECT statements always return a new table, called the result set.
SELECT DISTINCT name FROM College;
SELECT DISTINCT is used to return unique values in the result set. It filters out all duplicate values.
1. SELECT DISTINCT specifies that the statement is going to be a query that returns unique values in the specified column(s).
2. genre is the name of the column to display in the result set.
3. FROM movies indicates the table name to query from.
Filtering the results of a query is an important skill in SQL. It is easier to see the different possible genres a movie can have after the data has been filtered, than to scan every row in the table.
WHERE is a clause that indicates you want to filter the result set to include only rows where the following condition is true.
Operators create a condition that can be evaluated as either true or false. Common operators used with the WHERE clause are:
LIKE is a special operator used with the WHERE clause to search for a specific pattern in a column
The BETWEEN operator is used to filter the result set within a certain range. The values can be numbers, text or dates.
% is another wildcard character that can be used with LIKE. % is a wildcard character that matches zero or more missing letters in the pattern. You can use % both before and after a pattern.
You can sort the results of your query using ORDER BY. Sorting the results often makes the data more useful and easier to analyze.
ORDER BY is a clause that indicates you want to sort the result set by a particular column either alphabetically or numerically.
DESC is a keyword in SQL that is used with ORDER BY to sort the results in descending order (high to low or Z-A).
ASC is a keyword in SQL that is used with ORDER BY to sort the results in ascending order (low to high or A-Z).
Sometimes even filtered results can return thousands of rows in large databases. In these situations it becomes important to cap the number of rows in a result set.
LIMIT is a clause that lets you specify the maximum number of rows the result set will have.
UPDATE College SET enrollment = 8700 WHERE name = "Sonoma State University";
The UPDATE statements edits a row in the table. You can use the UPDATE statements when you want to change existing records.
1. UPDATE is a clause that edits a row in the table.
2. name is the name of the table.
3. SET is a clause that indicates the column to edit. age is the name of the column that is going to be updated. 22 is the new value that is going to be inserted into the age column.
4. WHERE is a clause that indicates which row(s) to update with the new column value. Here the row with a 1 in the id column is the row that will have the age updated to 22.
ALTER TABLE College ADD COLUMN twitter;
1. ALTER TABLE is a clause that lets you make the specified changes.
2. College is the name of the table that is being changed.
3. ADD COLUMN is a clause that lets you add a new column to a table.
twitter is the name of the new column being added, TEXT is the data type for the new column.
4. Here, the rows that existed before the column was added have NULL values for twitter.
DELETE FROM College WHERE twitter IS NULL;
The DELETE FROM statement deletes one or more rows from a table. You can use the statement when you want to delete existing records.
DELETE FROM is a clause that lets you delete rows from a table.
College is the name of the table we want to delete rows from.
WHERE is a clause that lets you select which rows you want to delete. Here we want to delete all of the rows where the twitter_handle column IS NULL.
IS NULL is a condition in SQL that returns true when the value is NULL and false otherwise.
Aggregate Functions compute a single result from a set of input values. For instance, when we need the sum or average of a particular column, we can use aggregate functions to quickly compute it for us.
The fastest way to calculate the number of rows in a table is to use the COUNT() function.
COUNT() is a function that takes the name of a column as an argument and counts the number of rows where the column is not NULL.
Aggregate Functions are more useful when they organize data into groups.
GROUP BY is a clause in SQL that is only used with aggregate functions. It is used in collaboration with the SELECT statement to arrange identical data into groups.
It is usually helpful to SELECT the column you pass as an argument to GROUP BY.
SUM() is a function that takes the name of a column as an argument and returns the sum of all the values in that column.
MAX() is a function that takes the name of a column as an argument and returns the largest value in that column.
MIN() is a function that takes the name of a column as an argument and returns the smallest value in that column.
AVG() function works by taking a column name as an argument and returns the average value for that column.
ROUND() is a function that takes a column name and an integer as an argument. It rounds the values in the column to the number of decimal places specified by the integer.
Most of the time, data is distributed across multiple tables in the database.
Imagine a database with two tables, artists and albums. An artist can produce many different albums, and an album is produced by an artist.
The data in these tables are related to each other. Through SQL, we can write queries that combine data from multiple tables that are related to one another. This is one of the most powerful features of relational databases.
A PRIMARY KEY serves as a unique identifier for each row or record in a given table. The PRIMARY KEY is literally an id value for a record. We're going to use this value to connect artists to the albums they have produced.
By specifying that the id column is the PRIMARY KEY, SQL makes sure that:
A table can not have more than one PRIMARY KEY column.
A FOREIGN KEY is a column that contains the PRIMARY KEY of another table in the database. We use FOREIGN KEY and PRIMARY KEY to connect rows in two different tables. One table's FOREIGN KEY holds the value of another table's PRIMARY KEY. Unlike PRIMARY KEY, FOREIGN KEY do not need to be unique and can be NULL.
One way to query multiple tables is to write a SELECT statement with multiple table names separated by a comma. This is also known as a cross join.
When querying more than one table, column names need to be specified by table_name.column_name.
Joins are used to combine rows from two or more tables. The most common type of join in SQL is an inner join.
INNER JOIN will combine rows from different tables if the join condition is true.
LEFT OUTER JOIN will return every row in the left table, and if the join condition is not met, NULL values are used to fill in the columns from the right table.
AS is a keyword in SQL that allows you to rename a column or table in the result set using an alias.
While working with databases, we often need to transform data from one format to achieve a desired result. In SQL, this is often called data transformation or table transformation.
Subqueries, sometimes referred to as inner queries or nested queries, are used to transform table data by nesting one query within another query. Subqueries are used to complete an SQL transformation by nesting one query within another query.
A non-correlated subquery is a subquery that can be run independently of the outer query and be used to complete a multi-step transformation.
A correlated subquery is a subquery that cannot be run independently of the outer query. The order of operations in a correlated subquery is as follows:
1. A row is processed in the outer query.
2. Then, for that particular row in the outer query, the subquery is executed.
Unions allow us to utilize information from multiple tables in our queries. In this lesson, we’ll utilize data from an e-commerce store. Let’s explore the available data we’ll be using.
Sometimes, in order to answer certain questions based on data, we need to merge two tables together and then query the merged result. Perhaps we have two tables that contain information about products in an ecommerce store that we would like to combine.
There are two ways of doing this:
- Merge the rows, called a join.
- Merge the columns, called a union.
We'll focus on unions here. Union combines the result of two or more SELECT statements, using the following syntax:
Each SELECT statement within the UNION must have the same number of columns with similar data types. The columns in each SELECT statement must be in the same order. By default, the UNION operator selects only distinct values.
What if we wanted to allow duplicate values? We can do this by using the ALL keyword with UNION.
Other models are the Hierarchical Model and Network Model. Some systems using these older architectures are still in use today in data centers with high data volume needs, or where existing systems are so complex and abstract, it would be cost-prohibitive to migrate to systems employing the relational model. Also of note are newer Object-Oriented Database.
Similar to the Parent -> Child Model and Organization Chart:
XML is a standard for data representation and exchange.
Its format is similar to HTML, where the elements contain data and the tag is the attribute (column) name.
XML is a method for putting structured data in a text file. XML is license-free, platform-independent and well-supported
"Well-Formed" XML has been returned by the XML Parser. "Well-Formed XML" is any XML document that follows the basic rules.
The XML Parser ensures incoming XML meets structural requirements:
"Well-formed" XML means the document is structed.
"Valid" XML is "Well-Formed" XML that also adheres to specific criteria.
"Valid" XML means the document is structed AND the attributes, etc, meet the schema. STRUCTURE + SCHEMA
It is possible to define the schema for XML data (XSD), called a Document Type Descriptor (DTD).
A Document Type Descriptor is a grammar that describes the legal nesting of tags and attributes. It is specified at the top of the document, or in a separate file referenced at the top of the document.
|Schema||Fixed in Advance.||Flexable, "Self-describing".|
|Queries||Simple and easy languages.||Not so much.|
JSON is a format for storing and exchanging data.
When storing data, the data has to be a certain format. Regardless of where you store it, text is always one of the legal formats.
JSON is language independent!!!