Full lesson: https://librarycarpentry.org/lc-sql/
Slides: https://www.angelazoss.com/intro-to-sql/sql
Structured Query Language, or SQL (sometimes pronounced “sequel”)
command that lets you look up data in a database or make calculations based on columns
51 different journals published during 2015
SELECT column_name_1 FROM table name;
SELECT column_name_1, column_name_2 FROM table name;
SELECT * FROM table name;
SELECT DISTINCT column_name_1 FROM table name;
SELECT DISTINCT column_name_1, column_name_2 FROM table name;
SELECT * FROM table name ORDER BY column_name_1;
SELECT * FROM table name ORDER BY column_name_1 ASC;
SELECT * FROM table name ORDER BY column_name_1 DESC;
SELECT * FROM table name ORDER BY column_name_1 DESC, column_name_2 ASC;
Write a query that returns:
Title, First_Author, ISSNs and Citation_Count
Title
First_Author
ISSNs
Citation_Count
from the articles table, ordered by the top cited article and alphabetically by title.
SELECT * FROM table name WHERE column_name_1 = ‘some value’;
SELECT * FROM table name WHERE (column_name_1 = ‘some value’) AND (column_name_2 = ‘some other value’);
SELECT * FROM table name WHERE (column_name_1 = ‘some value’) OR (column_name_2 = ‘some other value’);
Can also use NOT to reverse either condition.
NOT
SELECT * FROM table name WHERE column_name_1 LIKE ‘%part of value%’;
Additional comparisons: IN, BETWEEN...AND, IS NULL
IN
BETWEEN...AND
IS NULL
Title, First_Author, Subjects, ISSNs, Month and Year
Subjects
Month
Year
for all papers where Subjects contains “computer” and that have more than 8 citations.
MAX
MIN
AVG
COUNT
SUM
SELECT column_name_1, AVG(column_name_2) FROM table name GROUP BY column_name_1;
Write a query using an aggregate function that returns the number of article titles per ISSNs, sorted by title count in descending order.
Which ISSN has the most titles?
(Hint: choose which aggregate function to use. It is one of the common aggregate functions: MAX, MIN, AVG, COUNT, SUM.)
SELECT column_name_1, AVG(column_name_2) FROM table name GROUP BY column_name_1 HAVING AVG(column_name_2) >= 10;
Write a query that returns, from the articles table, the average Citation_Count for each journal ISSN but only for the journals with 5 or more citations on average.
articles
After SELECT, in addition to listing column names, we can write calculations using arithmetic operators:
SELECT
+
-
*
/
%
SQRT
SELECT * FROM table_name_1 JOIN table_name_2 ON table_name_1.column_name = table_name_2.column_name;
SELECT * FROM table_name_1 JOIN table_name_2 USING (column_name);
SQL Join Types Explained Visually
SELECT table_name_1.column_name_1, table_name_2.column_name_2, table_name_2.column_name_3 FROM table_name_1 JOIN table_name_2 ON table_name_1.column_name_1 = table_name_2.column_name_1;
Write a query that JOINS the articles and journals tables and that returns the Journal_Title, total number of articles published and average number of citations for every journal ISSN.
JOINS
journals
Journal_Title
Write a query that returns the Journal_Title, Publisher name, and number of articles published, ordered by number of articles in descending order.
Publisher
Spreadsheets, sometimes referred to as tabular data or flat files, are an easy way to display data organized in columns and rows.
Spreadsheets can make data gathering easier but they can also lead to messy data.
In the figure below, can you identify where inconsistencies in the data have been introduced?
ID
When a primary key from one table connects to a field in a second table, the field in the second table is called a “foreign key.”
Are there other tables and relationships you can create to further normalize the data and avoid inconsistencies?
For this exercise, you can either use pencil/pen and paper to draw new tables and relationships or modify the previous ERD using dbdiagram.io.
CREATE TABLE table_name(col_name_1 data_type_1, col_name_2 data_type_2…);
DROP TABLE table_name;
INSERT INTO “table_name” VALUES (value_1,value_2,…);
UPDATE table_name SET column_name = value WHERE id = record_id;
DELETE FROM table_name WHERE column_name = value;
Caution: deleting records can break the links between tables. You should also delete any records that point to the deleted records.
Write an SQL statement to add the journal “New Journal of Physics” (ISSNs & ISSNs: 1367-2630; publisher: “Institute of Physics (IOP)”) to the table journals. You need to add the publisher “IOP” to the table publishers as well.
publishers
SQLite has several administrative commands that aren’t part of the SQL standard. One of them is .dump, which prints the SQL commands needed to re-create the database. Another is .read, which reads a file created by .dump and restores the database.
.dump
.read
A colleague of yours thinks that storing dump files (which are text) in version control is a good way to track and manage changes to the database. What are the pros and cons of this approach? (Hint: records aren’t stored in any particular order.)