Massachusetts Institute of Technology
Department of Urban Studies and Planning


11.521: Spatial Database Management and Advanced Geographic Information Systems

11.523: Fundamentals of Spatial Database Management

Lab 1: Introduction to SQL, GIS, and Lab Setup

  6 February 2018


Due: Tuesday, February 13, 2018

Motivation:

Urban sensing, pervasive computing, and the widespread availability of georeferenced data and geoprocessing tools have greatly expanded the capacity of urban planners to observe, analyze, visualize, and model the spatial deployment of people, infrastructure, and social and economic activity and interactions. While the availability of relevant data has expanded exponentially, good analysis is still hard and appropriate data integration and interpretation requires considerable technical skill, contextual understanding, and artful choices. For any particular urban planning problem, relevant data are often voluminous (but incomplete), dispersed across multiple agencies and sources, and rarely cross-referencable through simple one-to-one relationships. This subject (the full-semester 11.521, and half-semester 11.523) provides hands-on experience with such urban planning and spatial analysis settings using basic GIS and relational database management tools in a distributed computing environment.

Lab Objectives:

This lab exercise has two primary purposes: (1) to acquaint you with the computing environment and basic GIS and relational database tools that we will be using this semester, and (2) to introduce you to relational joins and database queries using the structured query language (SQL).  For many of the 11.523 lab exercises, we use parcel data, sales data, and building footprints for East Boston to estimate and visualize the spatial variation in land value and some of the amenity and accessibility characteristics that influence land value. Today, we will get started by using ArcScene to visualize the built form of East Boston by extruding building footprints to the building heights that were estimated using LIDAR data. Developing useful estimates of land value involves more analysis and computation since zoning, building type, point-of-interest locations, and other factors can complicate the estimation and visualization. We will get to this in later labs.

If someone else has already assembled, cleaned, and cross-referenced all the data and tools that you need, then 'urban analytics' does not look too hard. But urban planners tend to ask questions that have not already been answered, and much of the task involves finding, filtering, manipulating, and merging data acquired from several different sources. For sizeable datasets, these steps require a deeper understanding of data management tools, location tagging, and data structure. To get started with some of these skills, we will focus on a 'relational' data model of linked tables and the use of 'structured query language' (SQL) to query relational databases. For these SQL queries, we will use a set of seven, small tables that are similar in structure and content to the types of information about property ownership and land use that are maintained by most local assessing and planning offices.  We will use MS-Access, Excel, and, eventually, PostgresSQL to examine this small 'toy' database of illustrative parcel, zoning, and tax tables.

For those who would like to brush up on your familiarity with ArcGIS, we suggest the first few lab exercises from past versions of 11.188 and/or 11.205 or 11.520 (the prerequisites for this course) and/or some of the 'virtual campus' tutorials that ESRI provides. The first two 11.188 lab exercises from last Spring are here:
Exercise #1: http://mit.edu/11.188/www17/labs/lab1
Exercise #2: http://mit.edu/11.188/www17/labs/lab2
The exercises from 11.205 and 11.520 are available on Stellar: https://stellar.mit.edu/S/course/11/fa15/11.205/index.html
You can obtain passwords for the virtual campus exercises from the MIT library staff in the GIS lab location in Rotch Library. For more information of GIS assistance from the MIT Libraries see: http://libraries.mit.edu/gis

Additional Background Material:

Additional in-class lab notes will be available here: http://mit.edu/11.521/www/labs/lab1/11.521_day1_notes18.html

For an example of a DUSP dissertation that analyzed voluminous, spatially detailed datasets using many of the spatial data management and advanced GIS methods learned in this class, see Mi Diao's PhD dissertation, "Sustainable Metropolitan Growth Strategies: Exploring the Role of the Built Environment" (Sept. 2010).


Part I: Athena Space & ArcGIS Example

Class Lockers and WinAthena lab machines: We are going to use our class and personal network lockers within the 'Athena' space on the andrew network file system (AFS). All data we will use in the first half of the class (that is, pre-project) will be stored in /afs/athena.mit.edu/course/11/11.521/data. From WinAthena PCs on campus, this class data locker can be accessed via drive Z: [Navigate the directory tree to get to: Z:\athena.mit.edu\course\11\11.521\data]. For your convenience, you can use Map-network-drive option in Windows Explorer to mount the class locker as drive K:. (Right-click 'This-PC' in Windows Explorer and choose 'map-network-drive'.)

We suggest that you save your own work on a USB drive or in your personal Athena locker (which is automatically attached as drive I: when you log into the lab machines). The class exercises can also be done on your own laptop or desktop computer (both on and off campus) if your machine is properly configured (with ArcGIS, PostgreSQL, AFS, Kerberos, and - if off campus - a VPN connection). All this software is free for MIT students but the installation can be tricky. We will provide instructions and limited support for such personal computer connections.

ARCGIS REVIEW

Running ArcGIS on CRON or WinAthena machines around campus can be painfully slow unless you take steps to insure that default files and datasets are *not* located on your network locker space. In general, we recommend that you copy all data, saved documents, and workspaces to a local hard drive before beginning lab exercises. (There will be exceptions later for large datasets stored in database engines on the net.) Rob Goodspeed (DUSP, PhD student, 2014) wrote a useful set of notes (for 11.205/11.520 a few years ago) that explains how to setup your workstation to improve the ArcGIS performance. This document is available via Stellar on the class website [https://stellar.mit.edu/S/course/11/fa12/11.205/courseMaterial/topics/topic15/resource/11.520_Setup_Guide/11.520_Setup_Guide.pdf]

and is also reproduced, for your convenience, in our class locker: 11.520_Setup_Guide.pdf

 

As a warmup for next week's ArcGIS exercises (in Lab #2), we will examine East Boston parcel data and building footprints that has been saved in an 'eboston05' folder within the class data locker.

Copy FROM this folder in the class locker: Z:\athena.mit.edu/course/11/11.521/data/eboston05

Copy TO this local drive: C:\temp\

 

Within this local copy of the 'eboston05' folder is an ArcMap document, eboston05_lab2start.mxd. Double click on this document to open ArcMap and load the East Boston parcels and building footprints. Your screen should look something like this:



East Boston startup map with web services

 

The ebos_parcels05 shapefile shows the boundaries of East Boston parcels as provided by the Boston Assessing Department for 2005. The ebos_buildings02 shapefile contains the building footprints for East Boston parcels as of 2002. The ArcMap table of contents window also shows several MassGIS and ESRI layers that come from 'web mapping services' rather than locally stored datasets. These 'layers' are images, appropriately scaled and registered to the coordinate system of the viewing window, that provide visually useful background (shade relief) and foreground (symbolized roads) that help us interpret our East Boston data.

 

Neither of the East Boston shapefiles contains many attributes - mostly IDs that can be linked to other data tables. We will link the parcel shapefile to assessing data in next week's lab. But the building footprint shapefile does have estimates of the ground and roof heights of each building. These estimates were developed from LIDAR data that was funded by Homeland Security around 2002 and supported by local state agencies. MassGIS, the state GIS office, provides downloads of the building footprint layer with the height attributes from their public website:
http://www.mass.gov/anf/research-and-tech/it-serv-and-support/application-serv/office-of-geographic-information-massgis/

 

This map document, eboston05_lab2start.mxd, has already altered the default ArcMap settings in accordance with Rob Goodspeed's setup notes so that your default workspace and geodatabase point to folders in C:\temp.  However, ArcGIS may have reverted to other defaults if it cannot find the named folder and file geodatabase in C:\temp. You may want to use Rob's notes to adjust the ArcMap workspace and file geodatabase so they point to places in C:\temp (or a local USB drive).  If you do not, ArcMap operations will be slower than they need to be since ArcMap will continually be reading and writing temporary files into a network drive rather than a local drive.

 

Now let's use ArcScene to visualize the East Boston buildings in 3D. Use Start/ArcGIS/ArcScene-10.4.1 to open the ArcScene window. Use the 'Add Data' icon (or File/add-data menu option) to add the ebos_parcels05 and ebos_buildings02 shapefiles into the table of contents of ArcScene. (You can also Drag and Drop the ebos_buildings02 and ebos_parcels05 shapefiles from the 'Catalog' window - if it is open - into the table of contents in the ArcScene window.) Make sure that the ebos_buildings02 layer is 'on top of' the parcel layer. Now right-click the ebos_buildings02 layer and choose 'properties.' We can click the 'Extrusion' tab and tell ArcScene to use the roof heights to extrude the building footprints up to the roof height. Since the ground height varies across East Boston, we should specify the extrusion to be [ROOF] - [GND], that is the difference between the roof and ground heights. Click OK after setting this extrusion expression and play with the visualization window to get used to the interface for 3D interaction. Often, users will want to exaggerate the height to facilitate visualization. You may want to reset the extrusion to be 5*[ROOF] - [GND] in order to get more height variation. The result should look something like this:

East Boston building extrusions

 

This is all that we are going to do with ArcScene and the East Boston data today. We are a long way from developing realistic building and land values from assessing data and we may also want to use a surface model that does not assume 'flatland' for East Boston. Once we have done more analysis to estimate parcel and building values we can tag our data with those estimates or generate 'hot spot' envelopes that help visualize spatial patterns. For now, just take a screen shot of your ArcScene window and submit that as part of your results for this exercise.

 

In addition, for those who have not taken 11.188, 11.205 or 11.520, we require that you complete lab 1 of 11.188 (http://mit.edu/11.188/www17/labs/lab1) and hand in the final layout you create. You may also want to review lab2 of 11.188 - See: \afs\athena.mit.edu\course\11\11.188\www13\labs\lab2\lab2.html - which is the same location as: http://mit.edu/11.188/www17/labs/lab2

PART II: Database Queries using MS-Access, Excel, and PostgreSQL

To facilitate learning basic relational database concepts, we will utilize a set of seven small tables that encode land use and ownership information about PARCELS, OWNERS, SALES, FIRES, PERMITS, TAX, and ZONING. This is a 'toy' database with only a few rows in each table but with a table format (schema) and values that are typical of parcel-related data that are maintained by municipalities and counties. An MS-Access version of this database contains the seven tables and is called 11.521_lab1.mdb. It is available in the class data locker as 11.521_lab1.mdb (using the older 2002-2006 format for MS-Access databases for increased compatibility). The location of the file in the AFS network file system is /afs/athena.mit.edu/course/11/11.521/data/11.521_lab1.mdb. You can find this database by navigating down the directory tree on Drive Z: to reach Z:\athena.mit.edu\course\11\11.521\data\11.521_lab1.mdb or down the shorter path if you have mounted the class locker as Drive K:. You can also download and then run a copy of this database by going to http://mit.edu/11.521/data/11.521_lab1.mdb

 

Here is an Entity-Relationship diagram of the seven tables (prepared in MS-Access) together with links to more detailed listings of the entries in the seven tables.

 

Relationships among the Seven Tables
The 'Toy' Parcel Database
parcels_schema
  • Each row of the PARCELS table contains the parcel identifiers, address, owner, land use, and lot size of each parcel.
  • Each row of the OWNERS table lists the name and (mailing) address of owner. The owner number (ownernum) in the OWNERS table matches the owner number (onum) in the PARCELS table.
  • Each row of the SALES, TAX, FIRES, PERMITS, and ZONING tables list transaction information relevant to particular parcels.
  • The SALES, TAX, FIRES, AND ZONING tables are linked to the PARCELS table via the parcel ID (parcelid).
  • The PERMITS table is linked to the PARCELS table by the combination of two columns (labeled pid and wpb).
  • Follow these links for a listing of the rows in each table:

    FIRES | OWNERS | PARCELS | PERMITS | SALES | TAX | ZONING

     

     

     

    For our introductory database query exercises we will use versions of this same 'toy' parcel database stored in different formats: the MS-Access database just mentioned and one stored in PostgreSQL. Since CRON machines do not have the tool that allows us to manage and develop PostgreSQL tables (pgAdmin), We will access them from our browser using phpPgAdmin via http://urbansim.mit.edu/phppgadmin. phpPgAdmin is a web-based administration tool that can connect to PostgreSQL using a standard query language (SQL). Later on, we will use a client-side application (pgAdmin) to create, manage, and access our PostgreSQL tables.

    Using MS-Access with the 'toy' parcel database

    Steps

    Option 1: (retrieve the database through the file system)

    1a. Attach the class locker as Drive K (as explained above) or navigate the AFS folders via Drive Z to find the DATA sub-directory in the class locker.

    1b. Copy the MS-Access database file, 11.521_lab1.mdb, to somewhere on your local drive that is writeable by you. (e.g., C:\TEMP). NOTE that, on WinAthena machines, the DESKTOP is NOT stored on a local drive but is redirected to your Athena locker (on Drive I:). For the small 'toy' parcel dataset, it will not matter whether you use a truly local drive or your Athena locker. However, when we work with larger datasets and ArcGIS tools, having your data and scratch space on network storage will slow down the processing and greatly decrease the reliability of the operations.

    Option 2: (download the database from a browser)

    2a. Enter this web address into a browser: http://mit.edu/11.521/data/11.521_lab1.mdb

    2b. Save the downloaded file to somewhere on your local drive that is writeable by you. (e.g., C:\TEMP). NOTE that, on WinAthena machines, the DESKTOP is NOT stored on a local drive but is redirected to your Athena locker (on Drive I:). Do not open the file directly since the browser will save its temporary copy on a network drive by default.

    B. Double-click on your local copy of 11.521_lab1.mdb to open this database file in MS-Access

    Once MS-Access opens the database, you may see a security warning: "Certain content in the database has been disabled." This warning is because the database contains saved queries that could conceivably be a malicious program. Click 'options' and choose 'enable this content.'

     

    You will see listed in the table of contents each of the 7 tables listed in the above diagram. Double-click on the tables to take a look. You will also find the entity-relationship diagram via the Tools/Relationships menu.

     

    Highlight the 'Queries' choice under 'objects' in the main table-of-contents window and notice the two saved queries: q_midsize_parcels and q_parcels_with_fires. If the queries are not listed, left-click the 'Tables' heading and choose 'all access objects." The first query 'q_midsize_parcels' uses the parcels table to find all parcels whose square footage is between 10,000 and 100,000 square feet. The second query 'q_parcels_with_fires' joins the parcels table to the fires table (using PARCELID) and lists all the parcels that have had a fire along with the fire date and estimated loss.

     

    Double-click on the first query to see the results. Then, choose View/Design to see the GUI for building and editing the query. Also, choose View/SQL to take a look at the SQL view to see the more-or-less standardized text that describes the query using Structured Query Language syntax. Do the same for the other query. SQL is the lingua franca of client-server communication with online database engines and sits under the covers of database applications including most web services and online geoprocessing.

     

    Modify the first query to show all parcels with less than 100,000 square feet (not just those bigger than 10,000 or with missing data). Modify the second query to omit the pid, wpb, add1, and add2 fields and to omit parcels whose land use is equal to 'R3'.

     

    Cut and paste the tables produced by your modified queries into an editor of your choice. Label them appropriately, add your name and them save this text document. Turn it in as part of your Lab #1 exercise.

     

    Using Excel with the 'toy' parcel database

    You can access databases saved in MS-Access databases (or in PostgreSQL and other relational database management systems, such as Oracle) from many desktop tools such as Excel 

     

    Before trying to access the data from Excel, make a second local copy of 11.521_lab1.mdb and rename it to lab1.mdb (since some versions of Excel will choke on MS-Access database names with more than 8 characters!).

     

    Open Excel and choose Data/From-Access/Select-Table and select MS-Access Database when the dialog box opens up. Navigate to your local copy of lab1.mdb and import the 'parcels' table from the MS-Access database. [Beware, on older version of Excel, that the MS-Query tools used by Excel to interface with the database are fragile. Not only will they break if the MS-Access database name is too long, but it will choke on the path name if it contains special characters such as the period '.' in 11.521!] When you try to open lab1.mdb, you may see a warning message, "Microsoft Office has identified a potential security concern." This is because a malicious program might conceivably be disguised as a MS-Access database. You will need to click 'enable' to get past this message and then select the MS-Access tables or queries that you wish to import into MS-Excel. Select the 'parcels' table.

     

    Use the business graphics tools of Excel to plot a bar graph showing the square feet of each parcel in the parcels table. Label the graph appropriately and include a jpeg, png copy or a pdf file of the graph in what you turn in for lab1. (A PDF file of your exercise results is also okay.)

     

    Excel (and MS-Access and ArcGIS) can also access data directly from our PostgreSQL server (and most other relational database servers such as SQL Server, Oracle Servers, MySql, etc.). We will use this capability in later exercises.

     

    Using phpPgAdmin to run queries on PostgreSQL tables

    We do not need to access the 'toy' dataset from PostgreSQL as part of today's lab exercise. However, the same seven tables are stored in a PostgreSQL database engine online and we will use this database engine (with much larger tables) later in the semester. Here are the steps needed to connect to our PostgreSQL database engine using phpPgAdmin from any browser on WinAthena computers in our lab (Room W31-301) or from other CRON machines running Windows/Mac/Ubuntu (or from any other device connected to the internet if you have the needed access permission and local software). phpPgAdmin is a web-based administration tool that allows querying data stored on PostgreSQL database servers.


    There are many other client applications, such as pgAdmin that allow queries on data stored on PostgreSQL (and other relational database engines such as Oracle).  Some of these provide more elaborate SQL-based graphical interfaces to relational database engines, including SQuirreL, an open source, java-based universal SQL client that is available at http://www.squirrelsql.org and can run on Macs.


    However, to get comfortable with the basics of SQL queries and relational thinking, we will start with our browser-based interface to PostgreSQL.

     

    Since most of you are unlikely to be familiar with PostgreSQL, we spell out some of the tips and tricks for using the application and connecting to it from your browser.


    Steps


    (A) After you are logged in to one of the lab machines, open your preferred web browser and go to http://cronpgsql.mit.edu/phppgadmin. Click on the PostgreSQL server and you will be prompted to fill in a username and password. During the first class, you are going to use a guest account—user name “sqlguest”. The password will be given by the instructor in class. During the first lab, we will collect your MIT information in order to create a personal PostgreSQL account for you.


    In subsequent labs you will use your own account to access the class databases. You will also be able to access PostgreSQL tables from a browser on your machines or from pgAdmin III, a PostgreSQL administration tool that allows the user to create databases and manage manage users and tables, and their attributes.


    image101

     

     

    (B)  The interface of PostgreSQL on a web browser via phpPgAdmin looks like the following (In this case we are using Mozilla Firefox)


    image102


    (C) Navigate to "toyparcels" on the list that appears on the left side of your browser. This is the database where our MS-Access Tables are located in PostgreSQL. Open Schemas, then public and click on Tables and you will see the name of all seven tables. Click on "Browse" to look at each of them.


    image103


     

     

    Running Basic SQL Select Statement Queries

    At this point, you have all the connections in place. You only need to open the query tool by clicking on "SQL" on the top-left corner of your browser. Once open, make sure the Database is set to "toyparcels" and Schema is set to "public" (there are other schemas that allow us to keep tables, sequences, etc. protected, but this time we are only using tables that are public).


    Now let’s use our small, hypothetical parcel and property tax database to get a feel for how we can use phpPgAdmin to run simple queries on PostgreSQL.  Elsewhere on the class help pages, we describe this sample parcel database and the basic structure of the SELECT command.  Type in some simple queries, or you can use cut-and-paste to grab pieces of queries shown in the table below. You will probably find it helpful to edit the queries in a separate window with a text editor such as 'Notepad'. When you are finished editing a query, you can run it by cutting and pasting the text into the SQL window. Many servers require that each each SQL statement ends with a semi-colon ';' before pressing the 'Enter' key, but this is not required for PostgreSQL (although it is advised). Then click execute and the result table will appear on your browser's window.


    Try out each of the following SQL examples of SELECT statements:

    - Do you understand what each query is doing?

    - Does the SQL syntax make sense? [In addition to the select...from...where structure, note the use of wildcards and the treatment of missing values, arithmetic expressions, aggregate functions such as min(tax) and count(*) ]

     

    Query

    Tasks

    SQL Statements

    1

    The simplest query listing the TAX table

    SELECT * FROM tax;

    2

    A query with an expression and a column alias

    SELECT parcelid, (landval + bldval) AS tot_val, tax
      FROM tax;

    3

    A simple query that aggregates (groups) over a whole table

    SELECT COUNT(*), MIN(tax), MAX(tax), AVG(tax)
      FROM tax;

    4

    This example uses a literal string in the SELECT list

    SELECT parcelid, 'Total prop value is', (landval + bldval) as tax
      FROM tax;

    5

    A simple query using the "IS NULL" syntax to list only those parcels with at least 20,000 sq. ft. and no missing value

    SELECT *
      FROM parcels
     WHERE sqft >= 20000 OR sqft IS NULL;

    6

    Compare the results of the query 5 with query 6. Why are they different?

    SELECT *
      FROM parcels
     WHERE sqft >= 20000 OR sqft = NULL;

    7

    A query to find values within a certain range

    SELECT *
      FROM parcels
     WHERE sqft >= 10000 AND sqft <= 100000;

    8

    Another way of writing the query above using the
    -- "BETWEEN" keyword

    SELECT *
      FROM parcels
     WHERE sqft BETWEEN 10000 AND 100000;

    9

    A simple join between PARCELS and FIRES. Note that only those parcels which had fires are listed.

    SELECT p.*, f.estloss
      FROM parcels p, fires f
     WHERE p.parcelid = f.parcelid;

    10

    A slight variation of query 9.

      SELECT p.parcelid, p.sqft, p.landuse, f.estloss
        FROM parcels p, fires f
     WHERE p.parcelid = f.parcelid
    ORDER BY f.estloss;


    PRINTOUT

    We cannot save or print the SQL statement and the query result directly on phpPgAdmin. However, you can copy your SQL statements and paste them into a text editor such as Notepad, and then print from there.

     

    'Print' out the SQL statements and the query results for Query 3, 7 and 10 in the above table of queries. Actually, there is no need to generate printed hard copy. All the exercises will be submitted electronically to the class homework locker on Stellar. CRON machines have a 'printer' named 'Adobe PDF' that, instead of printing a document, saves it in PDF format within a folder of your choosing. All Athena clusters have access to physical printers via a system called 'Pharos' that controls the routing of your printout. See this website: http://kb.mit.edu/confluence/display/istcontrib/Pharos+Public+Student+Printing+at+MIT for further information about how to print from Athena machines. See this website: http://cronlasso.mit.edu/cron/p.lasso?t=4:2:0 for further information about printing from CRON machines. Basically, after 'printing' you must release a print job for printing on a particular machine.

     

    At the moment, there are no printing facilities in or near our lab in Room W31-301. If you wish to save files for printing later in CRON or some other facility, be sure to copy the files to your Athena locker before leaving the lab. In any event, there is no need to print anything today. CRON also provides large format plotting capabilities.

     

    Your file(s) containing your lab exercise #1 can be uploaded to Stellar later this week when we have provided access to you to the class homework turn in area on Stellar.

     

    PART IV: WHAT TO TURN IN

    This first lab exercise is intended to acquaint you with the computing environment used for the class and with the tools available to move data among various desktop applications and database engines. Don't worry if you are not familiar enough at the outset with the various tools and the steps needed to make the connections. We will discuss them a bit in today's lab and again in subsequent lectures and labs.

     

    You should turn in PDF copies of:

    - The three 'printouts' described in Part II above from

    1. ArcScene,
    2. SQL queries and query results formatted as tables using MS-Access *or* Postgres,
    3. Excel.


    - The map you created in the ArcGIS review exercises (if you haven’t already taken 11.205, 11.520 or 11.188).


    Make sure that your name, MIT email address and the date are on the submission. If you wish to include the PDFs as part of a Word document that you submit as a single file, that is okay too. If you wish to submit homework using some other file format, please check with the Instructor regarding the particular format. The lab assignment should be uploaded to the 11.521 Stellar website (https://stellar.mit.edu/S/course/11/sp18/11.521) by the start of next week's lab exercise at 4pm Tuesday, February 13, 2018. (Note, you may not be able to access the stellar site until a few days from now when you have been added to the class list.)


     Home | Syllabus | Lectures | Labs | CRON | MIT
     last modified: 4 Feb. 2018 (rb)