MIT Department of Urban Studies and Planning

Fall Mini-Classes on
Urban Analytics & Spatial Data Management

Between October 16 and November 15, 2012, we will offer two 4-session mini-classes:

MiniClass #1 | MiniClass #2
Urban Analytics | Spatial Data Management and Visualization
Tuesdays, 6:30 - 8 pm, Room 37-312 | Thursdays, 6:30 - 8 pm, Room 37-312
October 16, 23, Nov. 6, 13 | Oct. 18, 25, Nov. 8, 15

These mini-classes provide a hands-on, "quick start" look at some of the methods and tools that help urban planners capitalize on the new world of ubiquitous urban sensing and pervasive computing. Translating the new data streams into useful urban analytics and a deeper understanding of activity patterns and sustainability issues is exciting but challenging. The mini-classes cannot cram a whole course into a few sessions, but they can provide hands-on experience with 'real world' datasets that can introduce enough of the methods and tools to whet one's appetite and facilitate subsequent self-teaching. We assume students are familiar with at least 11.205 (GIS) and 11.220 (basic statistics) and basic data management using a tool such as MS-Access. (There will not be time in the mini-class to fill in those basics.) The ‘urban analytics’ track will focus on spatial analysis and model estimation using R, ArcGIS, MS-Access and, possibly, OpenGeoDa. The ‘spatial data’ track will focus on data management, indicator development, and visualization using PostgreSQL, PostGIS, MS-Access, ArcGIS and, possibly, Apache and QGIS.

Each mini-class will be 4 sessions (one per week) spread over one month including 3 hours of lecture and discussion plus 4 hours of lab.  Both will be organized as hands-on workshops.  Prior to the first session, participants will read portions of a PhD dissertation authored by a recent Urban Information Systems (UIS) graduate.  The first session will be a lecture and demonstration of the data, methods, and results of specific empirical analyses in the dissertation.  The second session will be a hands-on lab with an exercise using the data and exploring alternative models, hypotheses, indicators, and visualizations.  The third session will be short presentations and discussion of results and interpretations developed by participants.  The fourth session will be a followup hands-on lab for further analysis and exploration. 

The PhD dissertation that will provide the data and point of departure for the mini-classes is by Dr. Mi Diao, who received his PhD from DUSP in 2010 and is now an Assistant Professor in the Real Estate Department within the School of Design and Environment at the National University of Singapore. His dissertation, "Sustainable Metropolitan Growth Strategies: Exploring the Role of the Built Environment," is available here: http://mit.edu/uis/theses/diaomi_dissertation_final_10sept.pdf
Before the first session, read at least the first 30 pages of the dissertation (i.e., the introduction and Chapter 1 explaining the measures of the built environment in metro Boston).

Prof. Diao's dissertation examines the interactions among indicators of built environment, demographics, land use, housing prices, and vehicle miles travelled  using spatially detailed data (including annual mileage estimates for several million Massachusetts private passenger vehicles geocoded by place of garaging).  Participants in the mini-classes will be given access to the data and tools used in this dissertation for the mini-class exercises and an individual or small group miniproject.

Mini-Class Requirements and Logistics

  1. Come to the first session on Oct. 16 at 6:30 pm in Room 37-312: Prior to this session, read (or at least skim!) the first 30 pages of Prof. Diao's dissertation. Even if you are interested only in MiniClass #2, you may find it helpful to attend the first half-hour of the Tuesday, Oct. 16, session.
  2. Sign up to take one or both mini-classes: We will have a signup sheet with a few questions at the first session. If you have *not* already sent me an email indicating your interest, send an email to jf@mit.edu indicating your name, degree program, MIT email username and whether you want to take one or both of the mini-classes. Those who have notified me of their interest in advance will have priority on the lab machines if more than 24 people show up.
  3. Consider bringing your laptop to the sessions - especially if you already have ArcGIS and MS-Access installed and running on the MIT net (plus some room on disk to install a few more packages).
  4. Consider signing up for credit for one or both mini-classes: You may participate in the mini-classes without necessarily registering for credit (subject to the availability of computers in the 37-312 computing lab). But, if you want to earn credit, you may register for 3 units of pass/fail credit for one or each of the mini-classes. In order to receive the 3 units of credit, you need to attend each of the 4 sessions, make a brief presentation of your intermediate work during the third session, and turn in a short report after the 4th session.

 Yes, it is okay, and can make sense, to attend the mini-classes as a listener.  It is set up so most of the learning comes from the labs and reworking  the data and models after reading the dissertation chapters.  But, the lectures and lab exercises can be useful without doing all the 'homework.'  As it turns out, you will be able to wait until after at least the first week before deciding whether to sign up for credit.  

Instructor

Prof. Joseph Ferreira, jf@mit.edu, Room 9-532, x3-7410


Getting Started (with Part #1 - Urban Analytics)

  1. Login to a Room 37-312 or CRON machine using your MITnet userid (or login to a personal laptop or workstation connected to the network).
  2. Access the miniclass webpage: http://mit.edu/11.523
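  3. View a copy of the Instructor screen on your machine:
    • go to http://join.me
    • type the 9 digits for the Instructor's screen into the 'JOIN someone's screen' dialog box
  4. Use Windows Explorer to copy the entire MiniClass data locker (180 MB)
    • From: Z:\course\11\11.523\data_ua
    • To: C:\TEMP
  5. Open the saved ArcMAP document that you have copied into: c:\temp\data_ua\miniclass_startup_12oct16.mxd
  6. Open the data dictionary for Prof. Mi Diao's dissertation data in: c:\temp\data_dictionary.xls
  7. Open the PDF of Prof. Mi Diao's dissertation and dissertation defense slides:
    • Z:\course\11\11.523\readings\diaomi_dissertation_final_10sept.pdf
    • Z:\course\11\11.523\readings\diaomi_dissertation_defense_v6.pdf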
  8. Review the table below describing the available data
  9. Between now and the end of next Tuesday's (Oct. 23) evening lab

Available Data

The 'DATA_UA' sub-directory of the MIT 11.523 class locker contains the following data:
Dataset (in ./data_ua/) | Description
./miniclass_startup_12oct16 | ArcMAP startup document
./data_dictionary.xls | Data dictionary for MS-Access tables in vmt_data.mdb
./msaccess/vmt_data.mdb | MS-Access database containing VMT, built environment, and demographic data tables
    vmt_250m | VMT estimates for vehicles garaged within 250x250m grid cells
    demographic_bg | Selected US 2000 Census characteristics for Eastern Mass block groups
    built_environment_250m | Built environment indicators constructed for each 250x250m grid cell in Eastern Mass
    ct_grid.txt | Cross-reference table of grid cell to community type (inner core, maturing suburb, regional center, and developing suburb)
./text/ct_grid.txt | Text (csv) version of grid-cell-to-community-type cross-reference table (for miniclass #2)
./shapefiles/all_stategrid.shp | ArcGIS shapefile of 250x250 meter grid cells for Massachusetts produced by MassGIS
./shapefiles/ma_towns00.shp | MassGIS shapefile of Mass town boundaries
./shapefiles/ma2000bg_stateplane.shp | US 2000 Census block group polygon shapefile converted to Mass State Plane Coordinates (NAD83, mainland)
./shapefiles/TAZ_attri.shp | Shapefile of Metro Boston Traffic Analysis Zones (provided by MAPC) with 'Community Type' attribute

Readings

These readings are available in the class locker:

  1. PDF of Mi Diao's PhD dissertation (Sept. 2010) - "Sustainable Metropolitan Growth Strategies: Exploring the Role of the Built Environment"
  2. PDF of slides from Mi Diao's dissertation defense (Aug. 2010) - diaomi_dissertation_defense_v6.pdf
  3. Link to review of VMT/Land-use/Transportation relationships (2012) - VMT_readings.txt
  4. R-script to load and explore some of the data in the MS-Access database: vmt_data.mdb:

Thursday, Oct. 18 - Getting Started with Part #2
Spatial Data Management and Visualization

The type of indicator development, spatial analysis and visualization that is illustrated in Prof. Mi Diao's dissertation requires a considerable amount of 'big data' management, integration, processing, and modeling. Toward this end, various software packages have different strengths and weaknesses and it is often useful to move data across packages and platforms in order to take advantage of different capabilities. For example, one might want to use a GIS such as ArcGIS for geoprocessing and mapping, an RDBMS such as PostgreSQL to provide network access to a multiuser database engine, a statistics package such as R to facilitate exploratory statistical analysis and modeling, and various tools and utilities to facilitate data interchange among the packages and to provide distributed access via networked browsers and web services.

In the Thursday (Part #2) sessions of the miniclass, we will examine some of the tools, workflow and best practices that help us mix and match the data and package capabilities effectively. Where possible and effective, we will use popular open source tools and utilities. The following table summarizes the task, tool, and strategy that we will employ to provide some hands-on experience with various steps involved in Prof. Mi Diao's spatial analyses of the VMT patterns in metro Boston and their relationship to built environment and demographic indicators.

Task | Tool | Strategy
Develop the built environment indicators from MassGIS-provided basemaps and data layers. | GIS and RDBMS | Use geoprocessing tools in ArcGIS to overlay 250m grid cell layer with roads, transit lines, land use maps, etc. to compute indicators associated with each grid cell.
Develop demographic indicators from US 2000 census data. | RDBMS | Use relational database management tools (such as MS-Access or PostgreSQL) and SQL queries with US 2000 census block group data (SF3) to compute indicators associated with block groups.
Develop VMT estimates of annual mileage per vehicle / household / person for each 250m grid cell. | RDBMS and GIS | Process millions of vehicle records using RDBMS tools such as PostgreSQL to (a) estimate annual mileage from safety inspection data, (b) geocode owner addresses to 250m grid cells, (c) combine with MassGIS estimates of households in each grid cell, and (d) filter out unreliable data.
Develop built environment and demographic factors. | Statistics | Use R for descriptive and exploratory data analysis of indicators and then use factor and principal component analysis tools to extract factors for use in modeling.
Build explanatory models of VMT as a function of the built environment and demographic factors in the neighborhood of each grid cell. | Statistics | Use regression models in R to estimate linear relationships with and without correcting for possible forms of spatial autocorrelation.
Interpret and present the results. | Web services, GIS, and visualization tools | Use MassGIS and ESRI online mapping services to provide basemaps that help interpret the results; use R and GeoDa to study model residuals, multicollinearity, non-linear transformations, submarket effects, etc.; publish results online for viewing in Google Earth, browsers, etc.
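
As a rough illustration of steps (a) and (d) in the VMT row of the table, the following R sketch annualizes the mileage between two odometer readings and then drops implausible records. It is only a sketch under stated assumptions: the column names (vin, odo1, odo2, date1, date2), the toy values, and the two filtering rules are hypothetical and are not taken from the dissertation or the class data.

```r
# Toy inspection records: two odometer readings per vehicle (hypothetical data)
inspections <- data.frame(
  vin   = c("A", "B", "C", "D"),
  odo1  = c(20000, 50000, 10000, 80000),
  odo2  = c(32000, 61000,  9000, 80500),
  date1 = as.Date(c("2006-03-01", "2006-05-15", "2006-01-10", "2006-07-01")),
  date2 = as.Date(c("2007-03-01", "2007-05-15", "2007-01-10", "2006-07-20")),
  stringsAsFactors = FALSE
)

# Days elapsed between the two inspections
inspections$days <- as.numeric(inspections$date2 - inspections$date1)

# (a) Annualize the mileage driven between the two readings
inspections$annual_miles <- with(inspections, (odo2 - odo1) / days * 365)

# (d) Assumed reliability rules: drop odometer rollbacks and very short intervals
reliable <- subset(inspections, odo2 > odo1 & days >= 180)
print(reliable[, c("vin", "days", "annual_miles")])
```

Vehicles C (odometer went down) and D (only 19 days between readings) are filtered out here. Real inspection data would need more careful rules than these two.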

Getting Started (with Part #2 - Spatial Data Management & Visualization)

  1. Repeat the 'startup' steps from the first Session of Part #1:
    1. Login to a Room 37-312 or CRON machine using your MITnet userid (or login to a personal laptop or workstation connected to the network).
    2. Access the miniclass webpage: http://mit.edu/11.523
    3. View a copy of the Instructor screen on your machine:
      • go to http://join.me
      • type the 9 digits for the Instructor's screen into the 'JOIN someone's screen' dialog box
    4. Use Windows Explorer to copy the entire MiniClass data locker (180 MB)
      • From: Z:\course\11\11.523\data_ua
      • To: C:\TEMP
    5. Open the saved ArcMAP document that you have copied into: c:\temp\data_ua\miniclass_startup2_12oct16.mxd
      • Note, this ArcMAP document is slightly different from last time
        • The TAZ_attr layer is shaded based on 'community type'
        • Two 'web services' from MassGIS and ESRI (ArcGIS online) have been added since Tuesday
    6. Open the data dictionary for Prof. Mi Diao's dissertation data in: c:\temp\data_dictionary.xls
    7. Open the PDF of Prof. Mi Diao's dissertation
      • Z:\course\11\11.523\readings\diaomi_dissertation_final_10sept.pdf
      • We will focus on Chapters 1-3 of the dissertation and, in particular, on examining the data, maps, and models used to analyze the relationship between annual Vehicle Miles Traveled (VMT), neighborhood-level built environment characteristics, and census block group level demographic characteristics
      • His presentation slides may once again be useful: Z:\course\11\11.523\readings\diaomi_dissertation_defense_v6.pdf
    8. Review the table [above] describing the available data
  2. Between now and the end of next Thursday's (Oct. 25) evening lab

Readings

These readings have been added to the class locker:

  1. TRR paper explaining the architecture of a GeoPortal that can facilitate distributed access to spatial data repositories and tools for urban modeling and research: Ferreira, J., Mi Diao, Yi Zhu, Weifeng Li, and Shan Jiang, "Information Infrastructure for Research Collaboration in Land Use, Transportation, and Environmental Planning," Transportation Research Record (2011).

 


Tuesday, Oct. 23 - Lab Exercise on Urban Analytics

Today's lab exercise will use ArcGIS, R, and MS-Access to develop some understanding and skill with urban analytics by recomputing and exploring some of the built environment and demographic indicators and the VMT models in Prof. Mi Diao's dissertation. We have developed some workarounds for the 32-bit and 64-bit issues that arose last week with the computer lab machines. We have also expanded last week's sample R script. The lab exercise is available here: http://mit.edu/11.523/www/miniclass_lab1.html

There are three purposes to this Lab Exercise:

  1. Setup your computer to facilitate data exchange among GIS / Database / Statistics tools
  2. Reproduce some of the indicators and models in Prof. Mi Diao's PhD dissertation on VMT and the built environment
  3. Experiment with alternative indicators and models of the VMT / built environment relationship

After hearing a short presentation regarding today's lab exercise, you will spend the rest of the lab doing Parts 1 and 2 on the lab machines.


Thursday, Oct. 25 - Lab Exercise on Urban Analytics & Spatial Data

This Lab Exercise builds on the results of Lab Exercise #1 and emphasizes the use of ArcGIS to map model results and explore spatial patterns, including spatial autocorrelation issues. You do not need to finish all the R-project exercises from Lab #1 before doing this Lab. We have saved the necessary results from Lab #1 in the class locker. But we will want to make use of R to explore some of the data and results. There are three purposes to this Lab Exercise:

  1. Setup your computer to facilitate data exchange among GIS / Database / Statistics tools
  2. Examine the spatial pattern of VMT, factors, and model results in Prof. Mi Diao's PhD dissertation on VMT and the built environment
  3. Experiment with GeoStatistics tools and visualization techniques that help in understanding our models of the VMT / built environment relationship

After hearing a short presentation regarding today's lab exercise, you will spend the rest of the lab doing Parts 1 and 2 on the lab machines.


Tuesday, Nov. 6 - Lab Exercises and Progress Reports on Urban Analytics

Today's session includes open-ended lab time for work on the previous lab exercises plus brief presentations and discussion of preliminary results by those taking the miniclass for credit (and by me!). If you have not already finished the introductory Part 2 of the first lab exercise (http://mit.edu/11.523/www/miniclass_lab1.html ) do that during the first half-hour. Ideally, we would like you to get beyond that part and have some experience with the more open-ended Part 3 of the lab exercise before we begin the 5-10 minute presentations and discussion of progress reports at 7 pm.

Fresh Data Table: There has been only one change in the shapefiles and access datasets since the last session. Recall that, during the Thursday session of the second week, we used a query named 'q_vmt_join4' to join together the VMT, built environment, demographic, and community type tables in order to create the table 't_vmt_join4' with all the data for each of the 53,250 grid cells that are needed to fit Prof. Mi Diao's VMT models using his built environment and demographic factors. This query and the new table have been saved in the MS-Access database, vmt_data.mdb, that is now saved in the class locker: .\data_ua\msaccess\vmt_data.mdb
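
For those who prefer to do the join in R rather than MS-Access, a join along the lines of q_vmt_join4 can be sketched with merge(). This is an assumption-laden illustration, not the saved query: the key columns (grid_id, bg_id), the choice to carry bg_id in the built environment table, and all values are hypothetical, so check the data dictionary for the real field names.

```r
# Toy stand-ins for the four tables (column names and values are hypothetical)
vmt_250m <- data.frame(grid_id = c(1, 2, 3, 4),
                       vmt_tot_mv = c(120000, 90000, 150000, 60000),
                       vin_mv = c(10, 8, 12, 5))
built_environment_250m <- data.frame(grid_id = c(1, 2, 3, 5),
                                     bg_id = c("a", "a", "b", "b"),
                                     f5_walk = c(0.2, -0.1, 1.3, 0.4),
                                     stringsAsFactors = FALSE)
demographic_bg <- data.frame(bg_id = c("a", "b"),
                             f1_wealth = c(0.5, -0.7),
                             stringsAsFactors = FALSE)
ct_grid <- data.frame(grid_id = c(1, 2, 3, 4, 5),
                      ct = c("Inner Core", "Inner Core", "Maturing Suburbs",
                             "Regional Cities", "Developing Suburbs"),
                      stringsAsFactors = FALSE)

# merge() defaults to an inner join, so only grid cells found in every table survive
j1 <- merge(vmt_250m, built_environment_250m, by = "grid_id")
j2 <- merge(j1, demographic_bg, by = "bg_id")
t_vmt_join4 <- merge(j2, ct_grid, by = "grid_id")
t_vmt_join4 <- t_vmt_join4[order(t_vmt_join4$grid_id), ]
print(t_vmt_join4)
```

Grid cells 4 and 5 drop out because each is missing from one of the input tables, which mirrors the idea of keeping only the cells with complete data.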

New Shapefile: We have also saved a new shapefile, bos_vmt_grid.shp, in this part of the class locker: .\data_ua\shapefiles\bos_vmt_grid.shp. This shapefile was created by joining the shapefile of 250x250 meter grid cells, all_stategrid.shp, to the t_vmt_join4 table (for only those 53,250 grid cells that had meaningful data) and then exporting the result as a new shapefile, bos_vmt_grid.shp. Since last time, we added a new column, vmt_vin9, to the attribute table of this shapefile. It is the ratio (vmt_tot_mv / vin_mv), that is, the average annual miles travelled per vehicle for the 9-cell neighborhood centered on each grid cell. Use this new version of bos_vmt_grid.shp, with the added field, so that the new ArcMap document saved in the class locker can find all the data that it expects.
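
One way to picture the 9-cell ratio is as a moving-window calculation on the grid. The R sketch below assumes that vmt_tot_mv and vin_mv are sums of miles and vehicle counts over the 3x3 block of cells around each grid cell; that reading, the toy 4x4 matrices, and the helper sum9() are assumptions for illustration, not a description of how the field in the shapefile was actually produced.

```r
# Toy 4x4 grid: vehicle counts per cell and total annual miles per cell (made-up values)
vin <- matrix(1:16, nrow = 4, byrow = TRUE)
vmt <- vin * 12000

# Sum over the 3x3 window centered on each cell, truncated at the grid edge
sum9 <- function(m) {
  out <- matrix(0, nrow(m), ncol(m))
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      rows <- max(1, i - 1):min(nrow(m), i + 1)
      cols <- max(1, j - 1):min(ncol(m), j + 1)
      out[i, j] <- sum(m[rows, cols])
    }
  }
  out
}

vmt_tot_mv <- sum9(vmt)   # neighborhood total of annual miles
vin_mv     <- sum9(vin)   # neighborhood total of vehicles

# Ratio of the two neighborhood totals; NA where the neighborhood has no vehicles
vmt_vin9 <- ifelse(vin_mv > 0, vmt_tot_mv / vin_mv, NA)
print(round(vmt_vin9))
```

Because the ratio is taken after summing, cells with many vehicles carry more weight than cells with few, which is different from averaging nine per-cell ratios.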

New ArcMap document: We have saved a new ArcMap document that thematically maps the new shapefile, bos_vmt_grid.shp, and also includes several useful geospatial web services. We call this ArcMap document miniclass_lab2_end.mxd, and it is saved in: .\data_ua\miniclass_lab2_end.mxd. There is also a version named miniclass_lab2_end_10.0.mxd that is saved in the format of ArcGIS version 10.0. This ArcMap document shows thematic maps of the actual and fitted grid cell estimates of VMT using different classification methods and breakpoints. It will be part of my "progress report" to investigate the spatial pattern of VMT in the actual data and in the model results. To assist us in exploring the results, I have added two web services to the ArcMap document. The ESRI 'arcgisonline' layers at the bottom of the table of contents provide street map, shaded relief, and imagery layers that can be slipped under our map at any time. ArcMap does this by making a 'web service' request of the 'REST' server that ESRI provides (free of charge) to generate and deliver a symbolized image of the layer we want at the right scale and resolution for our viewing screen. Just keep the layers turned off most of the time since it takes a while for the web service request to be answered by the ESRI server. I have also added some of the web services that MassGIS provides using the Web Map Service (WMS) protocol, one of the open and interoperable geoprocessing standards promulgated by the Open Geospatial Consortium (www.opengeospatial.org). Try out the service that displays and symbolizes major roads.

Before starting your work today, you will want to copy the new datasets and ArcMap document to your local drive C:\temp\data_ua. Copying the entire directory is fine except that you may have created some new data of your own that you do not want to lose. So, be careful how you merge the new and old pieces. The only new part since the last session is the newly added field, vmt_vin9, in bos_vmt_grid.shp. You can easily create that field on your own, but you will need to name it vmt_vin9 and give it a 'double' data type in order for the new ArcMap document to open it correctly.

New R modeling illustration: The only other new material that I will illustrate today involves the use of R to transform variables before doing factor analysis (or principal component analysis) and to create dummy variables for the community types. Transforming variables in R is easy - just apply the transformation operator directly to the variable name when specifying the variables to include in a model. For a useful reference regarding the creation of 'dummy' variables, see this webpage from the Institute for Digital Research and Education at UCLA: http://www.ats.ucla.edu/STAT/r/modules/dummy_vars.htm

Here are some of the R scripts that I used for transformation and for creating dummy variables for community type:

First, let's recall what we did to plot the scattergram of percent high schooling, percent owner occupied, and percent poverty:

Now, let's plot the scattergram for the (natural) log of pct_owner vs. log of pct_pov. We need to be careful to omit (or redefine) those cases where either variable is zero (since the log of zero is undefined). For now, let's omit those cases, add columns to our table with the logs, and then do the scattergram. (How do you interpret the results? There are quite a few block groups where the percentage is zero. How should those be handled?)
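
The steps just described (omit the zero cases, add log columns, plot) might look like the following in R. This is a minimal sketch on a made-up table: only the variable names pct_owner and pct_pov come from the text, while the data frame name bg, the values, and the new column names are hypothetical.

```r
# Toy block-group table (hypothetical values); both columns are percentages
bg <- data.frame(
  pct_owner = c(80, 55, 0, 30, 95, 10),
  pct_pov   = c(5, 12, 40, 0, 2, 35)
)

# Omit cases where either variable is zero; in R, log(0) evaluates to -Inf
bg_pos <- subset(bg, pct_owner > 0 & pct_pov > 0)

# Add columns holding the natural logs
bg_pos$log_owner <- log(bg_pos$pct_owner)
bg_pos$log_pov   <- log(bg_pos$pct_pov)

# Scattergram of the logged variables
plot(bg_pos$log_owner, bg_pos$log_pov,
     xlab = "log(pct_owner)", ylab = "log(pct_pov)")

# One alternative to dropping zeros: shift before logging, i.e. log(x + 1)
bg$log1p_pov <- log1p(bg$pct_pov)
```

Dropping the zero cases changes the sample, while log(x + 1) keeps every block group but distorts small percentages, so the choice deserves a sentence in any write-up.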

Next, let's create dummy variables for the four types of communities in the metro Boston area and fit a model that uses 'developing suburbs' as the base with dummy variables for the other three community types:

Call:
lm(formula = (vmt_tot_mv/vin_mv) ~ f1_nw + f2_connect + f3_transit + 
    f4_auto + f5_walk + f1_wealth + f2_kids + f3_work + factor(ct), 
    data = vmtall)
Residuals:
     Min       1Q   Median       3Q      Max 
-11766.2   -708.5    -31.0    682.4  10769.6 
Coefficients:
                            Estimate Std. Error t value Pr(>|t|) 
(Intercept)                12808.836     12.449 1028.90   <2e-16 ***
f1_nw                        646.712      7.331   88.21   <2e-16 ***
f2_connect                  -363.985      6.724  -54.13   <2e-16 ***
f3_transit                   786.002      6.857  114.63   <2e-16 ***
f4_auto                      190.195      8.654   21.98   <2e-16 ***
f5_walk                      -93.361      6.411  -14.56   <2e-16 ***
f1_wealth                   -278.819     10.887  -25.61   <2e-16 ***
f2_kids                      155.532      8.574   18.14   <2e-16 ***
f3_work                      196.279      6.796   28.88   <2e-16 ***
factor(ct)Inner Core        -932.983     30.341  -30.75   <2e-16 ***
factor(ct)Maturing Suburbs  -214.550     14.930  -14.37   <2e-16 ***
factor(ct)Regional Cities   -552.417     17.571  -31.44   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Residual standard error: 1228 on 53238 degrees of freedom
Multiple R-squared: 0.5305,     Adjusted R-squared: 0.5304 
F-statistic:  5468 on 11 and 53238 DF,  p-value: < 2.2e-16
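
A model call of the kind shown in the output above can be tried on synthetic data with the sketch below. It is not the class script: the data are simulated, only three of the eight factors are included to keep it short, and the coefficients used to generate the toy response are arbitrary. The names vmtall, vmt_tot_mv, vin_mv, ct, and the factor names follow the output; everything else is an assumption.

```r
set.seed(1)
n <- 200
types <- c("Developing Suburbs", "Inner Core", "Maturing Suburbs", "Regional Cities")

# Synthetic stand-in for the vmtall table (all values are made up)
vmtall <- data.frame(
  f1_nw = rnorm(n), f5_walk = rnorm(n), f1_wealth = rnorm(n),
  vin_mv = sample(5:50, n, replace = TRUE),
  ct = rep(types, length.out = n),
  stringsAsFactors = FALSE
)
miles_per_vehicle <- 12800 + 650 * vmtall$f1_nw - 90 * vmtall$f5_walk -
  900 * (vmtall$ct == "Inner Core") + rnorm(n, sd = 1200)
vmtall$vmt_tot_mv <- vmtall$vin_mv * miles_per_vehicle

# Make the base (omitted) category explicit rather than relying on level order
vmtall$ct <- relevel(factor(vmtall$ct), ref = "Developing Suburbs")

# The 0/1 dummy columns that lm() builds from the factor, shown explicitly
dummies <- model.matrix(~ ct, data = vmtall)[, -1]
print(head(dummies))

# Fit the model; each ct coefficient is a shift relative to Developing Suburbs
fit <- lm((vmt_tot_mv / vin_mv) ~ f1_nw + f5_walk + f1_wealth + ct, data = vmtall)
print(summary(fit))
```

Because R orders factor levels alphabetically by default, 'Developing Suburbs' already comes first among these four names and is used as the base even without relevel(); the explicit call simply guards against surprises if the labels change.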

Thursday, Nov. 8 - Lab Exercises and Progress Reports on Urban Analytics & Visualization

Today's session includes open-ended lab time for work on the previous lab exercises plus brief presentations and discussion of preliminary results by those taking the miniclass for credit (and by me!). If you have not already finished the introductory Part 2 of the second lab exercise (http://mit.edu/11.523/www/miniclass_lab2.html ) do that during the first half-hour. Ideally, we would like you to get beyond that part and have some experience with the more open-ended Part 3 of the lab exercise before we begin the 5-10 minute presentations and discussion of progress reports at 7 pm. If you have already prepared your presentation and finished Parts 2 and 3 of the exercises, you may want to explore ways of estimating VMT per household (rather than VMT per vehicle), relating those estimates to BE and demographic factors, and interpolating VMT values for grid cells that had no observations - or too few observations to be reliable.

To assist in your use of the various tools to explore the built environment, demographic, and VMT relationships, we provide the following notes:

(1) New Datasets: Be sure to review the Nov. 6 notes (above) that explain recent additions to the data available in the class locker. These include adding the vmt_vin9 column (with 9-cell VMT-per-vin averages) to the bos_vmt_grid.shp shapefile and adding the ESRI arcgisonline and MassGIS web services to the ArcMap document, miniclass_lab2_end.mxd.

(2) Spatial Autocorrelation: The ArcMap document, miniclass_lab2_end.mxd, includes the actual, fitted, and residual values for each of the 53,250 grid cells used by Prof. Mi Diao and for which we estimated VMT in lab exercise #2 based on BE and demographic factors using ordinary least squares in R. The Lab #2 exercise showed thematic maps in ArcMap of some of the modeled and fitted values. You can zoom into areas with particularly low or high residuals and turn on the image, terrain, roads, and other 'web service' layers to explore other characteristics of badly fitted parts of the metro area. We also suggest that you experiment with OpenGeoDa. For your convenience, the Windows 7 (64-bit) executable code and the documentation have been saved in the OpenGeoDa folder in our class locker. Copy this folder to c:\temp and run GeoDa from the local copy. (The executable is called OpenGeoDa-Windows.exe.) Skim through the GeoDa™ 0.9 User’s Guide (geoda093.pdf) to get a feel for the components and interface. Open the bos_vmt_grid.shp shapefile. When drawing thematic maps, be sure to turn 'off' the grid cell outlines so the borders do not entirely obscure the thematic shading. Turn them off by toggling 'Options / Color / outlines-visible' from the main toolbar in GeoDa.

Here are some examples of Moran's I and LISA plots for the residuals, res1, of the ordinary least squares model. In all cases, I used a weights matrix that distance-weighted up to 25 proximate grid cells, with Euclidean distance as the distance metric. (If there are no missing values, 25 cells would amount to the 5x5 cell block surrounding a grid cell.) The map in the upper left shades the residuals based on how many standard deviations the grid cell's residual (res1) is from zero. The Moran's I scatter plot in the upper right compares the residuals with nearby residuals and provides further evidence of spatial autocorrelation whereby nearby residuals tend to be high and low together. The two LISA plots show further evidence of local spatial autocorrelation. The significance map on the left shows how unusual it would be (i.e., the p-values) for each grid cell to have at least as much spatial autocorrelation by coincidence. The cluster map on the right shades high-high, low-low, etc. sets of neighboring grid cells that are estimated to have significant autocorrelation.

[Figures: map of residuals (res1) | Moran's I scatter plot | LISA plot (significance) | LISA plot (cluster)]

Since many grid cells have no estimates, there are many missing values in the maps. Nevertheless, it is clear that many high-high grid cells are in the south central parts of metro Boston and many low-low areas appear to be in the northeast (near Marblehead and Beverly), along the East/West axis north of the Mass Turnpike, and north of downtown along the Interstate 93 corridor. Low-low areas for 'res1' are where the residuals were negative together - that is, areas where res1 is negative because the actual VMT was much less than the model predicted. High-high areas are where the actual VMT was much higher than the model predicted. Perhaps the low-low areas arose because we haven't yet considered commuter rail stations. Visualizing where the residuals tend to be high or low and where they are spatially clustered can help us identify other spatial characteristics that may influence VMT and, perhaps, could be quantified and included in the model.
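
To make the Moran's I statistic less of a black box, here is a minimal base-R sketch that computes it by hand on a toy grid. It is a simplification of what was done in GeoDa: the weights here are inverse-distance weights over the surrounding 3x3 block rather than up to 25 proximate cells, and the grid, the residual values, and all object names are made up for illustration.

```r
# Toy 6x6 grid of residuals with a built-in north-south trend (made-up values)
set.seed(42)
side <- 6
cells <- expand.grid(col = 1:side, row = 1:side)
res1 <- 300 * (cells$row - mean(cells$row)) + rnorm(nrow(cells), sd = 200)

# Inverse Euclidean distance weights for cells in the surrounding 3x3 block
d <- as.matrix(dist(cells[, c("col", "row")]))
w <- ifelse(d > 0 & d < 1.5, 1 / d, 0)
w <- w / rowSums(w)   # row-standardize so each row of weights sums to 1

# Global Moran's I with row-standardized weights: z'Wz / z'z
z <- res1 - mean(res1)
moran_i <- as.numeric(t(z) %*% w %*% z) / sum(z^2)

# The spatial lag is the weighted average of neighboring residuals; the slope of
# lag on z is the line drawn through a Moran scatter plot and equals Moran's I
lag_z <- as.numeric(w %*% z)
slope <- unname(coef(lm(lag_z ~ z))[2])
print(c(moran_i = moran_i, slope = slope))
```

A value well above zero, as this toy trend produces, means neighboring residuals tend to be high or low together. Significance testing (for example, by permutation, as GeoDa does) is not shown here.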


Last modified: 6 November 2012 [jf]
