Wednesday, November 4, 2009

What about the Non-Spatial Data? - A response to the City of Toronto Open Data Initiative

This is an email message received from my colleague; I am posting it on this blog with her permission.

Subject: you ask what's missing from this site

I have browsed through the data catalogue at
with some interest. Here follow three comments re what's missing:

1. Any indication of the availability of prior editions of the datasets
(certainly none listed for the files I looked at). Eg the child services
file -- October 2009 is the date ascribed. What was the situation in
January 2009? What about October 2008? How has the recession affected the
availability of child services?  How does seasonal part-time employment
affect the demand for and availability of child services?

The place of worship file is dated 2006. Are there no prior editions of
this file? What about later versions of the file? Places of worship are
built, repurposed or destroyed (eg by fire) over time. But changes over
time cannot be studied without data over time.

Ie, what happens when a later version of a file becomes available? Is
the prior version replaced, or does it still remain available? If not, why
not? If it is not available on-line, to whom does one turn for copies of
prior editions?

2. Any statement of a preservation policy. What procedures are in place to
manage the preservation of these data sets over time? What department is
responsible for the long-term preservation of these data? How frequently
are the data sets provided to whoever is responsible, after what period of
time? Software dependent formats are one of the first things to 'kill' a
dataset, once older formats can no longer be read by current software.
Access should not be mistaken for preservation.

3. Almost all data sets currently listed are spatially referenced data
sets, such as ESRI shapefiles, etc. What about other data, such as
anonymized microdata collected in the course of surveys conducted by the
City? For example, the Toronto employment survey, surveys conducted by the
TTC, by GO Transit, by the TDSB, attempts to survey the homeless etc. --
should these not also be included? The microdata from the Toronto
employment survey could, for example, be subjected to statistical
analyses beyond the simple descriptive statistics made available
in the reports on the web site.

I am sure there are many other surveys that are conducted eg by City
Planning Division's Policy & Research section of which I am not even
aware, but which could be used for secondary statistical analysis so as to
improve our understanding of how the community functions, and to
inform policy decisions. They may or may not be interesting from the
point of view of spatial analysis (GIS) but are certainly interesting for
other types of analysis which do not involve spatial relationships.
Nonetheless, they should still be made available.

There are software solutions available that can, for example, present data
in a fashion that on the one hand supports the generation of descriptive
as well as inferential statistics from microdata, while on the other hand
ensuring that eg no table cell contains fewer than 5 (or some other
selected number) cases, so as to ensure privacy and confidentiality of
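
To illustrate the minimum cell size rule described above, here is a minimal sketch in Python. It is not the method of any particular software package. The function name `suppressed_crosstab`, the field names `ward` and `sector`, and the sample records are all hypothetical, made up for the illustration.

```python
from collections import Counter


def suppressed_crosstab(records, row_key, col_key, min_cell=5):
    """Cross-tabulate records and mask any cell with fewer than min_cell cases.

    Suppressed cells are returned as None rather than as a count.
    """
    counts = Counter((r[row_key], r[col_key]) for r in records)
    return {cell: (n if n >= min_cell else None) for cell, n in counts.items()}


# Hypothetical microdata: one dict per survey respondent (assumed field names).
records = (
    [{"ward": "A", "sector": "retail"}] * 7
    + [{"ward": "A", "sector": "office"}] * 3
    + [{"ward": "B", "sector": "retail"}] * 5
)

table = suppressed_crosstab(records, "ward", "sector")
for cell, n in sorted(table.items()):
    print(cell, "suppressed" if n is None else n)
```

This sketch only masks the small cells themselves. If row or column totals were published alongside the table, a masked cell could be recovered by subtraction, so further cells would also need to be suppressed.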

Laine G.M. Ruus
Data Library Service

