R as GIS: Working Out Projections

In addition to the convenience that R offers for data cleaning, analysis and automating reporting, it also has the capacity to complete a variety of mapping (GIS) tasks.  Following are a few R snippets to help get started using the example of plotting schools (as a point file) within their catchment areas (boundaries described by polygons).

Shapefiles: Polygons

The package maptools has a couple of useful functions that will load ESRI shapefiles into R.  The first is readShapePoly() which, like read.csv(), loads a polygon .shp file into R as an object.  The second is proj4string() (from the sp package, which maptools loads) which defines the projection of the shapefile:

#load the library

library(maptools)

#load the School_Boundary.shp file into School.bndry object

School.bndry <- readShapePoly("School_Boundary") #loads School_Boundary.shp

#attach the projection used by the School_Boundary.shp file

proj4string(School.bndry) <- "+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs"

This last step, attaching the projection, requires you to:

  1. Know the projection that was used in the creation of the shape file
  2. Have this projection in proj4 format

If your .shp file also has a .prj file, you can use QGIS to get the proj4 text by:

  • Opening the shapefile in QGIS
  • Right clicking on the boundary file and selecting properties
  • Clicking on “Metadata”
  • Scrolling to the bottom of the window you will find the proj4 text.  Copy and paste it into R

Working without projection information

If your .shp file does NOT have a .prj file things are a little more challenging.  Here are a few suggestions:

  1. If you have another file created by the same organization, check to see what projection it uses.  Organizations tend to be consistent in their file creation and will likely use the same projections from project-to-project and data product-to-data product.
  2. Go to the following website, zoom in to your location on the map and click on “PROJ4”: http://www.spatialreference.org/ref/epsg/26912/
  3. If you are in Southern Ontario, most of the files I have come across work with UTM NAD 83 zone 17 which is the following in proj4:
    +proj=utm +zone=17 +ellps=GRS80 +datum=NAD83 +units=m +no_defs
  4. Brute Force: Open a shape file in QGIS that has a defined projection.  Open the “mystery” shape file and change its projection “on the fly” until you find one that lines up correctly with your first file.  Whichever one lines up is likely the projection you need to use.
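If UTM is a likely candidate, the zone can be worked out from an approximate longitude. The following is a minimal sketch in base R; the helper names (utm_zone, nad83_utm_proj4, nad83_utm_epsg) are my own for illustration and are not part of maptools or sp. It assumes the file uses NAD83 and sits in the northern hemisphere.

```r
# Hypothetical helpers (not part of maptools or sp): work out the UTM zone
# for a longitude and build the matching NAD83 proj4 string and EPSG code.
utm_zone <- function(lon) {
  # UTM zones are 6 degrees wide, numbered from 1 starting at 180 degrees West
  floor((lon + 180) / 6) + 1
}

nad83_utm_proj4 <- function(zone) {
  paste0("+proj=utm +zone=", zone,
         " +ellps=GRS80 +datum=NAD83 +units=m +no_defs")
}

nad83_utm_epsg <- function(zone) {
  # EPSG numbers NAD83 / UTM zone N (north) as 26900 + N, for zones 1 to 23
  stopifnot(zone >= 1, zone <= 23)
  26900 + zone
}

toronto_lon <- -79.38                 # approximate longitude of Toronto
zone <- utm_zone(toronto_lon)
zone                                  # 17
nad83_utm_proj4(zone)
nad83_utm_epsg(zone)                  # 26917
```

This only narrows down the candidates; the resulting string still needs to be checked against a file with a known projection, as in the brute force step.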

Shapefiles: Points

The point file works in a similar manner.  This time the shape file is loaded using the maptools function readShapePoints()

School.point <- readShapePoints("School_points")   #loads School_points.shp

proj4string() is used again to define the projection:

proj4string(School.point) <- "+proj=utm +zone=17 +ellps=GRS80 +datum=NAD83 +units=m +no_defs "

Dealing with different projections

If the polygon and point files had the same projection I would be ready to create a map.  However, the two files have different projections and need to be transformed to share one (whichever I choose).  In this instance, I’ll create two new objects that transform the files to NAD83 UTM zone 17 (EPSG:26917). Both are transformed here for illustrative purposes, although the point file was already defined in this projection:

library(rgdal) #provides spTransform() for the shape file objects

School.bndry.83 <- spTransform(School.bndry, CRS("+init=epsg:26917"))

School.point.83 <- spTransform(School.point, CRS("+init=epsg:26917"))

Plotting the Maps with ggplot2

If you are familiar with the ggplot2 package you will be pleased to know that in addition to plotting histograms, scatterplots etc. it can also plot maps with all the same features. However, to take advantage of ggplot, there is one additional step required with the point file. It needs to be converted to a dataframe:

School.point.83.df <- as.data.frame(School.point.83)

With the shape files loaded, projections defined and the point file available as a dataframe the data is now ready to be plotted with ggplot:

library(ggplot2)                          #load ggplot2

ggplot(School.bndry.83) +                 #Use the school boundary data
     aes(long, lat, group=group) +
     geom_polygon() +                     #to draw the polygons
     geom_path(color="white") +           #make the border of the polygons white
     geom_point(data=School.point.83.df,  #add the points to the map
                aes(X, Y, group=NULL, fill=NULL),
                color="blue", alpha=0.8)  #make each point blue with some transparency

Full Code:


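#load the libraries
library(maptools)  #readShapePoly() and readShapePoints()
library(rgdal)     #spTransform() for the shape file objects
library(ggplot2)   #plotting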

#load the School_Boundary.shp file into School.bndry object
School.bndry <- readShapePoly("School_Boundary")

#attach the projection used by the School_Boundary.shp file
proj4string(School.bndry) <- "+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs"

School.point <- readShapePoints("School_points")   #loads School_points.shp
proj4string(School.point) <- "+proj=utm +zone=17 +ellps=GRS80 +datum=NAD83 +units=m +no_defs "

School.bndry.83 <- spTransform(School.bndry, CRS("+init=epsg:26917"))
School.point.83 <- spTransform(School.point, CRS("+init=epsg:26917"))

School.point.83.df <- as.data.frame(School.point.83)

ggplot(School.bndry.83) +                 #Use the school boundary data
     aes(long, lat, group=group) +
     geom_polygon() +                     #to draw the polygons
     geom_path(color="white") +           #make the border of the polygons white
     geom_point(data=School.point.83.df,  #add the points to the map
                aes(X, Y, group=NULL, fill=NULL),
                color="blue", alpha=0.8)  #make each point blue with some transparency
Posted in R

AERO 2015: Making Shared Twitter Links Useful with R

On Friday December 4th, AERO hosted its annual fall conference at the Old Mill.  The speakers included:

  • Dr. Joe Kim, McMaster University, “The Science of Durable Learning”
  • Don Buchanan, Hamilton Wentworth  DSB , E-BEST, “Putting education in ‘educational’ apps: Lessons from the science of learning”
  • Dr. Daniel Ansari, Western University, “Building blocks of mathematical abilities: Evidence from brain and behaviour”

Twitter was again a staple at the conference (#AEROAOCE) with backroom discussions and sharing/extending of the resources and articles highlighted by the speakers. As with previous years, an archive of the social media exchanges was created using Martin Hawksey’s TAGS 6.0 utility.  Twitterfall was also used as a live Twitter feed so everyone could see what was resonating.

Although the compilation of tweets is straightforward, it is seldom in a format that I would share with other stakeholders.  To facilitate the cleaning process, I use a small R file that extracts the shared URLs and then expands them from the bit.ly or t.co formats. Following are the code snippets with descriptions of each step.  If you are more interested in the resources that were shared than in the process to clean them, scroll down to the bottom of this post.

The following code is saved as the file twittercleaner.r.  Each time I use it I change the names of the dataframes to reflect the conference tweets that have been compiled (in this case AERO).  The file begins by loading the three packages dplyr, stringr and longurl:

library(dplyr)
library(stringr)
library(longurl)

Load the data file containing the tweets (a csv extract from the TAGS 6.0 archive):

AERO <- read.csv("C:/00_Data/AERO2015_Enduring_Learning.csv")

Identify the characters that may be contained in a url:

url_pattern <- "http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"

Use the stringr package to create a new column in the dataframe and extract the urls into it:

AERO$ContentURL <- str_extract(AERO$text, url_pattern)
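Note that str_extract() keeps only the first match in each tweet (stringr's str_extract_all() returns every match). As a rough sketch of what the extraction does, here is a base R version that keeps every URL. The tweets are toy examples and the pattern is a simplified one I am assuming for illustration, not the pattern used above.

```r
# Toy tweets (made up for illustration); in the post these come from AERO$text
tweets <- c("Slides: http://t.co/abc123 and paper: https://bit.ly/xyz789",
            "No link in this one",
            "RT Students as Researchers! http://t.co/I67HFfRgbM")

# Simplified pattern (an assumption): http or https, then everything up to
# the next whitespace character
simple_pattern <- "https?://[^[:space:]]+"

# gregexpr() finds every match; regmatches() pulls out the matched text
urls_per_tweet <- regmatches(tweets, gregexpr(simple_pattern, tweets))

lengths(urls_per_tweet)              # 2 0 1
all_urls <- unique(unlist(urls_per_tweet))
all_urls
```

Tweets without a link come back as empty entries rather than NA, so no is.na() filter is needed with this approach.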

Use dplyr to create a new dataframe, then (%>%) remove the rows with no URL (!is.na), and then (%>%) keep only the column with the URLs:

AEROurl <- AERO %>%
        filter(!is.na(ContentURL)) %>%
        select(ContentURL)

Remove the duplicate URLs (keep unique URLs):

AEROurl <- unique(AEROurl$ContentURL)

Remove the rownames from the table:

attr(AEROurl, "rownames") <- NULL

Up to this point the URLs included in tweets have been shortened using bit.ly or t.co.  The following step uses the longurl package to expand the URLs:

AEROExpanded <- expand_urls(AEROurl, check=TRUE, warn=TRUE)

Remove URLs that could not be expanded (and result in an NA value):

AEROExpanded <- filter(AEROExpanded, !is.na(expanded_url))

Create a .csv file containing the extracted and expanded URLs:

write.csv(AEROExpanded, "C:/AEROurl.csv")

Full version:

library(dplyr)
library(stringr)
library(longurl)

AERO <- read.csv("C:/00_Data/AERO2015_Enduring_Learning.csv")

url_pattern <- "http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"

AERO$ContentURL <- str_extract(AERO$text, url_pattern)
AEROurl <- AERO %>%
        filter(!is.na(ContentURL)) %>%
        select(ContentURL)

AEROurl <- unique(AEROurl$ContentURL)
attr(AEROurl, "rownames") <- NULL

AEROExpanded <- expand_urls(AEROurl, check=TRUE, warn=TRUE)

AEROExpanded <- filter(AEROExpanded, !is.na(expanded_url))

write.csv(AEROExpanded, "C:/AEROurl.csv")

Results (sorted):

Posted in R, Twitter

Getting Started with Reproducible Research in R: What you need and Where to get it

You have been persuaded that engaging in Reproducible Research is worth your time and effort and now you want to get started. This post is a quick overview of the steps you need to take to install LaTeX and make it available to RStudio.

The following steps work as of R version 3.2.2,  RStudio version 0.99.489 and the proTeXt page dated:  2014/04/22 20:51:39

  • Install latest version of R: https://cran.r-project.org/bin/windows/base/
  • Install latest version of RStudio: https://www.rstudio.com/products/rstudio/download/
  • Download proTeXt (MiKTeX based): https://www.tug.org/protext/
    This file is very large (around a gigabyte), so grab a coffee or start the download just before lunch.

    • Run the file and place the files on your Desktop – all of the files will be extracted onto your Desktop
    • In the directory the files were extracted to, click on the TEX Setup icon
      • Click on “Install” for MiKTeX
      • Accept the MiKTeX copying conditions
      • Click “Next”
      • Select “Complete MiKTeX”
      • Select:  “Anyone who uses this computer”
      • Select: preferred paper: “Letter”
      • Select: install missing packages on-the-fly: “Yes”
      • Click on “Next”
      • Click on “Start”
      • Leave the default installation directory and click “Next” – RStudio will look for it in the Program Files
    • Go to RStudio to set up the options:
      • Go to: Tools> Global Options >
      • Select: Weave Rnw files using: knitr
      • Select: Typeset LaTeX into PDF using: XeLaTeX
      • Select: Preview PDF after compile using: “Sumatra”
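Once the installation is finished, a quick check from the R console can confirm that R can see the LaTeX engine. This is a minimal sketch using base R only; it assumes the installer added MiKTeX to the system PATH.

```r
# Look for the XeLaTeX executable on the system PATH.
# Sys.which() returns "" when the program cannot be found.
xelatex_path <- unname(Sys.which("xelatex"))
latex_found <- !is.na(xelatex_path) && nzchar(xelatex_path)

if (latex_found) {
  message("XeLaTeX found at: ", xelatex_path)
} else {
  message("XeLaTeX not found; check the MiKTeX installation and restart RStudio")
}
```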
Posted in Blog

#AERA15 and Ontario School Board Research

The American Educational Research Association’s (AERA) 2015 Annual Meeting will be starting this Thursday, April 16th with more than 14,000 people participating in over 2,600 sessions. This year AERA is being held in Chicago, Illinois with the theme:

Toward Justice: Culture, Language, and Heritage
in Education Research and Praxis

As with previous years, researchers from Ontario will be sharing their work at this event:

  • (Thursday April 16, 12:00 to 1:30) 14.044 Teachers as Inquirers, Knowledge Generators – and Researchers? OISE, University of Toronto, Peel District School Board and George Brown College.
  • (Friday April 17, 4:05 to 5:35) 35.081 An Innovative University/School-Based Teacher Education Initiative: The Diverse School Initiative. Toronto District School Board.
  • (Saturday April 18, 10:35 to 12:05) 49.060 The Next Phase in Education Reform in Ontario: Excellence, Equity, Well-Being, and Public Confidence.
    • How a Province-Wide Public Consultation is Informing the Next Stage of Ontario’s System-wide Education Improvement Efforts. Ministry of Education
    • Aboriginal Education Evidence-Based Best Practices for Supporting First Nation, Metis, and Inuit Academic Achievement and Well-Being. Ontario Ministry of Education.
    • Changing the Educational Culture of the Home to Increase Student Success at School. OISE/University of Toronto and the Ontario Ministry of Education.
    • Success for All: Using Implementation Science to Build Conditions and Capacity to Support Student Mental Health in Ontario Schools. Ontario Ministry of Education.
  • (Saturday April 18, 2:45 to 4:15) 52.085 Building School District Capacity in Assessment for Learning: A Study on Teacher Learning in Assessment Through an Instructional Rounds Approach. Queen’s University and the Ottawa Carleton District School Board.
  • (Sunday April 19, 12:25 to 1:55) 63.027 The Listening Stone: Learning from the Ontario Ministry of Education’s First Nations, Metis, and Inuit Collaborative Inquiry. York University and the Ontario Ministry of Education.
  • (Sunday April 19, 4:05 to 5:35) 66.080
    • Equity and Accountability in a Major Metropolis: An Exploration of the Toronto District School Board. Toronto District School Board.
    • Deconstructing Intelligence-Based Special Education Exceptionalities. York University and Toronto District School Board.
    • Students’ Experience of Belonging and Exclusion across Toronto. Toronto District School Board.
    • Student, Family, School and Neighborhood Social Disorganization and Stratification and Educational Sorting of Postsecondary Pathways. Toronto District School Board.
    • Post Secondary Aspirations and Choices of Native and foreign-Born Adult Learners in the Toronto District School Board. York University and the Toronto District School Board.
  • (Monday April 20, 12:25 to 1:55) 73.061 Students’ Conscientious Technology Designs as Actions on Socio-scientific Issues. OISE and Peel District School Board.

You can explore the schedule in more detail with AERA’s online portal.

If you are unable to attend these events in person, you can follow the discussions at #AERA15 on Twitter. As with previous years, an archive of tweets will be created and shared.


Posted in Blog, Twitter

#OERS15 – Looking Closer at the Shared Links

Of the 2,564 tweets, 744 (29%) included a link. Of those 744, 204 were original links, which were retweeted a further 540 times.  Taking a closer look at the kinds of links that were posted, photos were the most frequent kind of link shared:

  • Photo (155)
  • Website (27)
  • Video (8)
  • Data viz (5)
  • Facebook (3)
  • PDF (3)
  • Spreadsheet (1)
  • OERS Agenda (1)
  • Cartoon (1)

Summarizing this information ended up being a more involved process than I had anticipated.  Before the links could be categorized the duplicates had to be removed, the links needed to be lengthened and the urls parsed.  Following is a description of the process:

  • Original tweets were compiled (retweets and truncated links were removed)
  • Links were lengthened (using the Bulk URL Checker)
  • Lengthened links were categorized according to the url content:
    • Youtube.com links indicate videos
    • Twitter/photo/1 links indicate photos
    • Links ending in .pdf indicate documents
    • Fb.me links indicate facebook posts
    • Some links needed to be expanded a second time and were fed back into the Bulk URL Checker, lengthened and categorized
    • Remaining links were reviewed and categorized
  • Links and categories were put into a pivot table and summarized (a copy of the excel sheet is available here)
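
The rule-based part of the categorization above could be sketched in R along these lines. The function name categorize_link and the example URLs are made up for illustration; the original work was done by hand and in a spreadsheet, and the patterns are my reading of the rules listed above.

```r
# Hypothetical helper: assign a link type from patterns in the expanded URL.
categorize_link <- function(url) {
  u <- tolower(url)
  if (grepl("youtube\\.com", u)) {
    "Video"
  } else if (grepl("twitter\\.com/.*/photo/1", u)) {
    "Photo"
  } else if (grepl("\\.pdf$", u)) {
    "PDF"
  } else if (grepl("fb\\.me", u)) {
    "Facebook"
  } else {
    "Needs review"                     # left for manual categorization
  }
}

# Made-up URLs for illustration only
links <- c("https://www.youtube.com/watch?v=example",
           "https://twitter.com/someuser/status/123/photo/1",
           "http://example.org/report.pdf",
           "http://fb.me/example",
           "http://example.org/blog-post")

categories <- vapply(links, categorize_link, character(1), USE.NAMES = FALSE)
table(categories)                      # counts per link type
```

Anything that falls through to “Needs review” still has to be opened and categorized by hand.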
Posted in Twitter

#OERS15 – Promoting Well-being – Evidence to support Implementation of the Renewed Vision

Last week people from across Ontario came to Toronto to attend the 10th annual Ontario Education Research Symposium. Over 400 people from schools, boards, post-secondary institutions, communities and networks met to learn and share about well-being. The event featured:

  • a fireside chat with Jean Clinton,
  • keynote addresses by Dominic Richardson (UNICEF – Day 2) and Stuart Shanker (Day 3)
  • 34 workshops
  • a provocative speakers series
  • 2 panels
  • poster presentations that were accessible over all three days
  • networking opportunities to learn more about the work of colleagues
  • a group of students sharing their experience as researchers

Following is a quick overview of the tweets, comments on some twitter utilities and a look at a non-traditional approach to twitter analysis. An excel version of the archive is available here (includes some additional summary fields) and the live archive can be accessed here (the archive will be left open for a while to get a sense of how long #OERS15 material circulates).

#OERS15 Summary

As of February 16th:

  • 2,564 tweets from 409 people (an average of 6 tweets per person)
  • 269 people only tweeted once (66% of those who tweeted)
  • 19% of tweets were shared before (4%) and after (15%) OERS 2015
  • 233 tweets (9%) were shared on the first day
  • 1,200 tweets (47%) were shared on the second day
  • 637 (25%) on the third day
  • Top 4 Retweets:
    • 73 – RT @ResearchChat: Saw this while in Toronto Canada at #OERS15 RT @ShiftParadigm: PLEASE RETWEET —  Dissolving Boundaries @bnighrogain http…
    • 59 – RT @CarolCampbell4: Students as Researchers! #oers15 @OISENews @KNAER_RECRAE http://t.co/I67HFfRgbM
    • 22 – RT @CarolCampbell4: Interested in Ontario’s education research, evaluation, data & knowledge mobilization? Follow along #oers15 Tue-Thur @O…
    • 15 – RT @DrJeanforkids: #oers15 john Dewey said “We don’t learn from experience, we learn from reflecting on experience ” have we learned that?
  • Top 4 Tweeters – Frequency of tweets before, during and after OERS 2015:

OERS15 - Top Tweeters
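
The percentages in the summary can be reproduced from the reported counts. A small sketch, using only the figures quoted in this post:

```r
# Counts reported in the summary above
total_tweets <- 2564
tweeters     <- 409
one_time     <- 269
by_day       <- c(day1 = 233, day2 = 1200, day3 = 637)

round(total_tweets / tweeters)            # average tweets per person: 6
round(100 * one_time / tweeters)          # % who tweeted only once: 66
round(100 * by_day / total_tweets)        # % of tweets per day: 9 47 25

# Share of tweets posted outside the three days of the symposium
outside <- round(100 * (total_tweets - sum(by_day)) / total_tweets)
outside                                   # 19
```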


Visualizations – adding an element of gamification

OERS15 - TAGS tweet

The visualizations offered by TAGS 6.0 and Neoformix give two very different considerations to twitter activity. The visualizations in TAGS 6.0 give a cumulative view of twitter activity. The more you tweet over the course of the entire event, the higher you climb on the top tweeters, top hashtags and top conversationalists summaries.

However, activity on twitter is not uniform. People do not post 1.6 tweets every minute over the course of the entire event. Instead they post in fits and spurts as topics interest them and as their devices continue to be powered. At OERS one power tweeter was pushed off line when her device ran out of power (which gave the rest of us a brief moment of hope we might catch up).

Instead of an entire archive of tweets, the Neoformix Spot application visualizes the last 100 tweets which brings a different perspective to the top tweeters and topics lists. People and topics that might be overshadowed and overlooked in TAGS emerge on the Neoformix list.

Neoformix summary by timeline and by frequency of tweets:

OERS15 - Neoformix Top Tweeters

OERS15 - Neoformix Timeline

While the intention of Twitter is to share information socially, the introduction of visualizations and ranking lists introduces an element of gaming. In this context, participants “win” solely by volume of tweets.  The more you tweet, re-tweet (RT) and modify tweet (MT), the higher you rise through the ranks.  However, nowhere in these visualizations does the quality of the information get addressed.

OERS15 - TweetReach

As I reflected on the reach of #OERS15 after the symposium, I turned to TweetReach (www.tweetreach.com) to explore how much exposure the tweets received. The TweetReach summary image shows the activity of #OERS15 the day after the symposium. Not only has the post-OERS conversation continued, it has reached over 50,000 user accounts and had over 70,000 impressions. It is interesting to see that the tweeters with the most impact after the event are not the same people who were most active during the symposium:

  • @BlessTheTeacher tweeted once and had over 40,000 impressions from that single tweet
  • @researchimpact tweeted twice, retweeted once and had over 11,000 impressions.
  • @ShastaCH tweeted 14 times and had 4,000 impressions
  • @DP_math tweeted 11 times and had over 2,000 impressions

Non-traditional approach

To explore the quality of tweets I used an online app developed by Healey and Ramaswamy of NC State University’s “Sentiment Viz – Tweet Sentiment Visualization”  http://www.csc.ncsu.edu/faculty/healey/tweet_viz/tweet_app/ .

This app is easy to use with a single query box to enter a keyword/hashtag to be visualized. Once entered the application searches Twitter for tweets containing the keyword and processes the text according to how it relates to a sentiment dictionary. Each tweet is then placed in a circular grid which has been developed from Russell’s model of emotional affect (https://www2.bc.edu/~russeljm/publications/Russell1980.pdf)
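
To give a feel for how dictionary-based scoring of this kind works, here is a toy sketch in R. The dictionary, its scores and the function name score_tweet are all made up for illustration; this is not the app's actual dictionary or algorithm. Each tweet gets a pleasantness (valence) score and an activation (arousal) score, the two dimensions of Russell's model, by averaging the scores of the dictionary words it contains.

```r
# Toy sentiment dictionary (made-up scores on a 1-9 scale, for illustration)
dictionary <- data.frame(
  word    = c("happy", "calm", "stress", "angry", "learn"),
  valence = c(8.5, 7.0, 2.0, 2.5, 6.5),   # unpleasant (1) to pleasant (9)
  arousal = c(6.5, 2.5, 7.5, 8.0, 5.0),   # subdued (1) to active (9)
  stringsAsFactors = FALSE
)

score_tweet <- function(text, dict) {
  # lower-case the tweet and split it into words on anything non-alphabetic
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  hits  <- dict[dict$word %in% words, ]
  if (nrow(hits) == 0) {
    return(c(valence = NA_real_, arousal = NA_real_))
  }
  # the tweet's position is the mean score of the dictionary words it contains
  c(valence = mean(hits$valence), arousal = mean(hits$arousal))
}

score_tweet("Happy to learn about stress at #OERS15", dictionary)
```

The two scores can then be used as the horizontal and vertical position of a dot on the grid; tweets with no dictionary words cannot be placed.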

Once visualized, the tweets create a scattergraph that gives an impression of the emotional content. An example of this graph can be found below, which visualizes the 373 tweets from the last day of the Ontario Education Research Symposium. The dots on the left side that appear to have a negative affect were tweets shared during Dr. Shanker’s presentation, which addressed the impact of stress on children.

Sentiment Analysis Chart - OERS15 Feb 13

Geography of Twitter – still a difficult approach

Although OERS 2015 was held in Toronto I have been wondering about the geographic spread of #OERS15 tweets. Participants come from across the province to attend OERS, but what does the tweet distribution look like and how far do the tweets travel beyond those attending?  It would be interesting to create a map showing the geographic distribution of participants that post to #OERS15. There is a utility called TweepsMap (http://tweepsmap.com/) which will create a map of Twitter followers, but so far I haven’t been able to find anything that begins with a hashtag and ends with a map of those using it.  If you know of one, please share it in the comments section.

Qualitative Analysis – a call for additional analysis

There is a lot of information contained in the tweet archive and I expect that, like the previous years, I will not have enough time to dig deeper into the content. For those who are passionate about qualitative analysis, active on Twitter and have the time to dig deeper, it would be interesting to see how the tweets relate not just in terms of content but also styles of use: social interaction (invitations for lunch etc.), note taking (posting points from powerpoint or speaker comments), invitation to discussion (questions to provoke discussion), conversations (responding to the questions), value added material (providing supplemental material to the discussions and resources), and trolling (it’s only logical there would be a few trolling comments).  Perhaps there is a teacher or professor with a group of students learning about qualitative analysis that might take this analysis on? If you do, please let me know; I’d be interested to learn about your approach, experience and findings.

Finally, to the organizers of the 2015 Ontario Education Research Symposium, many thanks for the opportunity to attend and connect.

Posted in Data Visualization, Twitter

6 Forms of Bias That Weaken Your Research

Dr. John Ioannidis’ 2005 article “Why Most Published Research Findings Are False” is a provocative reflection on how vulnerable research can be to bias. With citations in over 1,400 papers over the last 9 years (http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.0020124), Ioannidis has inspired and incited a lot of discussion. Although written for the medical community, the observations and concerns have as much application and importance to education as they do to medical research.

Finding what you want to see

Dr. Ioannidis defines bias as a “combination of various design, data, analysis, and presentation factors that tend to produce research findings when they should not be produced. Selective or distorted reporting is a typical form of such bias.” Following are 6 aspects of bias that Ioannidis explores (with examples) in his article:

  1. The smaller the studies conducted in a scientific field, the less likely the research findings are to be true.
  2. The smaller the effect sizes in a scientific field, the less likely the research findings are to be true. This is particularly interesting given the controversy over the last year with John Hattie’s work.
  3. The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true.
  4. The greater the flexibility in designs, definitions, outcomes and analytical modes in a scientific field, the less likely the research findings are to be true.
  5. The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true.
  6. The hotter the scientific field (with more scientific teams involved), the less likely the research findings are to be true.
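
The first two points can be made concrete with the positive predictive value (PPV) formula from the article: the probability that a claimed finding is true. The following is a small sketch in R; the function name ppv and the example values are my own choices for illustration.

```r
# Positive predictive value (PPV) of a claimed finding, following the
# formulas in Ioannidis (2005):
#   R     = ratio of true to no relationships among those tested in a field
#   power = 1 - beta (chance of detecting a true relationship)
#   alpha = significance threshold (Type I error rate)
#   u     = bias: share of analyses that would not otherwise have been
#           findings but are reported as such
ppv <- function(R, power, alpha = 0.05, u = 0) {
  beta <- 1 - power
  ((1 - beta) * R + u * beta * R) /
    (R + alpha - beta * R + u - u * alpha + u * beta * R)
}

# Same field (R = 1, i.e. 1:1 odds), no bias: a large, well-powered study
# versus a small, underpowered one
ppv(R = 1, power = 0.80)
ppv(R = 1, power = 0.20)

# Adding bias lowers the PPV of the well-powered study
ppv(R = 1, power = 0.80, u = 0.10)
```

Smaller studies and smaller effects both mean lower power, which is why they pull the PPV down in the first two points.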

To help reflect on whether bias is an issue to be concerned about in your context, here are three questions to consider:

  • How many articles or projects report insignificant findings? Knowing what doesn’t work is as important and valuable as knowing what does work. If there are few (or no) insignificant findings, this could suggest data-mining/dredging (conducting analysis until positive results could be found), that negative results were withheld, or that observations were vulnerable to the Hawthorne effect or to confirmation bias.
  • How consistent is the use and implementation of strategies and metrics across studies? Many of these considerations may be modified to align with the culture of a board. While this may strengthen the face validity of a study, it can weaken its generalizability or comparability to other studies. It is Dr. Ioannidis’ contention that the more meaningful findings emerge from research initiatives that are large scale, have multiple independent teams engaged in the inquiry and where consistency is ensured in the design, metrics and analysis.
  • Are there any dissenting or critical voices for strategies or findings? Ioannidis reflects that research in popular fields of study may be more vulnerable to “rapidly alternating extreme research claims and extremely opposite refutations” as research teams build their reputations by promoting their most positive results and use negative results as a challenge to other research teams.
Posted in Blog