If you build it, they will come: 5 Blockbuster reasons your organization should have an R package

“There’s no crying in baseball.”

There is a big learning curve to R.
It might make you cry.
It will probably make you angry.
It will definitely make you frustrated. And it is all worth it because:

  • It documents your work which supports transparency and accountability. This is an important step to building trust both inside and outside an organization.
  • It supports iterative and collaborative development. You can walk away from your work and pick it back up to continue or improve your work quickly.
  • It increases efficiency and timeliness of reporting/analysis. Once developed, R code can reduce the time for analysis and reporting from days and weeks to minutes and hours. In one project, developing my code took weeks (the same amount of time it would have taken with traditional workflows), but in the next reporting period the time to produce those same updated reports (over 300 PDFs) was just over 1 hour. That’s more time to devote to other important projects.

As summer arrives and the summer blockbuster movies are released, here are 5 blockbuster reasons to get started with R and consider creating an R package for your organization:

5: Managing file locations

“There’s no place like home.”

Having your network drive remapped is a pain in the neck. I’ve lived through the months of error messages that are discovered in code that needs to be updated with a new path. One of the first functions I created for our department was assigning the network drive locations for our common data sets. When the O drive is suddenly switched to the P drive and the drive hierarchy has a new departmental subdirectory, a quick update to the package results in scripts that continue to run without missing a beat. This is much faster than having to review every script to confirm that a file path is current or needs updating.

In the function below, “P” returns the network directory and any other value returns my local directory.

Function: datadrive

datadrive <- function(drive) {
  # "P" returns the network drive; anything else returns the local directory.
  # The local path is a placeholder: substitute your own.
  if (drive == "P") "P:/Org/Department/MasterData/" else "C:/path/to/local/data/"
}

Use: datapath <- datadrive("P")


4: Corporate colours and aesthetic

“Here’s lookin’ at you, kid.”

Having the hex codes (a six-digit code that defines the red, green and blue combination for each colour) for your organization’s colour scheme/brand available is incredibly handy for corporately aligned, aesthetically pleasing visualizations. The colour palettes I included in my organization’s package were built following a blog post by @drsimonj, who walks through all the steps required to build your own palettes: https://drsimonj.svbtle.com/creating-corporate-colour-palettes-for-ggplot2

Useful tools: Hex codes are used in R as a string with the prefix #.  For example, black is “#000000”, white is “#FFFFFF”.

Considering colour combinations: ColorBrewer2 is the go-to site for considering colour combinations for all kinds of data (sequential, diverging, qualitative): http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3

Finding a colour code: If you want to find the hex code for a colour, Adobe’s Color website is an excellent resource: https://color.adobe.com

Finding a tint or shade: If you would like to find shades and tints for a specific colour (useful for graphs), High Integrity Design has a useful online tool: https://highintegritydesign.com/tools/tinter-shader/
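
As a rough sketch of what a palette in a package can look like, the block below stores brand colours as a named vector of hex codes and wraps them in two small helpers. The colour names, hex values and function names (corp_colours, corp_cols, corp_palette) are made up for illustration; they are not the ones from my organization's package or from the blog post linked above.

```r
# Hypothetical brand colours: replace these hex codes with your organization's own.
corp_colours <- c(
  navy = "#1B365D",
  teal = "#00857D",
  gold = "#F2A900",
  grey = "#63666A"
)

# Return hex codes by name; with no arguments, return the whole palette.
corp_cols <- function(...) {
  wanted <- c(...)
  if (is.null(wanted)) {
    return(corp_colours)
  }
  unknown <- setdiff(wanted, names(corp_colours))
  if (length(unknown) > 0) {
    stop("Unknown colour name(s): ", paste(unknown, collapse = ", "))
  }
  corp_colours[wanted]
}

# Build a function that interpolates any number of colours between brand colours.
corp_palette <- grDevices::colorRampPalette(corp_cols("navy", "teal", "gold"))

corp_cols("navy")
corp_palette(5)
```

In a ggplot2 chart, the named colours could then be passed to scale_colour_manual(values = unname(corp_cols("navy", "teal"))).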


3: Standardizing your data extracts

“I’ll be back.”

Although dashboards are fantastic for monitoring systems, there is still a need for milestone reporting (e.g. a year-end report summarizing your key metrics). Having your R and SQL code bundled in a package streamlines the process and ensures that when you return to run the report the next month or year, the data structure will be the same.

Good: an .R file with code that extracts data from your information systems.

Better: turning the code into a function so that the process is easy to run.

Best: adding the function to a package so you don’t have to worry about where the source file is or which version is the most recent to use.
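
To make the "better" step concrete, here is a minimal sketch of an extract wrapped in a function. Everything in it is hypothetical: the table (enrolment), the column names and the function names are invented for illustration, and the database call is passed in as an argument so the example runs without a real connection.

```r
# Hypothetical table and column names, for illustration only.
enrolment_columns <- c("student_id", "school_id", "grade")

# Build the SQL for one school year so every run asks the same question.
enrolment_query <- function(school_year) {
  stopifnot(is.numeric(school_year), length(school_year) == 1)
  sprintf(
    "SELECT %s FROM enrolment WHERE school_year = %d",
    paste(enrolment_columns, collapse = ", "),
    as.integer(school_year)
  )
}

# Run the query and return the expected columns in a fixed order.
# run_query is any function that takes SQL text and returns a data frame,
# e.g. function(sql) DBI::dbGetQuery(con, sql) for an open connection `con`.
extract_enrolment <- function(school_year, run_query) {
  result <- run_query(enrolment_query(school_year))
  missing_cols <- setdiff(enrolment_columns, names(result))
  if (length(missing_cols) > 0) {
    stop("Extract is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  result[, enrolment_columns, drop = FALSE]
}

# Stand-in for a real database call, so the sketch runs without a connection.
fake_query <- function(sql) {
  data.frame(grade = c(3, 6), school_id = c(101, 102), student_id = c(1, 2))
}

extract_enrolment(2017, fake_query)
```

Because the column list lives in one place, next year's report gets the same columns in the same order, or fails loudly if the source system has changed.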


2: Keep your key metrics on hand at all times

“May the force be with you.”

If you have a set of key metrics, they can be included as a table in your package. This makes them portable and convenient to access when you are working offline.
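
A package usually keeps a table like this as an .rda file in its data/ folder. The sketch below imitates that with a temporary folder; the metric names and values are invented for illustration.

```r
# Hypothetical metrics and values, for illustration only.
key_metrics <- data.frame(
  metric = c("graduation_rate", "credit_accumulation"),
  year   = c(2017, 2017),
  value  = c(0.86, 0.79),
  stringsAsFactors = FALSE
)

# A temporary folder stands in for the package's data/ folder here.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
save(key_metrics, file = file.path(data_dir, "key_metrics.rda"))

# Loading the file restores the object under its original name.
restored <- new.env()
load(file.path(data_dir, "key_metrics.rda"), envir = restored)
restored$key_metrics
```

Inside a real package project, usethis::use_data(key_metrics) does the saving step for you.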


1: Availability of definitions and keys

“Are you the Key Master?”

Working with data sets from several different information systems can result in many different naming conventions.  Including a master table in the package that aligns all the identifiers in the package is a big time saver and makes analysis much more straightforward.
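
As an illustration of the master table idea, the sketch below joins two sources that identify the same schools differently. The school names, identifiers and values are all invented.

```r
# Hypothetical master table aligning identifiers from two systems.
school_keys <- data.frame(
  school_name   = c("Maple PS", "Cedar SS"),
  sis_id        = c("S001", "S002"),   # student information system ID
  assessment_id = c(123456, 654321),   # assessment file ID
  stringsAsFactors = FALSE
)

# Two hypothetical sources, each using its own identifier.
sis_enrolment <- data.frame(
  sis_id    = c("S001", "S002"),
  enrolment = c(310, 870),
  stringsAsFactors = FALSE
)
assessment_results <- data.frame(
  assessment_id = c(654321, 123456),
  pct_level3    = c(0.71, 0.64)
)

# Attach the master keys to one source, then join the other on the shared key.
with_keys <- merge(sis_enrolment, school_keys, by = "sis_id")
combined  <- merge(with_keys, assessment_results, by = "assessment_id")
combined[, c("school_name", "enrolment", "pct_level3")]
```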

  • Famous Movie Lines
    • “If you build it, they will come.” – Field of Dreams
    • “There’s no crying in baseball.” – A League of Their Own
    • “There’s no place like home.” – The Wizard of Oz
    • “Here’s lookin’ at you, kid.” – Casablanca
    • “I’ll be back.” – The Terminator
    • “May the Force be with you.” – Star Wars
    • “Are you the Key Master?” – Ghostbusters

REQAO: An R package for EQAO data

Introducing REQAO, an R Package for those who work with EQAO data files. This package is a collection of functions to assist in the loading of files and the relabeling of values.  As additional functions are added and expanded, the most up-to-date version will be found at https://github.com/cconley/REQAO

What is EQAO?

EQAO (Education Quality and Accountability Office) is an independent agency that develops, administers, analyses and reports on student achievement across Ontario, Canada.  EQAO coordinates the administration of Reading, Writing and Mathematics assessments for students in Grades 3 and 6, Grade 9 Mathematics assessments for students enrolled in Grade 9 Academic and Applied Mathematics courses and an Ontario Secondary School Literacy Test (OSSLT) for students in Grade 10 English. EQAO also supports the administration of national and international assessments (TIMSS, PIRLS, PISA, PCAP, ICILS) in Ontario.

Each September, staff in school boards run analyses and produce public reports on the achievement of their students on these assessments.  This package supports the preparation of data for analysis.

This package currently has three functions:

SchoolLoad(grade, year, bident, board, datadir)

What it does: creates a tibble (the tidyverse’s take on a data frame) of school summary achievement data on EQAO assessments.

Merge all the data files from an EQAO administration year into a single tibble.

Fill in the missing board name and bident (missing in the EQAO files) with the arguments that are passed to the function.

Create a new column with the year of the assessment. This makes it easier to work with multiple years of assessments.

Note: EQAO files name each school year by the calendar year in which it ends (June), so the 2016-17 school year is 2017.

How it is used:

grade: this is a numeric value that can be 3, 6, 9 or 10. This is used to construct the names of the data files.

year: this is a numeric value in the form YYYY. This is used to construct the names of the data files.

bident: this is the 5-digit number assigned by the Ministry of Education. This is used to construct the names of the data files.

board: this can be any string you choose in the form “ABCDEFG”. This is used to fill in the blank SchoolName value for the board summary row.

datadir: this is your data directory, as a string, in the form “C:/directory/subdirectory”. This can be passed as a vector and is used to construct the string identifying the file and location.


SchoolTibble <- SchoolLoad(3, 2017, 12345, "Random DSB", "C:/temp/data/")

StudentLoad(grade, year, bident, datadir)

What it does: creates a tibble of student achievement on EQAO assessments

Merge all the data files from an EQAO administration year into a single tibble.

Compile and relabel the IEP variables into a single, new, readable column (IEPcode).

Relabel values of demographic variables to a readable form (e.g. Gender, ELL, Eligibility, Program, FI etc. as applicable).

How it is used:

grade: this is a numeric value that can be 3, 6, 9 or 10. This is used to construct the names of the data files.

year: this is a numeric value in the form YYYY. This is used to construct the names of the data files.

bident: this is the 5-digit number assigned by the Ministry of Education. This is used to construct the names of the data files.

datadir: this is your data directory, as a string, in the form “C:/directory/subdirectory”. This can be passed as a vector and is used to construct the string identifying the file and location.


NewTbl <- StudentLoad(6, 2017, 12345, "C:/temp/data/")

AchieveLabel(x, grade, type)

What it does: Modifies tibbles made using the StudentLoad() function:

Relabel values of all achievement variables contained in the assessment file (e.g. ROverallLevel, Prior_G6_MOverallLevel, OSSLTOutcome, etc.) as character labels (e.g. “Level 1”, “Level 3”, “Exempt”) or numeric labels (i.e. 0, 1, 2, 3, 4, NA). Note that when values are relabeled as numeric, all non-level values are relabeled as NA.

How it is used: Arguments used in the function

x: the name of the tibble that was created using the StudentLoad() function.

grade: this is a numeric value that can be 3, 6, 9 or 10. This is used to identify naming conventions used by each version of the EQAO assessments.

type: this is one of two strings:

“char”: this will change the values to more meaningful, readable labels (e.g. “Level 1”, “Exempt”, “Pending”).

“num”: this will change the values to numeric where Levels 0 to 4 are coded as 0, 1, 2, 3 and 4 and all other versions of no-data (withheld, pending, absent, exempt etc.) are relabeled as NA. This is equivalent to calculating “Fully Participating”.


NewStdntTbl <- AchieveLabel(OldStdntTbl, 6, "char")
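
To show what the “num” option means in practice, here is a small stand-alone sketch. It is not the REQAO implementation: level_to_number is a hypothetical helper that starts from readable labels, keeps Levels 0 to 4 as numbers and turns everything else into NA.

```r
# Illustrative only: this is not the REQAO implementation.
# Convert readable achievement labels to numbers; non-level values become NA.
level_to_number <- function(achievement) {
  is_level <- grepl("^Level [0-4]$", achievement)
  numbers <- rep(NA_integer_, length(achievement))
  numbers[is_level] <- as.integer(sub("^Level ", "", achievement[is_level]))
  numbers
}

achievement <- c("Level 3", "Exempt", "Level 1", "Pending", "Level 4")
scores <- level_to_number(achievement)
scores

# Share of students with a level who are at Level 3 or above.
mean(scores >= 3, na.rm = TRUE)
```

Dropping the NA values in the last step restricts the calculation to students who received a level.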


Installation and use

The REQAO package can be installed using the following function from the devtools package:
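
The call below is a sketch that assumes the package is installed straight from the GitHub repository named earlier (cconley/REQAO), using devtools::install_github().

```r
# Assumes installation from the GitHub repository cconley/REQAO.
# install.packages("devtools")  # only needed if devtools is not yet installed
devtools::install_github("cconley/REQAO")
```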


Once installed, REQAO can be used like any other package:
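
In other words, it is attached with library() at the top of a script (a sketch, assuming the installation succeeded):

```r
# Attach the package so its functions can be called directly.
library(REQAO)
```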



The 9 Lives of a Photo (Pictures are Data)

With the availability of mobile devices in the classroom, photos are increasingly being used by educator teams to document learning. The blessing and the curse quickly become apparent as hundreds of pictures accumulate and clog albums and directories. The “perfect picture” just does not look meaningful a week or two later when the activity and student interaction are a fading memory. The longer you wait to review and annotate the photos, the less likely you are to use them as part of your professional dialogue and assessment process.

Here are 9 ways one photo can enhance your professional practice.

When the picture is…

…sent home, it is parent communication

…reviewed by the teacher, it is assessment

…catalogued according to who is demonstrating learning, it is tracking and monitoring

…collected and considered over time, it is documentation of learning and growth

…posted on the bulletin board outside the classroom, it is making learning and milestones visible to the other students and teachers in the school

…posted in the classroom, it is making learning and milestones visible to the other students in the class

…shared with the student, it is student reflection

…shared on Pinterest, Twitter or through email with colleagues, it is shared practice

…shared with the administrator, it is documentation of practice

An important motto that applies in many areas of life is:

“Just because you can doesn’t mean you should.”

This is an important mindset to have when using photographs to document learning.  Following are three considerations for every image you capture and plan to use:

  • Privacy:
    Has the family agreed to have school photos shared? Have you asked the family if they would like to receive photos of their child in the class? Do you really need to show the child’s face in the photo, or can you just show the work that is being done, perhaps with a hand or finger pointing to something of significance? The PIM Toolkit (Privacy and Information Management Toolkit) gives further direction: “Full names of students and other personal information and/or photographs do not appear on work displayed in the school, on websites and/or in newsletters.” For more privacy details, you can check out the toolkit here: https://www.pimedu.org/files/toolkit/PIMtoolkit.pdf Steps to maintain privacy should come before other considerations and intentions.
  • Purposeful:
    Is it a “cute” picture, or are you capturing a meaningful product, process or interaction? “Cute” makes a great family photo, but in the classroom the focus is on purposeful documentation (within the privacy considerations discussed above). Parents see their children every day, but they don’t get to see them engaged in the classroom. This is your chance to invite them into your classroom space.
  • Positive:
    With the privacy considerations addressed and the purpose of the photo determined, how would the child feel about their work being displayed or shared in some form? Even if a child cannot be identified in a photo, sharing or displaying a photo that a child feels anxious about undermines your classroom culture and relationship. Find another way to display this kind of learning (perhaps create a sample or mock version as an illustration).

Finding the Path

When secondary students complete their courses and accumulate credits, they leave a trail of data behind them. As these same students consider their course options for the next year, they are making choices about the pathway they will follow.

These pathways are easier to see with the following visualization, which I adapted from the work of Kerry Rodden for an education data context. This visualization was prepared for Karen Robson, Department of Sociology, McMaster University, who will be presenting at the Canadian Sociological Association (part of the Congress of the Humanities and Social Sciences) on “Practical Advice on Communicating Sociological Research”, May 30th. The fictional cohort data was constructed and presented in a way that would allow the user to explore student pathways:

  • The innermost circle represents courses that have been completed in Grade 9.
  • Each larger circle describes the following year: Grade 10, Grade 11 and Grade 12.
  • Each segment within a circle describes a course type (academic, applied, university, college, open, workplace etc.) and is labeled in the legend. “End” means the fictional student was no longer in the school at that point.
  • As your mouse hovers over each segment of a ring, the percentage of all students following that pathway is calculated in the centre.
  • A breadcrumb trail is created at the top, highlighting the pathway you have selected.
  • The larger the segment, the greater the number of fictional students that are represented.

Click on the image to play with the interactive version.

[Image: Exploring Student Pathways]

It would be interesting to see what proportions emerge from real cohort data or how those proportions might differ according to student characteristics.

If one advances confidently in the direction of his dreams, and endeavors to live the life which he has imagined, he will meet with a success unexpected in common hours.

Henry David Thoreau


Making EQAO data easyR to work with

Academic data, just like every other data set, usually consumes more time with cleaning and reshaping than with analyzing and visualizing. One of the appeals of R is the ability to re-use code, and it is in that spirit that I’ve written the following function – to make my life (and hopefully the lives of a few other education researchers) a little easier with basic re-coding tasks.

IEPs are a common category for grouping records and in EQAO records they reside in separate columns.  The following function works with any dataframe that contains all of the SIF columns (works with both Primary and Junior records): IEP.EQAO(dataframe)

The dataframe is returned with a new column that identifies, in plain language, the IEP that was assigned to each record.

IEP.EQAO <- function(x){
  # Concatenate the SIF flag columns into one 13-character code.
  # Assumption: every flag column starts with "SIF_" and the columns appear
  # in the order the codes below expect (only SIF_IEP is named in this post).
  x$IEP <- do.call(paste0, unname(as.list(x[grep("^SIF_", names(x))])))
  # Translate each code into a plain-language label.
  x$IEP <- ifelse(x$IEP == "0000000000000", "No IEP",
           ifelse(x$IEP == "1000000000000", "IEP no IPRC",
           ifelse(x$IEP == "1100000000000", "Behaviour",
           ifelse(x$IEP == "1010000000000", "Autism",
           ifelse(x$IEP == "1001000000000", "Deaf",
           ifelse(x$IEP == "1000100000000", "Language",
           ifelse(x$IEP == "1000010000000", "Speech",
           ifelse(x$IEP == "1000001000000", "Learning",
           ifelse(x$IEP == "1000000100000", "Giftedness",
           ifelse(x$IEP == "1000000010000", "MildIntellectual",
           ifelse(x$IEP == "1000000001000", "Developmental",
           ifelse(x$IEP == "1000000000100", "Physical",
           ifelse(x$IEP == "1000000000010", "Blind",
           ifelse(x$IEP == "1000000000001", "Multiple", "BadCode"))))))))))))))
  x
}
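
The same recoding can also be sketched with a named lookup vector instead of nested ifelse() calls. This is an alternative illustration rather than the code on GitHub; label_iep and iep_labels are hypothetical names, and the codes and labels are the ones listed in the function above.

```r
# Illustrative alternative: map each 13-character code to a label with a named vector.
iep_labels <- c(
  "0000000000000" = "No IEP",
  "1000000000000" = "IEP no IPRC",
  "1100000000000" = "Behaviour",
  "1010000000000" = "Autism",
  "1001000000000" = "Deaf",
  "1000100000000" = "Language",
  "1000010000000" = "Speech",
  "1000001000000" = "Learning",
  "1000000100000" = "Giftedness",
  "1000000010000" = "MildIntellectual",
  "1000000001000" = "Developmental",
  "1000000000100" = "Physical",
  "1000000000010" = "Blind",
  "1000000000001" = "Multiple"
)

# Look up each code; anything not in the table becomes "BadCode".
label_iep <- function(codes) {
  out <- unname(iep_labels[codes])
  out[is.na(out)] <- "BadCode"
  out
}

label_iep(c("0000000000000", "1010000000000", "1111111111111"))
```

Adding or renaming a category then means editing one line of the table rather than re-balancing parentheses.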

The code is also available on GitHub and is the beginning of what I hope will collaboratively evolve into an EQAO package.

Future development will include re-coding for Secondary data files. Any comments or interest in collaboration are always welcome.

*Update May 29: Code has been modified to work with any ISD file (3,6,9,10) going back to 2011.
