If you build it, they will come: 5 Blockbuster reasons your organization should have an R package

“There’s no crying in baseball.”

R has a steep learning curve.
It might make you cry.
It will probably make you angry.
It will definitely make you frustrated. And it is all worth it because:

  • It documents your work, which supports transparency and accountability. This is an important step toward building trust both inside and outside an organization.
  • It supports iterative and collaborative development. You can walk away from your work and pick it back up later to continue or improve it quickly.
  • It increases the efficiency and timeliness of reporting and analysis. Once developed, R code can reduce the time for analysis and reporting from days and weeks to minutes and hours. In one project, developing my code took weeks (the same amount of time a traditional workflow would have taken), but in the next reporting period the time to produce those same updated reports (over 300 PDFs) was just over an hour. That’s more time to devote to other important projects.

As summer arrives and the summer blockbuster movies are released, here are 5 blockbuster reasons to get started with R and to consider creating an R package for your organization:

5: Managing file locations

“There’s no place like home.”

Having your network drive remapped is a pain in the neck. I’ve lived through months of error messages from code that needed to be updated with a new path. One of the first functions I created for our department assigns the network drive locations for our common data sets. When the O drive is suddenly switched to the P drive and the drive hierarchy gains a new departmental subdirectory, a quick update to the package means the scripts continue to run without missing a beat. This is much faster than reviewing every script to confirm whether a file path is current or needs updating.

In the function below, the network directory is returned when “P” is passed, and my local directory is returned for any other value.

Function: datadrive

datadrive <- function(drive) {
  # Network path when "P" is requested; local working copy otherwise.
  if (drive == "P") {
    "P:/Org/Department/MasterData/"
  } else {
    "C:/Temp/LocalData/"
  }
}

Use: datapath <- datadrive("P")

 

4: Corporate colours and aesthetic

“Here’s looking at you, kid.”

Having the hex codes (a six-digit code that defines the red, green and blue combination for each colour) for your organization’s colour scheme/brand available is incredibly handy for corporately aligned, aesthetically pleasing visualizations. The colour palettes I included in my organization’s package were built following a blog post by @drsimonj, who walks through all the steps required to build your own palettes: https://drsimonj.svbtle.com/creating-corporate-colour-palettes-for-ggplot2

Useful tools: Hex codes are used in R as strings with the prefix #. For example, black is “#000000” and white is “#FFFFFF”.

Considering colour combinations: ColorBrewer2 is the go-to site for considering colour combinations for all kinds of data (sequential, diverging, qualitative): http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3

Finding a colour code: If you want to find the hex code for a colour, Adobe’s Color website is an excellent resource: https://color.adobe.com

Finding a tint or shade: If you would like to find shades and tints for a specific colour (useful for graphs) High Integrity Design has a useful online tool: https://highintegritydesign.com/tools/tinter-shader/
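Putting these pieces together, the first step of @drsimonj’s approach is a named vector of brand colours plus a small accessor function. A minimal sketch (the colour names and hex codes below are invented, not a real brand palette):

```r
# Invented corporate colours -- substitute your organization's brand hex codes.
org_colours <- c(
  `deep blue`  = "#003A70",
  `sky`        = "#6FA8DC",
  `slate grey` = "#6C757D"
)

# Look up hex codes by name; with no arguments, return the whole palette.
org_cols <- function(...) {
  cols <- c(...)
  if (is.null(cols)) return(org_colours)
  unname(org_colours[cols])
}
```

From here, org_cols("deep blue", "sky") can be handed to ggplot2’s scale_colour_manual(), and the full palette-generator and scale functions in the linked post build on exactly this pattern.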

 

3: Standardizing your data extracts

“I’ll be back.”

Although dashboards are fantastic for monitoring systems, there is still a need for milestone reporting (e.g. a year-end report summarizing your key metrics). Having the R code for your SQL extracts bundled in a package streamlines the process and ensures that when you return to run the report next month or next year, the data structure will be the same.

Good: an .R file with code that extracts data from your information systems.

Better: turning the code into a function so that the process is easy to run.

Best: add the function to a package so you don’t have to worry about where the source file is or which version is the most recent.
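The “Better” step can be sketched as a function that takes a connection and returns a consistently structured extract. The table, columns and query below are invented for illustration, and an in-memory SQLite database stands in for a real information system:

```r
library(DBI)

# Hypothetical extract: pull one year of enrolment counts.
get_enrolment <- function(con, year) {
  dbGetQuery(con,
             "SELECT school, enrolment FROM enrolment WHERE year = ?",
             params = list(year))
}

# Stand-in for a real information system.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "enrolment",
             data.frame(school = c("Maple PS", "Oak PS"),
                        year = c(2017, 2018),
                        enrolment = c(312, 447)))

res <- get_enrolment(con, 2018)
dbDisconnect(con)
```

Because the query lives in one place, the extract returns the same columns every time it is run; moving get_enrolment() into a package is the “Best” step.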

 

2: Keep your key metrics on hand at all times

“May the force be with you.”

If you have a set of key metrics, it can be included as a table in your package. This makes the metrics portable and convenient to access when you are working offline.
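A minimal sketch of bundling metrics (the names and values below are invented); inside a package project, the commented usethis::use_data() call is the conventional way to ship the table with the package:

```r
# Invented metrics -- replace with your organization's own.
key_metrics <- data.frame(
  metric = c("graduation_rate", "attendance_rate", "credit_accumulation"),
  value  = c(0.86, 0.94, 0.79)
)

# Run inside your package project; this saves data/key_metrics.rda so that
# library(yourpkg) makes key_metrics available anywhere, even offline:
# usethis::use_data(key_metrics, overwrite = TRUE)
```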

 

1: Availability of definitions and keys

“Are you the Key Master?”

Working with data sets from several different information systems can mean contending with many different naming conventions. Including a master table in the package that aligns all the identifiers is a big time saver and makes analysis much more straightforward.
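The idea can be sketched with a toy master table (all identifiers below are invented): each system’s code joins back to a single master key, so data sets from different systems merge directly.

```r
# Invented master table aligning identifiers from two hypothetical systems.
keys <- data.frame(
  master_id   = c("S001", "S002"),
  sis_code    = c("0123", "0456"),   # student information system code
  eqao_mident = c(98765, 43210)      # EQAO school identifier
)

# One data set per system, each using its own identifier.
attendance <- data.frame(sis_code = c("0123", "0456"), rate = c(0.93, 0.95))
results    <- data.frame(eqao_mident = c(98765, 43210), level3plus = c(0.71, 0.64))

# The master table bridges the two naming conventions.
combined <- merge(merge(keys, attendance, by = "sis_code"),
                  results, by = "eqao_mident")
```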


  • Famous Movie Lines
    • “If you build it, they will come.” – Field of Dreams
    • “There’s no crying in baseball.” – A League of Their Own
    • “There’s no place like home.” – The Wizard of Oz
    • “Here’s looking at you, kid.” – Casablanca
    • “I’ll be back.” – The Terminator
    • “May the Force be with you.” – Star Wars
    • “Are you the Key Master?” – Ghostbusters
Posted in R

REQAO: An R package for EQAO data

Introducing REQAO, an R Package for those who work with EQAO data files. This package is a collection of functions to assist in the loading of files and the relabeling of values.  As additional functions are added and expanded, the most up-to-date version will be found at https://github.com/cconley/REQAO

What is EQAO?

EQAO (Education Quality and Accountability Office) is an independent agency that develops, administers, analyses and reports on student achievement across Ontario, Canada.  EQAO coordinates the administration of Reading, Writing and Mathematics assessments for students in Grades 3 and 6, Grade 9 Mathematics assessments for students enrolled in Grade 9 Academic and Applied Mathematics courses and an Ontario Secondary School Literacy Test (OSSLT) for students in Grade 10 English. EQAO also supports the administration of national and international assessments (TIMSS, PIRLS, PISA, PCAP, ICILS) in Ontario.

Each September, staff in school boards run analyses and produce public reports on the achievement of their students on these assessments.  This package supports the preparation of data for analysis.

This package currently has three functions:

SchoolLoad(grade, year, bident, board, datadir)

What it does: creates a tibble (the tidyverse refinement of the data frame) of school summary achievement data on EQAO assessments. It will:

Merge all the data files from an EQAO administration year into a single tibble.

Fill in the missing board name and bident (missing in the EQAO files) with the arguments that are passed to the function.

Create a new column with the year of the assessment, which makes it easier to work with multiple years of assessments.

Note: in EQAO files, each school year is named by its year as of June.

How it is used:

grade: this is a numeric value that can be 3, 6, 9 or 10. This is used to construct the names of the data files.

year: this is a numeric value in the form YYYY. This is used to construct the names of the data files.

bident: this is the 5-digit number assigned by the Ministry of Education. This is used to construct the names of the data files.

board: this can be any string you choose in the form “ABCDEFG”. This is used to fill in the blank SchoolName value for the board summary row.

datadir: this is your data directory, as a string, in the form “C:/directory/subdirectory”. This can be passed as a vector and is used to construct the string identifying the file and location.

Example:

SchoolTibble <- SchoolLoad(3, 2017, 12345, "Random DSB", "C:/temp/data/")

StudentLoad(grade, year, bident, datadir)

What it does: creates a tibble of student achievement on EQAO assessments. It will:

Merge all the data files from an EQAO administration year into a single tibble.

Compile and relabel the IEP variables into a single, new, readable column (IEPcode).

Relabel the values of demographic variables to a readable form (e.g. Gender, ELL, Eligibility, Program, FI, etc., as applicable).

How it is used:

grade: this is a numeric value that can be 3, 6, 9 or 10. This is used to construct the names of the data files.

year: this is a numeric value in the form YYYY. This is used to construct the names of the data files.

bident: this is the 5-digit number assigned by the Ministry of Education. This is used to construct the names of the data files.

datadir: this is your data directory, as a string, in the form “C:/directory/subdirectory”. This can be passed as a vector and is used to construct the string identifying the file and location.

Example:

NewTbl <- StudentLoad(6, 2017, 12345, "C:/temp/data/")

AchieveLabel(x, grade, type)

What it does: modifies tibbles made using the StudentLoad() function. It will:

Relabel the values of all achievement variables contained in the assessment file (e.g. ROverallLevel, Prior_G6_MOverallLevel, OSSLTOutcome) as character labels (e.g. “Level 1”, “Level 3”, “Exempt”) or numeric labels (e.g. 0, 1, 2, 3, 4, NA). Note: when values are relabeled as numeric, all non-level values become NA.

How it is used: Arguments used in the function

x: the name of the tibble that was created using the StudentLoad() function.

grade: this is a numeric value that can be 3, 6, 9 or 10. This is used to identify naming conventions used by each version of the EQAO assessments.

type: this is one of two strings:

“char”: this will change the values to more meaningful, readable labels (e.g. “Level 1”, “Exempt”, “Pending”).

“num”: this will change the values to numeric where Levels 0 to 4 are coded as 0, 1, 2, 3 and 4 and all other versions of no-data (withheld, pending, absent, exempt etc.) are relabeled as NA. This is equivalent to calculating “Fully Participating”.

Example:

NewStdntTbl <- AchieveLabel(OldStdntTbl, 6, "char")
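The “num” relabeling idea can be sketched independently of the package (a simplified stand-in, not the package internals): levels 0 through 4 become numeric and every other value becomes NA.

```r
# Simplified version of the "num" behaviour described above.
level_to_num <- function(x) {
  as.numeric(ifelse(x %in% as.character(0:4), x, NA))
}

level_to_num(c("3", "4", "Exempt", "Absent", "2"))  # 3 4 NA NA 2
```

With the levels numeric, mean(x, na.rm = TRUE) then gives the average level among fully participating students.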

 

Installation and use

The REQAO package can be installed using the following function from the devtools package:

devtools::install_github("cconley/REQAO")

Once installed, REQAO can be used like any other package:

library(REQAO)

Posted in R

The 9 Lives of a Photo (Pictures are Data)

With the availability of mobile devices in the classroom, photos are increasingly being used by educator teams to document learning. The blessing and the curse quickly become apparent as hundreds of pictures accumulate and clog albums and directories. The “perfect picture” just does not look meaningful a week or two later, when the activity and student interaction are a fading memory. The longer you wait to review and annotate the photos, the less likely you are to use them as part of your professional dialogue and assessment process.

Here are 9 ways one photo can enhance your professional practice.


When the picture is…

…sent home, it is parent communication

…reviewed by the teacher, it is assessment

…catalogued according to who is demonstrating learning, it is tracking and monitoring

…collected and considered over time, it is documentation of learning and growth

…posted on the bulletin board outside the classroom, it is making learning and milestones visible to the other students and teachers in the school

…posted in the classroom, it is making learning and milestones visible to the other students in the class

…shared with the student, it is student reflection

…shared on Pinterest, Twitter or through email with colleagues, it is shared practice

…shared with the administrator, it is documentation of practice


An important motto that applies in many areas of life is:

“Just because you can doesn’t mean you should.”

This is an important mindset to have when using photographs to document learning.  Following are three considerations for every image you capture and plan to use:

  • Privacy:
    Has the family agreed to have school photos shared? Have you asked the family if they would like to receive photos of their child in the class? Do you really need to show the child’s face in the photo, or can you just show the work that is being done, perhaps with a hand or finger pointing to something of significance? The PIM Toolkit (Privacy and Information Management Toolkit) has further direction: “Full names of students and other personal information and/or photographs do not appear on work displayed in the school, on websites and/or in newsletters.” For more privacy details, you can check out the toolkit here: https://www.pimedu.org/files/toolkit/PIMtoolkit.pdf Steps to maintaining privacy come before the other considerations and intentions.
  • Purposeful:
    Is it a “cute” picture, or are you capturing a meaningful product, process or interaction? “Cute” makes a great family photo, but in the classroom the focus is on purposeful documentation (within the privacy considerations discussed previously). Parents see their children every day, but they don’t get to see them engaged in the classroom. This is your chance to invite them into your classroom space.
  • Positive:
    With the privacy considerations addressed and the purpose of the photo determined, how would the child feel about their work being displayed or shared in some form? Even if a child cannot be identified in a photo, sharing or displaying a photo that a child feels anxious about undermines your classroom culture and relationship. Find another way to display this kind of learning (perhaps create a sample or mock version as an illustration).
Posted in Blog

Finding the Path

When secondary students complete their courses and accumulate credits, they leave a trail of data behind them. As these same students consider their course options for the next year, they are making choices about the pathway they will follow.

These pathways become easier to see with the following visualization, which I adapted from the work of Kerry Rodden for an education data context. This visualization was prepared for Karen Robson, Department of Sociology, McMaster University, who will be presenting at the Canadian Sociological Association (part of the Congress of the Humanities and Social Sciences) on “Practical Advice on Communicating Sociological Research”, May 30th. The fictional cohort data was constructed and presented in a way that allows the user to explore student pathways:

  • The innermost circle represents courses that have been completed in Grade 9.
  • Each larger circle describes the following year: Grade 10, Grade 11 and Grade 12.
  • Each segment within a circle describes a course type (academic, applied, university, college, open, workplace etc.) and is labeled in the legend. “End” means the fictional student was no longer in the school at that point.
  • As your mouse hovers over each segment of a ring, the percentage of all students following that pathway is calculated in the centre.
  • A breadcrumb trail is created at the top, highlighting the pathway you have selected.
  • The larger the segment, the greater the number of fictional students that are represented.

Click on the image to play with the interactive version.

Exploring Student Pathways

It would be interesting to see what proportions emerge from real cohort data or how those proportions might differ according to student characteristics.

If one advances confidently in the direction of his dreams, and endeavors to live the life which he has imagined, he will meet with a success unexpected in common hours.

Henry David Thoreau

Posted in Data Visualization

Making EQAO data easyR to work with

Academic data, just like every other data set, usually consumes more time in cleaning and reshaping than in analyzing and visualizing. One of the appeals of R is the ability to re-use code, and it is in that spirit that I’ve written the following function – to make my life (and hopefully the lives of a few other education researchers) a little easier with basic re-coding tasks.

IEPs are a common category for grouping records, and in EQAO records they reside in separate columns. The following function works with any dataframe that contains all of the SIF columns (it works with both Primary and Junior records): IEP.EQAO(dataframe)

The dataframe is returned with a new column that identifies, in plain language, the IEP that was assigned to each record.

IEP.EQAO <- function(x){
  # Concatenate the IEP flag and the twelve IPRC exceptionality flags
  # into a single 13-character code (e.g. "1000001000000").
  x$IEP <- paste0(x$SIF_IEP,
                  x$SIF_IPRC_Behaviour,
                  x$SIF_IPRC_Autism,
                  x$SIF_IPRC_Deaf,
                  x$SIF_IPRC_Language,
                  x$SIF_IPRC_Speech,
                  x$SIF_IPRC_Learning,
                  x$SIF_IPRC_Giftedness,
                  x$SIF_IPRC_MildIntellectual,
                  x$SIF_IPRC_Developmental,
                  x$SIF_IPRC_Physical,
                  x$SIF_IPRC_Blind,
                  x$SIF_IPRC_Multiple)

  # Plain-language label for each code; anything unrecognized becomes "BadCode".
  iep.labels <- c("0000000000000" = "No IEP",
                  "1000000000000" = "IEP no IPRC",
                  "1100000000000" = "Behaviour",
                  "1010000000000" = "Autism",
                  "1001000000000" = "Deaf",
                  "1000100000000" = "Language",
                  "1000010000000" = "Speech",
                  "1000001000000" = "Learning",
                  "1000000100000" = "Giftedness",
                  "1000000010000" = "MildIntellectual",
                  "1000000001000" = "Developmental",
                  "1000000000100" = "Physical",
                  "1000000000010" = "Blind",
                  "1000000000001" = "Multiple")

  x$IEP <- unname(iep.labels[x$IEP])
  x$IEP[is.na(x$IEP)] <- "BadCode"
  return(x)
}

The code is also available on GitHub here and is the beginning of what I hope will collaboratively evolve into an EQAO package.

Future development will include re-coding for Secondary data files. Any comments or interest in collaboration are always welcome.

*Update May 29: The code has been modified to work with any ISD file (3, 6, 9, 10) going back to 2011.

Posted in R

Find out….and teach it.

For any group interested in engaging in “Results Based Accountability” one of the first steps is to establish a common language.  The development of a common language is not to benefit people “on the inside”.  Instead, it is the first act of accountability that makes the work transparent to everyone “on the outside”.  No acronyms, no complex terms, no jargon.  Language that is easily understood by anyone you may be chatting with.

In a speech to the British Institute of Management in 1977, Kingman Brewster Jr (an educator, president of Yale University, and American diplomat) commented that “Incomprehensible jargon is the hallmark of a profession.”  There is no doubt that education is a profession but it left me wondering, what would “Edu speak” look like if it was re-written into straightforward, common, language?

  • Differentiated Learning: Find out what each student doesn’t know and teach it to them.
  • Gap Closing: Find out what groups of students don’t know (but would be expected to know) and teach it to them.
  • Interventions: Teaching students things they do not know (but need to know).
  • Inquiry based learning: Find out what a student(s) doesn’t know/wants to know and explore the answer alongside them.
  • Diagnostic Assessment: Find out what a student doesn’t know.  Use this information to know what to teach them.
  • Formative Assessment: Find out what a student still doesn’t know.  Use this information to know what to teach them.
  • Summative Assessment: Find out what a student knows.  Give this information to the next teacher so they know what to teach them.

The pattern is easy to see with these examples.  Have I made it too simple? There is the action of “finding out” and the act of arriving at new knowledge by “teaching them” (which includes exploration, inquiry etc.).

After years of academic study, practical experience and in-services, educators are quickly drawn to the second action of “teaching them”.  Hours are devoted to long range plans and lesson plans.  But who is better off if those plans do not relate to an actual student need? It is the “finding out” (often called assessment or evaluation) that takes time and, more importantly, determines what action is going to be most meaningful.

The statement “If you do not know where you are going, then any road will get you there” is often attributed to Lewis Carroll. However, the exchange this misquotation is based on is more interesting:

“Would you tell me, please, which way I ought to go from here?”
“That depends a good deal on where you want to get to,” said the Cat.
“I don’t much care where–” said Alice.
“Then it doesn’t matter which way you go,” said the Cat.
“–so long as I get SOMEWHERE,” Alice added as an explanation.
“Oh, you’re sure to do that,” said the Cat, “if you only walk long enough.”

Teaching will get you somewhere.  There will be lots of hours in class, lots of plans written, lots of expectations of students and you will always arrive at a new year with a new class.  But what is needed by the students in the class right now? If you hear the statement “You should know this by now” you can be certain that the person saying it recognizes a need/a learning gap.  The only question left is whether action will follow.

Knowledge and action.  You shouldn’t have one without the other.

Knowledge without action is trivia.
Action without knowledge is busywork.

At the end of the day, it should be easy to see, easy to understand, and easy to explain in common language.

 

Posted in Assessment Literacy

Highlights from OERS16

The 11th annual Ontario Education Research Symposium was held from February 9th to February 11th with the theme “Networking & Partnerships: The Core of Achieving Excellence in Education.” Over 500 people from networks, organizations and stakeholder groups across the education sector participated in the event, which featured:

  • 27 speakers
  • 18 workshops
  • 6 Mobilizing sessions
  • 4 Provocative Speaker sessions
  • 1 Fireside chat
  • 1 Spectacular student jazz band
  • Students as symposium attendees

Throughout the conference, participants were active on Twitter using #OERS16. As in previous years, I compiled the tweets over the course of the Symposium using the TAGS 6.0 utility (click here to access the tweet archive).

In an attempt to make the tweets more useful and accessible, this year I used R to extract the links that were shared (click here for more information on the process) and then created a series of PDF resources that compile those shared links.
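The link extraction itself needs only base R; a minimal sketch (the tweet text is invented):

```r
tweets <- c("Great session on networks! https://example.com/slides #OERS16",
            "So many ideas from the fireside chat",
            "RT @someone: resources at https://example.com/toolkit")

# Pull every http(s) URL out of the tweet text.
links <- unlist(regmatches(tweets, gregexpr("https?://[^[:space:]]+", tweets)))
```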

Tweeting Trivia

At the time of this summary there were 2,031 tweets from 298 different people. Using the interactive viewer with TAGS 6.0, we can see what this kind of networking looks like:

[Image: OERS16_network]

As you can see, the majority of tweets are isolated, with only a few key people connecting and interacting through Twitter (the largest names have the most connecting lines), though that isn’t to say that Twitter hasn’t facilitated face-to-face interactions.

Over the course of the first day of the conference, I took a series of snapshots of a sentiment analysis (using “Sentiment Viz: Tweet Sentiment Visualization”, an online utility developed at NC State University). I created an animated GIF to see how sentiment changed at five points in the day (morning, morning break, lunch, afternoon break, evening):

[Image: OERS animated sentiment]

The left half of the oval (blue dots) represents tweets with “unpleasant” terms and the right half (green dots) represents tweets with “pleasant” terms. Dots closer to the top of the oval represent tweets with “active” terms and dots closer to the bottom represent tweets with “subdued” terms.

Throughout the entire day the tweets were very positive, and those that registered as more negative in the sentiment analysis were tweets relaying the challenges that many of the speakers were addressing through their work. (Note: in the interactive version you can highlight a dot to see the underlying tweet, with the terms that were coded in the sentiment analysis highlighted.)

Top Tweeters

This year, the top 10 tweeters from #OERS16 were:

  • @DrKatinaPollock (182)
  • @CarolCampbell4 (157)
  • @avanbarn (151)
  • @ResearchChat (109)
  • @naturallycaren (101)
  • @Jan__Murphy (98)
  • @GregRousell (75)
  • @KNAER_RECRAE (66)
  • @OISENews (50)
  • @HeidiSiwak (41)

However, of those 2,031 tweets, 46% were retweets (tweets that begin with RT), leaving 1,102 original tweets. Considered from the perspective of original tweets vs. retweets, the top tweeters begin to look very different:
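Separating retweets from original tweets needs only a pattern match on the leading characters; a sketch with invented tweets:

```r
tweets <- c("RT @speaker: slides are posted",
            "My reflection on today's keynote #OERS16",
            "RT @org: thanks to all who attended")

# Tweets beginning with "RT" are retweets; the rest are original content.
is_rt <- grepl("^RT", tweets)
retweet_count  <- sum(is_rt)
original_count <- sum(!is_rt)
```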

[Image: OERS16_tweet-retweet]

This adjustment highlights two different but important approaches to the use of social media.  On the one hand, @avanbarn’s generation of so much “original content” is an example of using social media for note-taking (paraphrasing presenters, highlighting speaking points, sharing links to referenced material, sharing reflections and questions inspired by a presenter).  On the other hand, @DrKatinaPollock and @CarolCampbell4’s high level of retweets are examples of cross-network dissemination. As these two approaches work in tandem, the key messages of the symposium presenters reach far beyond the room of attendees and broadens opportunities for discussion and additional inquiry.

Adjusted for the percentage of original tweets, the top ten tweeters become:

  • @avanbarn 147 (97%)
  • @ResearchChat 106 (97%)
  • @GregRousell 63 (84%)
  • @KNAER_RECRAE 48 (73%)
  • @Jan__Murphy 65 (66%)
  • @HeidiSiwak 25 (61%)
  • @naturallycaren 61 (60%)
  • @DrKatinaPollock 65 (36%)
  • @CarolCampbell4 54 (34%)
  • @OISENews 11 (22%)

@GregRousell has also been archiving tweets from #OERS16 using the R package twitteR. In an upcoming post on the Data User Group, we will share a detailed overview of each of our approaches, along with the benefits and challenges of each.

Posted in Data Visualization, Twitter