The Poetic Works of Idle Hands (2022)

Over March Break (2022), in a small corner of Twitter, statistical concepts, philosophies of data collection, and poor attempts at mathematical humor were shared. Following is a compilation of the verses that were drafted, which ultimately drifted, as statistical conversations ought, to musings about Wordle.

Many thanks to those who joined in the fun. For those who saw it as foolishness, let me offer a more formal evaluation of the exercise: to reduce a concept to a few rhyming syllables is a concrete application of creativity and synthesis.

Notes:
– Poems without titles have been labeled according to their author.
– Any omissions are entirely unintentional; please let me know if you spot one that hasn’t been included.
– It is possible that grammar, rhyming schemes and poetic sensibilities may have been injured in the attempt to maintain conceptual fidelity. Proceed at your own discretion.


A statistical poem appreciating the poetry of statistics
By @ResearchChat

The measures of all central tendency
Shrinks data down so you then can see
Where it congeals
But sadly conceals
The rest of the data’s resplendency

#NotForEveryone

Numeric Prophe-sigh’ing
By @ResearchChat

Use your discretion
with logistic regression
when foretelling the news
with just ones and twos

(or more technically accurate)

Use your discretion
With logistic regression
When foretelling is done
With just zero and one

#StillNotForEveryone

On Matters Pertaining to the Student’s T
By @ResearchChat

If you want clarity
On sample disparity
Don’t call your bestie,
Just run a Test-T

#NotMyBest

By C. Anderson:

A young researcher from Moldova
Had to run her analyses ova
Her results didn’t thrive
(P greater than point-oh-5)
Next time, she’ll run an ANOVA

Stat Burglar
By @ResearchChat

Of all the statistical labours
There’s one that will land you in papers
There’s no use disputin’
Cuz when you’re imputin’
You’re stealing numerically from neighbours

Weighing Both Sides
(alternate title: Lunch Time Ponderings)
By @ResearchChat

Increasing your weight
In a model is great
It makes the data fit better

But weight on your waist
From indulging your taste
Just makes for a tight fitting sweater

Doing More…..With less
By @ResearchChat

Models that are parsimonious
Are balanced, lean, and harmonious
Adding more variables
Is more than just terrible
It’s unneeded, wrong, and erroneous

#OddWayToStartMarchBreak
#YetAnotherLimerick

Humor <
By @ResearchChat

X bar is the mean, and
X-tilde is the median.
“X walks into a bar jokes”
Won’t make you a comedian.

#HappyPiDay

Not so long ago, in a model not so far away…
By @ResearchChat

There is a sequel
To slopes that are equal
It’s slopes that are random
But it has a small fandom.

From Dr. Robson

Its close cousin
Has got me buzzin

If suitably prepped
Randomize the intercept!

From Dr. Robson

Data are useful
And should be respected
But if you live in Canada
They won’t be collected.

Moody Thoughts
By @ResearchChat

The null hypothesis is happy to say
“There’s nothing to see here, keep moving”
But get a result that’s significant
And suddenly the null is disproving

From Dr. Robson

The null hypothesis
Is kinda bad
Cuz if the null is correct
The alternative is sad

From Dr. Robson

The research ethics board
Fills me with so much rage
But they won’t release my money
Til I fill out all the page

From Dr. Robson

Stata is best
To do your analysis
Because a missing period
Causes SPSS paralysis

If you use R
You’re probably cheap
The environment is hostile
And makes me weep

I don’t know why
People use SAS
All those semicolons
Are incredibly crass

From SAS Software
By Dr. Robson

The Semicolon indicates to the compiler it has received its command
If not inserted correctly, it simply would not understand 🤷‍♀️
Just like the Irish, may still follow the gold 💰
We believe #SASusers will continue to code and not fold.

#NoResponseFromStata

Methodative Quantologies
By @ResearchChat

When working with numbers
you’ll be called a “Quant”
Here are considerations, advisable:

If there is anything in
your analysis to flaunt
It should be reliable, valid, generalizable

Qualitatricity
By @ResearchChat

When working with narratives,
you’ll get the name “Qual”.
Collecting the truths people speak.

Highlighting their lives,
And looking through all,
For themes that are common or unique

#JustWhenYouThoughtItWasOver

Statistical Mixologists
By @ResearchChat

When blending these methods
You’ll be known as “Mixed”
And be the star extra-curricular

Of all of these methods,
Using the strongest betwixt:
Quan in general and Qual in particular

#MethodologicalVerse

From Dr. Robson

If the assumption
Of your approach
Is objective reality

Then the way you report
Should exhibit
Moral neutrality

Mixed methods: a poem
By Dr. Robson

Everyone will say
Mixed is the best
Til the interview data
Contradict the t-test.

From Dr. Cathlene Hillier

Mixed methods is the way to go.
Until the journal sees word count and says “no”!

Disturbing the Labour – a poem
By Dr. Robson

Everyone’s mad
Professors going on strike
But you gotta remember
It’s not something they like

Instead of aligning
with administrative positions
Remember that learning
Depends on working conditions

A Significant poem
By Dr. Robson

Report the p values
Show me those stars
Unless you’re in health
And love CI bars
Economists though
will be a damn terror
With their insistence on just
The raw standard error

Here’s a point (Oh Five)
By @ResearchChat

No wonder statisticians are gloomy
And so frequently prone to rant
Wouldn’t you feel the same, if your best results
Were declared as signifi-CANT ?

First p-value of the morning
By @ResearchChat

I once read in a book
That you can tell with just a look
Whether you have statistical significance

If, when you are graphin’
Your CI bars AREN’T overlappin’
then your p-value’s what we technically call “terrific-ance”

Perpetual Movement Statistic
By @ResearchChat

Degrees of Freedom
Why do we need’em?
The burden they carry:
How many values can vary.

Wordle: A Poem
By Dr. Robson

Wordle is my current
Source of fascination

Makes me feel smart
But it’s just procrastination

By @ResearchChat

Have you tried Quordle ?
You solve four at once.
Wordle made me feel smart
Quordle made me a dunce.

Pah Pah OO Mow Mow
By @ResearchChat

Have you heardle about the Birdle?
Well Everybody knows that the Birdle’s the Wordle

Posted in Blog

If you build it, they will come: 5 Blockbuster reasons your organization should have an R package

“There’s no crying in baseball.”

There is a big learning curve to R.
It might make you cry.
It will probably make you angry.
It will definitely make you frustrated. And it is all worth it because:

  • It documents your work which supports transparency and accountability. This is an important step to building trust both inside and outside an organization.
  • It supports iterative and collaborative development. You can walk away from your work and pick it back up to continue or improve your work quickly.
  • It increases efficiency and timeliness of reporting/analysis. Once developed, R code can reduce the time for analysis and reporting from days and weeks to minutes and hours. In one project, the development of my code took weeks (same amount of time as it would have taken with traditional workflows) but in the next reporting period the time to produce those same updated reports (over 300 pdf’s) was just over 1 hour. That’s more time to devote to other important projects.

As summer arrives and the summer blockbusters are released, here are 5 blockbuster reasons to get started with R and consider creating an R package for your organization:

5: Managing file locations

“There’s no place like home.”

Having your network drive remapped is a pain in the neck. I’ve lived through the months of error messages that are discovered in code that needs to be updated with a new path. One of the first functions I created for our department was assigning the network drive locations for our common data sets. When the O drive is suddenly switched to the P drive and the drive hierarchy has a new departmental subdirectory, a quick update to the package results in scripts that continue to run without missing a beat. This is much faster than having to review every script to confirm that a file path is current or needs updating.

The function below returns the network directory when “P” is passed and my local directory when any other value is passed.

Function: datadrive

datadrive <- function(drive) {
  # Return the network directory for "P"; any other value returns the local directory
  if (identical(drive, "P")) "P:/Org/Department/MasterData/" else "C:/Temp/LocalData/"
}

Use: datapath <- datadrive("P")

 

4: Corporate colours and aesthetic

“Here’s lookin at you kid.”

Having the hex codes (six-digit codes that define the red, green and blue combination for each colour) for your organization’s colour scheme/brand available is incredibly handy for corporately aligned, aesthetically pleasing visualizations. The colour palettes I included in my organization’s package were built following a blog post by @drsimonj, who walks through all the steps required to build your own palettes: https://drsimonj.svbtle.com/creating-corporate-colour-palettes-for-ggplot2

Useful tools: Hex codes are used in R as a string with the prefix #.  For example, black is “#000000”, white is “#FFFFFF”.

Considering colour combinations: ColorBrewer2 is the go-to site for considering colour combinations for all kinds of data (sequential, diverging, qualitative): http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3

Finding a colour code: If you want to find the hex code for a colour, Adobe’s Color website is an excellent resource: https://color.adobe.com

Finding a tint or shade: If you would like to find shades and tints for a specific colour (useful for graphs) High Integrity Design has a useful online tool: https://highintegritydesign.com/tools/tinter-shader/
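To make the idea concrete, here is a minimal sketch of how a set of hex codes might be stored and reused in a package. The colour names, hex values and function names (org_colours, org_cols, org_ramp) are all hypothetical placeholders rather than any real brand or the palettes from my own package; swap in your organization's codes.

```r
# Hypothetical brand colours: replace the names and hex codes with your own
org_colours <- c(navy = "#1B365D", teal = "#00857C", grey = "#8A8D8F")

# Return hex codes by name; with no arguments, return the full set
org_cols <- function(...) {
  wanted <- c(...)
  if (is.null(wanted)) {
    return(org_colours)
  }
  org_colours[wanted]
}

# Build a function that interpolates between two brand colours
# (handy for sequential scales)
org_ramp <- grDevices::colorRampPalette(org_cols("navy", "teal"))

org_cols("teal")   # a single named hex code
org_ramp(5)        # five hex codes running from navy to teal
```

Once functions like these live in a package, every script pulls its colours from one place, so a rebrand only needs one update.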

 

3: Standardizing your data extracts

“I’ll be back.”

Although dashboards are fantastic for monitoring systems, there is still a need for milestone reporting (e.g. a year-end report summarizing your key metrics). Having your RSQL bundled in a package streamlines the process and ensures that when you return to run the report the next month or year, the data structure will be the same.

Good: an .R file with code that extracts data from your information systems.

Better: turning the code into a function so that the process is easy to run.

Best: adding the function to a package so you don’t have to worry about where the source file is or which version is the most recent.
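As a rough illustration of the "Better" step, the sketch below wraps the tidy-up of an extract in a function that enforces the same columns and types every time. The column names, the extract_spec list and the standardize_extract function are hypothetical; the raw data frame stands in for whatever your information system actually returns.

```r
# Hypothetical specification: the columns every extract must have,
# each paired with the function used to coerce it to a consistent type
extract_spec <- list(
  student_id = as.character,
  school     = as.character,
  credits    = as.numeric
)

standardize_extract <- function(raw, spec = extract_spec) {
  # Fail loudly if the source system has dropped or renamed a column
  missing_cols <- setdiff(names(spec), names(raw))
  if (length(missing_cols) > 0) {
    stop("Extract is missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Keep only the expected columns, in the expected order and type
  out <- raw[, names(spec), drop = FALSE]
  for (col in names(spec)) {
    out[[col]] <- spec[[col]](out[[col]])
  }
  out
}

# Fictional raw extract with an extra column and credits stored as text
raw <- data.frame(
  school = c("A", "B"), credits = c("7.5", "8"),
  student_id = c(1001, 1002), notes = c("", ""),
  stringsAsFactors = FALSE
)
clean <- standardize_extract(raw)
```

Moving a function like this into a package is then the "Best" step: the next reporting period calls the same function and gets the same structure back.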

 

2: Keep your key metrics on hand at all times

“May the force be with you.”

If you have a set of key metrics, they can be included as a table in your package. This makes them portable and convenient to access when you are working offline.
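Here is a small sketch of what that could look like. R packages conventionally ship data sets as .rda files in a data/ folder; the example below imitates that with a temporary folder so it can be run anywhere. The key_metrics table, its column names and its placeholder values are hypothetical.

```r
# Hypothetical key metrics table; values are left as NA placeholders
key_metrics <- data.frame(
  metric = c("Graduation rate", "Grade 9 credit accumulation"),
  year   = c(2017, 2017),
  value  = c(NA_real_, NA_real_),
  stringsAsFactors = FALSE
)

# In a real package this file would live in the package's data/ folder;
# a temporary folder stands in for it here
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
metrics_file <- file.path(data_dir, "key_metrics.rda")
save(key_metrics, file = metrics_file)

# Loading the file back (what the package would do for you) restores the table
restored <- new.env()
load(metrics_file, envir = restored)
```

With the table bundled in the package, it travels with the code and is available without a network connection.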

 

1: Availability of definitions and keys

“Are you the Key Master?”

Working with data sets from several different information systems can result in many different naming conventions.  Including a master table in the package that aligns all the identifiers in the package is a big time saver and makes analysis much more straightforward.
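A minimal sketch of the idea, using entirely fictional schools, identifiers and values: one master table (school_key, a hypothetical name) maps each system's identifier to the others, so data sets from two systems can be joined through it with base R's merge().

```r
# Hypothetical master table aligning identifiers from two systems
school_key <- data.frame(
  school_name   = c("Maple PS", "Cedar SS"),
  sis_id        = c("S001", "S002"),   # student information system ID
  assessment_id = c(101, 102),         # assessment file ID
  stringsAsFactors = FALSE
)

# Fictional data sets, each using a different identifier
enrolment <- data.frame(sis_id = c("S001", "S002"), students = c(250, 900),
                        stringsAsFactors = FALSE)
results <- data.frame(assessment_id = c(101, 102), participation = c(0.97, 0.93))

# Join through the master table rather than matching names by hand
combined <- merge(merge(enrolment, school_key, by = "sis_id"),
                  results, by = "assessment_id")
```

The master table is the only place that needs updating when a system changes its naming convention.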


  • Famous Movie Lines
    • “If you build it, they will come.” – Field of Dreams
    • “There’s no crying in baseball.” – A League of Their Own
    • “There’s no place like home.” – The Wizard of Oz
    • “Here’s lookin at you kid.” – Casablanca
    • “I’ll be back.” – The Terminator
    • “May the Force be with you.” – Star Wars
    • “Are you the Key Master?” – Ghostbusters
Posted in R

REQAO: An R package for EQAO data

Introducing REQAO, an R Package for those who work with EQAO data files. This package is a collection of functions to assist in the loading of files and the relabeling of values.  As additional functions are added and expanded, the most up-to-date version will be found at https://github.com/cconley/REQAO

What is EQAO?

EQAO (Education Quality and Accountability Office) is an independent agency that develops, administers, analyses and reports on student achievement across Ontario, Canada. EQAO coordinates the administration of Reading, Writing and Mathematics assessments for students in Grades 3 and 6; Grade 9 Mathematics assessments for students enrolled in Grade 9 Academic and Applied Mathematics courses; and the Ontario Secondary School Literacy Test (OSSLT) for students in Grade 10 English. EQAO also supports the administration of national and international assessments (TIMSS, PIRLS, PISA, PCAP, ICILS) in Ontario.

Each September, staff in school boards run analyses and produce public reports on the achievement of their students on these assessments.  This package supports the preparation of data for analysis.

This package currently has three functions:

SchoolLoad(grade, year, bident, board, datadir)

What it does: creates a tibble (the tidyverse’s modern take on the data frame) of school summary achievement data on EQAO assessments.

Merge all the data files from an EQAO administration year into a single tibble.

Fill in the missing board name and bident (missing in the EQAO files) with the arguments that are passed to the function.

Create a new column with the year of the assessment. This makes it easier to work with multiple years of assessments.

Note: in EQAO files, the year as of June is used as the EQAO naming convention for each school-year.
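To illustrate the convention, the 2016-17 school year is labeled 2017. The helper below is a hypothetical sketch and not part of REQAO; it assumes a school year that starts in September, so any date from September onward is assigned to the following June's year.

```r
# Hypothetical helper (not part of REQAO): map a date to the EQAO-style year,
# i.e. the calendar year of the June that ends the school year.
# Assumption: the school year starts in September.
eqao_year <- function(date) {
  date <- as.Date(date)
  yr  <- as.integer(format(date, "%Y"))
  mth <- as.integer(format(date, "%m"))
  ifelse(mth >= 9, yr + 1L, yr)
}

# An October 2016 date and a May 2017 date both fall in the 2017 EQAO year
eqao_year(c("2016-10-15", "2017-05-20"))
```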

How it is used:

grade: this is a numeric value that can be 3, 6, 9 or 10. This is used to construct the names of the data files.

year: this is a numeric value in the form YYYY. This is used to construct the names of the data files.

bident: this is the 5-digit number assigned by the Ministry of Education. This is used to construct the names of the data files.

board: this can be any string you choose in the form “ABCDEFG”. This is used to fill in the blank SchoolName value for the board summary row.

datadir: this is your data directory, as a string, in the form “C:/directory/subdirectory”. This can be passed as a vector and is used to construct the string identifying the file and location.

Example:

SchoolTibble <- SchoolLoad(3, 2017, 12345, "Random DSB", "C:/temp/data/")

StudentLoad(grade, year, bident, datadir)

What it does: creates a tibble of student achievement on EQAO assessments

Merge all the data files from an EQAO administration year into a single tibble.

Compile and relabel the IEP variables into a single, new, readable column (IEPcode).

Relabel values of demographic variables to a readable form (e.g. Gender, ELL, Eligibility, Program, FI etc. as applicable).

How it is used:

grade: this is a numeric value that can be 3, 6, 9 or 10. This is used to construct the names of the data files.

year: this is a numeric value in the form YYYY. This is used to construct the names of the data files.

bident: this is the 5-digit number assigned by the Ministry of Education. This is used to construct the names of the data files.

datadir: this is your data directory, as a string, in the form “C:/directory/subdirectory”. This can be passed as a vector and is used to construct the string identifying the file and location.

Example:

NewTbl <- StudentLoad(6, 2017, 12345, "C:/temp/data/")

AchieveLabel(x, grade, type)

What it does: Modifies tibbles made using the StudentLoad() function:

Relabel values of all achievement variables contained in the assessment file (e.g. ROverallLevel, Prior_G6_MOverallLevel, OSSLTOutcome, etc.) as character labels (e.g. “Level 1”, “Level 3”, “Exempt”) or numeric labels (i.e. 0, 1, 2, 3, 4, NA). Note: when values are relabeled as numeric, all non-level values are relabeled as NA.

How it is used: Arguments used in the function

x: the name of the tibble that was created using the StudentLoad() function.

grade: this is a numeric value that can be 3, 6, 9 or 10. This is used to identify naming conventions used by each version of the EQAO assessments.

type: this is one of two strings:

“char”: this will change the values to more meaningful, readable labels (e.g. “Level 1”, “Exempt”, “Pending”)

“num”: this will change the values to numeric where Levels 0 to 4 are coded as 0, 1, 2, 3 and 4 and all other versions of no-data (withheld, pending, absent, exempt etc.) are relabeled as NA. This is equivalent to calculating “Fully Participating”.

Example:

NewStdntTbl <- AchieveLabel(OldStdntTbl, 6, "char")
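For readers curious about what the “num” relabeling amounts to, here is a rough sketch of the idea in base R. This is not the REQAO implementation, and to_numeric_level is a hypothetical name; it simply assumes the values are already character labels such as “Level 3” or “Exempt”.

```r
# Hypothetical illustration (not REQAO's code): convert character labels
# to numeric levels, with every non-level value becoming NA
to_numeric_level <- function(labels) {
  is_level <- grepl("^Level [0-4]$", labels)
  out <- rep(NA_real_, length(labels))
  # Strip the "Level " prefix and keep the digit for genuine levels only
  out[is_level] <- as.numeric(sub("^Level ", "", labels[is_level]))
  out
}

levels_num <- to_numeric_level(c("Level 1", "Level 3", "Exempt", "Pending", "Level 0"))
mean(levels_num, na.rm = TRUE)   # average over students who have a level
```

Because the non-level values become NA, summaries computed with na.rm = TRUE include only students who received a level.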

 

Installation and use

The REQAO package can be installed using the following function from the devtools package:

devtools::install_github("cconley/REQAO")

Once installed, REQAO can be used like any other package:

library(REQAO)

Posted in R

The 9 Lives of a Photo (Pictures are Data)

With the availability of mobile devices in the classroom, photos are increasingly being used by educator teams to document learning. The blessing and the curse quickly become apparent as hundreds of pictures accumulate and clog albums and directories. The “perfect picture” just does not look meaningful a week or two later when the activity and student interaction are a fading memory. The longer you wait to review and annotate the photos, the less likely you will use them as part of your professional dialogue and assessment process.

Here are 9 ways one photo can enhance your professional practice.


When the picture is…

…sent home, it is parent communication

…reviewed by the teacher, it is assessment

…catalogued according to who is demonstrating learning, it is tracking and monitoring

…collected and considered over time, it is documentation of learning and growth

…posted on the bulletin board outside the classroom, it is making learning and milestones visible to the other students and teachers in the school

…posted in the classroom, it is making learning and milestones visible to the other students in the class

…shared with the student, it is student reflection

…shared on Pinterest, Twitter or through email with colleagues, it is shared practice

…shared with the administrator, it is documentation of practice


An important motto that applies in many areas of life is:

“Just because you can doesn’t mean you should.”

This is an important mindset to have when using photographs to document learning.  Following are three considerations for every image you capture and plan to use:

  • Privacy:
    Has the family agreed to have school photos shared? Have you asked the family if they would like to receive photos of their child in the class? Do you really need to show the child’s face in the photo or can you just show the work that is being done, perhaps with a hand or finger pointing to something of significance? The PIM Toolkit (Privacy and Information Management Toolkit) has further direction: “Full names of students and other personal information and/or photographs do not appear on work displayed in the school, on websites and/or in newsletters.” For more privacy details, you can check the toolkit out here: https://www.pimedu.org/files/toolkit/PIMtoolkit.pdf  Steps to maintaining privacy can come before other considerations and intentions.
  • Purposeful:
    Is it a “cute” picture or are you capturing a meaningful product, process or interaction? “Cute” makes a great family photo, but in the classroom the focus is on purposeful documentation (within the privacy considerations discussed previously). Parents see their children every day, but they don’t get to see them engaged in the classroom. This is your chance to invite them into your classroom space.
  • Positive:
    With the privacy considerations addressed and the purpose of the photo determined, how would the child feel about their work being displayed or shared in some form? Even if a child cannot be identified in a photo, sharing or displaying a photo that a child feels anxious about undermines your classroom culture and relationship. Find another way to display this kind of learning (perhaps create a sample or mock version as an illustration).
Posted in Blog

Finding the Path

When secondary students complete their courses and accumulate credits, they leave a trail of data behind them. As these same students consider their course options for the next year, they are making choices about the pathway they will follow.

These pathways are easier to see with the following visualization that I adapted from the work of Kerry Rodden for an education data context. This visualization was prepared for Karen Robson, Department of Sociology, McMaster University, who will be presenting at the Canadian Sociological Association (part of the Congress of the Humanities and Social Sciences) on “Practical Advice on Communicating Sociological Research”, May 30th. The fictional cohort data was constructed and presented in a way that would allow the user to explore student pathways:

  • The innermost circle represents courses that have been completed in Grade 9.
  • Each larger circle describes the following year: Grade 10, Grade 11 and Grade 12.
  • Each segment within a circle describes a course type (academic, applied, university, college, open, workplace etc.) and is labeled in the legend. “End” means the fictional student was no longer in the school at that point.
  • As your mouse hovers over each segment of a ring, the percentage of all students following that pathway is calculated in the centre.
  • A breadcrumb trail is created at the top, highlighting the pathway you have selected.
  • The larger the segment, the greater the number of fictional students that are represented.
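For anyone who wants to try this with their own data, the sketch below shows one way the percentages behind each pathway could be tabulated in R. The cohort is fictional, the column names are hypothetical, and the hyphen-separated pathway string is an assumption about the input format the visualization expects rather than a documented requirement.

```r
# Fictional cohort: one row per student, one column per grade
cohort <- data.frame(
  grade9  = c("Academic", "Academic", "Applied", "Applied", "Academic"),
  grade10 = c("Academic", "Academic", "Applied", "Applied", "Applied"),
  grade11 = c("University", "University", "College", "End", "College"),
  grade12 = c("University", "University", "College", "End", "College"),
  stringsAsFactors = FALSE
)

# Collapse each student's course types into a single pathway string
pathway <- do.call(paste, c(unname(as.list(cohort)), sep = "-"))

# Count the students on each pathway and convert the counts to percentages
pathway_counts <- as.data.frame(table(pathway = pathway), stringsAsFactors = FALSE)
pathway_counts$percent <- 100 * pathway_counts$Freq / sum(pathway_counts$Freq)
pathway_counts
```

Each row of pathway_counts corresponds to one complete route from the innermost ring to the outermost, and its percent column is the share of the cohort on that route.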

Click on the image to play with the interactive version.

                       Exploring Student Pathways

It would be interesting to see what proportions emerge from real cohort data or how those proportions might differ according to student characteristics.

If one advances confidently in the direction of his dreams, and endeavors to live the life which he has imagined, he will meet with a success unexpected in common hours.

Henry David Thoreau

Posted in Data Visualization