If you build it, they will come: 5 Blockbuster reasons your organization should have an R package

“There’s no crying in baseball.”

There is a big learning curve to R.
It might make you cry.
It will probably make you angry.
It will definitely make you frustrated. And it is all worth it because:

  • It documents your work which supports transparency and accountability. This is an important step to building trust both inside and outside an organization.
  • It supports iterative and collaborative development. You can walk away from your work and pick it back up to continue or improve your work quickly.
  • It increases efficiency and timeliness of reporting/analysis. Once developed, R code can reduce the time for analysis and reporting from days and weeks to minutes and hours. In one project, developing my code took weeks (the same amount of time it would have taken with traditional workflows), but in the next reporting period, producing the same updated reports (over 300 PDFs) took just over 1 hour. That’s more time to devote to other important projects.

As summer arrives and the summer blockbusters are released, here are 5 blockbuster reasons to get started with R and consider creating an R package for your organization:

5: Managing file locations

“There’s no place like home.”

Having your network drive remapped is a pain in the neck. I’ve lived through the months of error messages that are discovered in code that needs to be updated with a new path. One of the first functions I created for our department was assigning the network drive locations for our common data sets. When the O drive is suddenly switched to the P drive and the drive hierarchy has a new departmental subdirectory, a quick update to the package results in scripts that continue to run without missing a beat. This is much faster than having to review every script to confirm that a file path is current or needs updating.

The function below returns the network directory when “P” is passed and my local directory for any other value.

Function: datadrive

datadrive <- function(drive) {
  # Return the network directory for "P", otherwise the local directory
  if (drive == "P") {
    "P:/Org/Department/MasterData/"
  } else {
    "C:/Temp/LocalData/"
  }
}

Use: datapath <- datadrive("P")
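
Scripts can then build full file paths from the returned directory, so only the package needs to change when a drive is remapped. Here is a minimal sketch, using a compact equivalent of the datadrive function above so the example runs on its own. The file name extract.csv is hypothetical.

```r
# Compact equivalent of the datadrive function above
datadrive <- function(drive) {
  if (drive == "P") "P:/Org/Department/MasterData/" else "C:/Temp/LocalData/"
}

# Build the full path to a data set (hypothetical file name).
# paste0() is used because the directory already ends with a slash.
datafile <- paste0(datadrive("P"), "extract.csv")
datafile
```

If the drive letter or folder structure changes, only the strings inside datadrive need updating.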

 

4: Corporate colours and aesthetic

“Here’s lookin’ at you, kid.”

Having the hex codes (a six-digit code that defines the red, green and blue combination for each colour) for your organization’s colour scheme/brand available is incredibly handy for corporately aligned, aesthetically pleasing visualizations. The colour palettes I included in my organization’s package were built following a blog post by @drsimonj, who walks through all the steps required to build your own palettes: https://drsimonj.svbtle.com/creating-corporate-colour-palettes-for-ggplot2

Useful tools: Hex codes are used in R as a string with the prefix #.  For example, black is “#000000”, white is “#FFFFFF”.

Considering colour combinations: ColorBrewer2 is the go-to site for considering colour combinations for all kinds of data (sequential, diverging, qualitative): http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3

Finding a colour code: If you want to find the hex code for a colour, Adobe’s Color website is an excellent resource: https://color.adobe.com

Finding a tint or shade: If you would like to find shades and tints for a specific colour (useful for graphs) High Integrity Design has a useful online tool: https://highintegritydesign.com/tools/tinter-shader/
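
As a rough illustration of how a palette could sit inside a package, here is a minimal base R sketch: a named vector of hex codes plus a small helper to look colours up by name. The colour names, hex codes, and the function name corp_cols are all hypothetical placeholders, not my organization’s actual palette.

```r
# Hypothetical brand colours; replace with your organization's hex codes
corp_colours <- c(
  navy  = "#1F3A5F",
  teal  = "#2A9D8F",
  amber = "#E9A23B",
  grey  = "#6C757D"
)

# Return hex codes by name, or the full palette when no names are given
corp_cols <- function(...) {
  cols <- c(...)
  if (is.null(cols)) {
    return(corp_colours)
  }
  corp_colours[cols]
}

# Look up two colours by name
corp_cols("navy", "amber")
```

The returned vector can be passed to any plotting function that accepts hex codes, for example the col argument of a base R plot. Wrap it in unname() if the colour names get in the way.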

 

3: Standardizing your data extracts

“I’ll be back.”

Although dashboards are fantastic for monitoring systems, there is still a need for milestone reporting (e.g. a year-end report summarizing your key metrics). Having your R and SQL code bundled in a package streamlines the process and ensures that when you return to run the report the next month or year, the data structure will be the same.

Good: an .R file with code that extracts data from your information systems.

Better: turning the code into a function so that the process is easy to run.

Best: adding the function to a package so you don’t have to worry about where the source file is or which version is the most recent.
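
To make the “Better” and “Best” steps concrete, here is a sketch of what a packaged extract function might look like. It assumes the DBI package and an open database connection. The table name, column names, and function name are hypothetical, and the ? placeholder syntax varies by database driver.

```r
# Sketch of a packaged extract function (hypothetical table and columns).
# `con` is an open DBI connection to your information system.
get_year_end_metrics <- function(con, fiscal_year) {
  sql <- "SELECT site_id, metric, value
          FROM metrics
          WHERE fiscal_year = ?"
  # params fills the ? placeholder with the requested year
  result <- DBI::dbGetQuery(con, sql, params = list(fiscal_year))
  # Fix the column order so every run returns the same structure
  result[, c("site_id", "metric", "value")]
}
```

Because the query and the column selection live in one function, next year’s report calls the same code with a new fiscal_year and gets back data in the same shape.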

 

2: Keep your key metrics on hand at all times

“May the Force be with you.”

If you have a set of key metrics, they can be included as a table in your package. This makes them portable and convenient to access when you are working offline.
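
As a sketch of the idea, a metrics table can be stored as an .rda file, which is the format R packages typically keep in their data/ folder. The metric names and values below are placeholders, and a temporary directory stands in for the package folder.

```r
# Hypothetical key metrics; values are placeholders, not real data
key_metrics <- data.frame(
  metric = c("response_rate", "completion_rate"),
  target = c(0.80, 0.95),
  stringsAsFactors = FALSE
)

# In a package this file would live in the data/ folder;
# a temporary directory stands in for it here
rda_path <- file.path(tempdir(), "key_metrics.rda")
save(key_metrics, file = rda_path)

# Loading it back needs no network access
loaded <- new.env()
load(rda_path, envir = loaded)
loaded$key_metrics
```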

 

1: Availability of definitions and keys

“Are you the Key Master?”

Working with data sets from several different information systems can result in many different naming conventions.  Including a master table in the package that aligns all the identifiers in the package is a big time saver and makes analysis much more straightforward.
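
Here is a minimal sketch of how such a master table might be used, with base R’s merge(). The systems, identifiers, and values are all hypothetical.

```r
# Hypothetical master key: one row per site, one column per system's identifier
site_key <- data.frame(
  site_name  = c("North", "South"),
  finance_id = c("F001", "F002"),
  hr_id      = c("N-10", "S-20"),
  stringsAsFactors = FALSE
)

# Extracts from two systems that identify the same sites differently
finance <- data.frame(finance_id = c("F001", "F002"), budget = c(100, 150))
hr      <- data.frame(hr_id = c("N-10", "S-20"), staff = c(12, 9))

# Join each extract to the key through the identifier that system uses
combined <- merge(merge(site_key, finance, by = "finance_id"), hr, by = "hr_id")
combined[, c("site_name", "budget", "staff")]
```

With the key stored in the package, every analysis joins through the same table instead of rebuilding the mapping by hand.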


  • Famous Movie Lines
    • “If you build it, they will come.” – Field of Dreams
    • “There’s no crying in baseball.” – A League of Their Own
    • “There’s no place like home.” – The Wizard of Oz
    • “Here’s lookin’ at you, kid.” – Casablanca
    • “I’ll be back.” – The Terminator
    • “May the Force be with you.” – Star Wars
    • “Are you the Key Master?” – Ghostbusters