The metagear package for R contains tools for facilitating systematic reviews, data extraction, and meta-analyses. It aims to facilitate research synthesis as a whole, by providing a single source for several of the common tasks involved in screening studies, extracting outcomes from studies, and performing statistical analyses on these outcomes using meta-analysis. Below are a few illustrative examples of applications of these functionalities.
Updates to these examples will be posted on our research webpage at USF; for previous vignette versions see v. 0.4, v. 0.3, v. 0.2, and v. 0.1.
For the source code of metagear see: http://cran.r-project.org/web/packages/metagear/index.html.
Funding for metagear is supported by National Science Foundation (NSF) grants DBI-1262545 and DEB-1451031.
I also thank J. Richardson, J. Zydek, N. Ogburn, B. MacNeill, J. Zloty, and my colleagues in the OpenMEE software team, J. Gurevitch and B. Wallace, for persuading me to develop tools in R.
Lajeunesse, M.J. (2016) Facilitating systematic reviews, data extraction and meta-analysis with the metagear package for R. Methods in Ecology and Evolution 7: 323-330. article link
Metagear has several external dependencies that need to be installed and loaded before use in R. The first is the EBImage R package (Pau et al. 2010), which is available only from the Bioconductor repository.
To properly install metagear, use the following script in R:
# first load Bioconductor resources needed to install the EBImage package
# and accept/download all of its dependencies
install.packages("BiocManager")
BiocManager::install("EBImage")
# then load metagear
library(metagear)
Finally, for Mac OS users, installation is sometimes not straightforward, as the abstract_screener() function requires the Tcl/Tk GUI toolkit to be installed. You can get this toolkit by making sure the latest X11 application (xQuartz) is installed from here: xquartz.macosforge.org.
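On any platform, a quick way to verify that your R installation can see the Tcl/Tk toolkit is base R's capabilities() function; the sketch below simply reports whether Tcl/Tk support is available:

```r
# check whether this R installation supports Tcl/Tk
# (required by metagear's abstract_screener)
capabilities("tcltk")
```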
Please email me any bugs, comments, or suggestions and I’ll try to include them in future releases: lajeunesse@usf.edu. Please include metagear in the subject heading of your email. Finally, I’m open to almost anything, but expect a lag before I respond or before new additions appear.
One of the first tasks of a systematic review is to screen the titles and abstracts of study references to assess their relevance for the synthesis project. For example, a bibliographic search using Web of Science may generate thousands of references: references from experimental studies, modeling studies, review papers, commentaries, etc. These need to be reviewed individually as a first pass to exclude those that do not fit the synthesis project, such as simulation studies that do not report experimental outcomes useful for estimating an effect size.
However, individually screening thousands of references is time-consuming, and large synthesis projects may benefit from delegating this screening effort to a research team. Having multiple people screen references also provides an opportunity to assess the repeatability of screening decisions.
In this example, our goals are to distribute screening tasks among a small team of reviewers, collect their screening decisions, and then merge and summarize these efforts.
First, let’s start by loading and exploring the contents of a pre-packaged dataset from metagear that contains the bibliographic information of 11 journal articles (example_references_metagear). These data are a subset of references generated from a search in Web of Science for “Genome size”, and contain the abstracts, titles, volumes, page numbers, and authors of these references.
# load package
library(metagear)
## metagear 0.6
## ** For information on installing/troubleshooting metagear, see:
## ** http://lajeunesse.myweb.usf.edu/metagear/metagear_basic_vignette.html
##
## ** metagear system check: current setup supports GUIs [ TRUE ]
# load a bibliographic dataset with the authors, titles, and abstracts of multiple study references
data(example_references_metagear)
# display the bibliographic variables in this dataset
names(example_references_metagear)
## [1] "AUTHORS" "YEAR" "TITLE" "JOURNAL" "VOLUME" "LPAGES" "UPAGES" "DOI" "ABSTRACT"
# display the various Journals that these references were published in
example_references_metagear["JOURNAL"]
## JOURNAL
## 1 BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS
## 2 EVOLUTIONARY ECOLOGY RESEARCH
## 3 AMERICAN NATURALIST
## 4 GENE
## 5 VIRUS GENES
## 6 JOURNAL OF SHELLFISH RESEARCH
## 7 JOURNAL OF GENERAL MICROBIOLOGY
## 8 APPLIED GEOCHEMISTRY
## 9 JOURNAL OF DRUG DELIVERY SCIENCE AND TECHNOLOGY
## 10 BIOLOGIA PLANTARUM
## 11 GENOMICS
Our next step is to initialize/prime this dataset for screening tasks. Our goal is to distribute screening efforts to two screeners/reviewers: “Christina” and “Luc”. Here each reviewer will screen a separate subset of these references (a forthcoming example will review how to set up a dual screening design where each member screens the same references). The dataset first needs to be initialized as follows:
# prime the study-reference dataset
theRefs <- effort_initialize(example_references_metagear)
# display the new columns added by effort_initialize
names(theRefs)
## [1] "STUDY_ID" "REVIEWERS" "INCLUDE" "AUTHORS" "YEAR" "TITLE" "JOURNAL" "VOLUME" "LPAGES" "UPAGES" "DOI" "ABSTRACT"
Note that the effort_initialize() function added three new columns: “STUDY_ID”, a unique number for each reference (e.g., from 1 to 11); “REVIEWERS”, an empty column of NAs that will later be populated with our reviewers (e.g., Christina and Luc); and finally “INCLUDE”, which will later contain the screening decisions of the two reviewers.
Screening efforts are essentially how individual study references get coded for inclusion in the synthesis project; currently the “INCLUDE” column codes each reference as “not vetted”, indicating that it has yet to be screened.
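As a quick sanity check of this coding status, base R's table() can tally the “INCLUDE” column; the sketch below uses a toy data frame mimicking the three columns added by effort_initialize() (in the actual workflow, table(theRefs$INCLUDE) works directly on the initialized dataset):

```r
# toy data frame mimicking the columns added by effort_initialize()
theRefs_toy <- data.frame(STUDY_ID  = 1:11,
                          REVIEWERS = NA,
                          INCLUDE   = "not vetted")

# tally the screening codes; before any screening,
# all 11 references should be coded "not vetted"
table(theRefs_toy$INCLUDE)
```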
Our next task is to delegate screening efforts to our two reviewers Christina and Luc. Our goal is to randomly distribute these references to each reviewer.
# randomly distribute screening effort to a team
theTeam <- c("Christina", "Luc")
theRefs_unscreened <- effort_distribute(theRefs, reviewers = theTeam)
# display screening tasks
theRefs_unscreened[c("STUDY_ID", "REVIEWERS")]
## STUDY_ID REVIEWERS
## 1 1 Christina
## 2 2 Luc
## 3 3 Christina
## 4 4 Luc
## 5 5 Christina
## 6 6 Luc
## 7 7 Luc
## 8 8 Christina
## 9 9 Christina
## 10 10 Christina
## 11 11 Luc
The screening efforts can also be delegated unevenly, such as below where Luc will take on 80% of the screening effort:
# randomly distribute screening effort to a team, but with Luc handling 80% of the work
theRefs_unscreened <- effort_distribute(theRefs, reviewers = theTeam, effort = c(20, 80))
theRefs_unscreened[c("STUDY_ID", "REVIEWERS")]
## STUDY_ID REVIEWERS
## 1 1 Luc
## 2 2 Luc
## 3 3 Christina
## 4 4 Luc
## 5 5 Luc
## 6 6 Luc
## 7 7 Luc
## 8 8 Luc
## 9 9 Luc
## 10 10 Christina
## 11 11 Luc
The effort can also be redistributed with the effort_redistribute() function. In the above example we assigned Luc 80% of the work. Now let’s redistribute half of Luc’s work to a new team member, “Patsy”.
theRefs_Patsy <- effort_redistribute(theRefs_unscreened,
reviewer = "Luc",
remove_effort = 50, # move 50% of Luc's work to Patsy
reviewers = c("Luc", "Patsy")) # team members losing and picking up work
theRefs_Patsy[c("STUDY_ID", "REVIEWERS")]
## STUDY_ID REVIEWERS
## 3 3 Christina
## 10 10 Christina
## 1 1 Luc
## 2 2 Luc
## 4 4 Luc
## 5 5 Patsy
## 6 6 Patsy
## 7 7 Patsy
## 8 8 Luc
## 9 9 Patsy
## 11 11 Luc
The references have now been randomly assigned to either Christina or Luc. The whole initialization of the reference dataset with effort_initialize() can be abbreviated with effort_distribute(example_references_metagear, reviewers = c("Christina", "Luc"), initialize = TRUE).
Now that screening tasks have been distributed, the next stage is for reviewers to start the manual screening of each assigned reference. This is perhaps best done by providing a separate file of these references to Christina and Luc. They can then work on screening these references separately and remotely. Once the screening is complete, we can then merge these files into a complete dataset (we’ll get to this later).
The effort_distribute() function can also save each reference subset to file; these files can be given to Christina and Luc to start their work. This is done by setting the save_split parameter to TRUE.
# randomly distribute screening effort to a team, with Luc handling 80% of the work,
# and save these screening tasks to separate files for each team member
theRefs_unscreened <- effort_distribute(theRefs, reviewers = theTeam, effort = c(20, 80), save_split = TRUE)
## 2 files saved in: C:/Users/lajeunesse/Desktop/R_projects/metagear_0.6/metagear/vignettes
theRefs_unscreened[c("STUDY_ID", "REVIEWERS")]
## STUDY_ID REVIEWERS
## 1 1 Luc
## 2 2 Luc
## 3 3 Luc
## 4 4 Luc
## 5 5 Christina
## 6 6 Christina
## 7 7 Luc
## 8 8 Luc
## 9 9 Luc
## 10 10 Luc
## 11 11 Luc
list.files(pattern = "effort")
## [1] "effort_Christina.csv" "effort_Luc.csv"
These two effort_*.csv files contain the assigned references for Christina and Luc. These can be passed on to each team member so that they can begin screening/coding each reference for inclusion in the synthesis project.
References should be coded as “YES” or “NO” for inclusion, but can also be coded as “MAYBE” if bibliographic information is missing or there is inadequate information to make a proper assessment of the study.
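These codes can be entered in any spreadsheet editor, or directly in R; the sketch below illustrates the R route (the first write.csv() builds a two-row stand-in for a reviewer's effort_*.csv file, and the codes assigned are purely illustrative):

```r
# stand-in for the effort_Luc.csv produced by save_split above
write.csv(data.frame(STUDY_ID  = c(1, 7),
                     REVIEWERS = "Luc",
                     INCLUDE   = "not vetted"),
          "effort_Luc.csv", row.names = FALSE)

# read the reviewer's assigned references back into R
effort_Luc <- read.csv("effort_Luc.csv")

# code individual references for inclusion by their STUDY_ID
effort_Luc$INCLUDE[effort_Luc$STUDY_ID == 1] <- "YES"
effort_Luc$INCLUDE[effort_Luc$STUDY_ID == 7] <- "MAYBE"

# save the updated screening decisions back to file
write.csv(effort_Luc, "effort_Luc.csv", row.names = FALSE)
```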
The abstract_screener() function can be used to facilitate this screening process (an example is forthcoming), but for the sake of introducing how screening efforts can be merged and summarized, I manually coded all the references in both Christina’s and Luc’s effort_*.csv files. Essentially, I randomly coded each reference as either “YES”, “NO”, or “MAYBE”. These files now contain the completed screening efforts.
We can merge these two files of completed screening efforts using the effort_merge() function, and summarize the outcome of these screening tasks using the effort_summary() function.
# merge the effort_Luc.csv and effort_Christina.csv
# WARNING: will merge all files named "effort_*" in directory
theRefs_screened <- effort_merge()
theRefs_screened[c("STUDY_ID", "REVIEWERS", "INCLUDE")]
## STUDY_ID REVIEWERS INCLUDE
## 1 5 Christina YES
## 2 6 Christina MAYBE
## 3 1 Luc YES
## 4 2 Luc YES
## 5 3 Luc YES
## 6 4 Luc YES
## 7 7 Luc MAYBE
## 8 8 Luc YES
## 9 9 Luc NO
## 10 10 Luc NO
## 11 11 Luc NO
theSummary <- effort_summary(theRefs_screened)
## === SCREENING EFFORT SUMMARY ===
##
## 3 candidate studies identified
## 6 studies excluded
## 2 challenging studies needing additional screening
## ----
## 11 TOTAL SCREENED
##
## === SCREENING DESIGN SUMMARY ===
##
## MAYBE YES NO TOTAL %
## Christina 1 1 0 2 18.18182
## Luc 1 5 3 9 81.81818
## TOTAL 2 6 3 11 100.00000
The summary of screening tasks describes which references had studies appropriate for the synthesis project, while also outlining which need to be re-assessed. The team should discuss these challenging references and decide whether they are appropriate for inclusion, or track down any additional/missing information needed to make a proper assessment of their inclusion.
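One simple way to pull out these challenging references for team discussion is base R's subset(); the sketch below uses a toy data frame standing in for the merged screening results:

```r
# toy stand-in for the merged screening results
theRefs_toy <- data.frame(STUDY_ID  = c(5, 6, 7),
                          REVIEWERS = c("Christina", "Christina", "Luc"),
                          INCLUDE   = c("YES", "MAYBE", "MAYBE"))

# extract the references coded "MAYBE" for a second round of screening
subset(theRefs_toy, INCLUDE == "MAYBE")
```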
Metagear offers a simple abstract screener to quickly sift through the abstracts and titles of multiple references. Here is some script to help initialize the screener GUI in R:
# load package
library(metagear)
# initialize bibliographic data and screening tasks
data(example_references_metagear)
effort_distribute(example_references_metagear, initialize = TRUE, reviewers = "marc", save_split = TRUE)
# initialize screener GUI
abstract_screener("effort_marc.csv", aReviewer = "marc")
The GUI itself will appear as a single window displaying the first title/abstract listed in the .csv file. If some abstracts have already been screened/coded, it will begin at the first reference labeled “not vetted”. The SEARCH WEB button opens the default browser and searches Google with the title of the reference. The YES, MAYBE, and NO buttons, which also have the keyboard shortcuts ALT-Y and ALT-N, are used to code the inclusion/exclusion of the reference; once a reference is coded, the next one is loaded. The SAVE button saves the coding progress of screening tasks directly to the loaded .csv file. Closing the GUI without saving will lose any screening efforts made since the last save.
Here’s what to expect with this GUI (note that depending on the platform running R, the layout of this GUI will differ slightly):