Workshop 1: Building open data science skills in paleobiology and ecology

IMPORTANT: This workshop overlaps with Workshop 2: Decoding the past: Deep learning for macroevolutionary analyses. Therefore when registering please only select one of these workshops, as you won't be able to participate in both!

Date: July 27, 2025

Location: University of Zurich, Central campus, room KO2-F-152

Mode: This workshop will be held in person.

Length: Full day; coffee breaks and lunch will be provided

Participant cap: 50, aimed at students and early career researchers

Registration deadline: June 22, 2025

Workshop leaders:

Dr. Erin Dillon, Smithsonian Tropical Research Institute (dillone@si.edu)
Dr. Lewis Jones, University College London (lewisa.jones@outlook.com)
Dr. Bethany Allen, ETH Zürich (bethany.allen@bsse.ethz.ch)
Dr. William Gearty, Syracuse University (willgearty@gmail.com)

Description:

R is one of the most popular languages in the world of data science and has been widely adopted by the paleobiological and ecological communities for data analysis. General familiarity with R allows users to automate routine tasks, create reproducible analytical workflows, and expand the potential of their research. This workshop aims to introduce participants to the versatility of R for cleaning, analyzing, and visualizing paleobiological and ecological data. It will cover topics including: data acquisition from biodiversity databases such as the Paleobiology Database, Neotoma Paleoecology Database, and Global Biodiversity Information Facility; building workflows in R to clean and analyze data; data visualization and synthesis; and guidelines (e.g., FAIR and CARE principles) and tools (e.g., GitHub) for creating more reproducible code and accessible documentation. In doing so, this workshop will introduce attendees to palaeoverse, an R package that supports data preparation and exploration for paleobiological analysis. Participants will also become familiar with additional packages developed by the Palaeoverse Community (www.palaeoverse.org), such as rphylopic. More broadly, this event aims to connect participants working in different fields who share a common interest in data science and provide a platform for participants to gain experience working collaboratively in R to generate reproducible interdisciplinary research. Participation will be capped at 30 individuals.

Scope of topics:

Acquisition of modern and fossil biological datasets from biodiversity databases, including a discussion of data harmonization and synthesis;
Data preparation and exploration using the palaeoverse R package, focusing on a worked example that integrates modern and fossil biodiversity data (e.g., using BioDeepTime, GBIF, or iDigBio) in an analysis to cross the gap;
Data visualization
Open and collaborative data science practices, including the FAIR and CARE principles as well as data science tools like GitHub, Quarto, Zenodo, etc.
Computational paleobiology and programming resources
Hands-on coding practicum

As we cover each of these topics, we will discuss the integration and visualization of different paleobiological and ecological data types to promote CPEG’s interdisciplinary goals.
