Research Platforms

Work in Progress Updated Jan 31, 2026

Terra

Uses the Google Cloud Platform

No markup by Terra

Fees related to:
- storage of data
- moving data

Analysis of data:
- bulk workflows
- interactive analysis (RStudio, Jupyter, Galaxy)

You can check expenses in the Google Cloud Billing console

Storage options:
- Workspace (Terra-managed Google bucket)
- External bucket
  - owner pays for storage
  - user pays the download cost

Each workspace has two dedicated cloud storage resources:
- Workspace storage = Google bucket
- Persistent disk (PD) = laptop in the cloud; one per user per workspace
  - default size is 50 GB
  - cost is $2.00/month (or $0.04/GB/month)
  - like a USB drive, the PD can be detached from the VM before deleting or recreating the cloud environment
  - the PD lets you keep your notebook code, packages, input tables, and outputs without having to move anything to workspace storage (Google bucket)
  - collaborators cannot access your PD
  - types of PD
    - standard: standard hard disk drives; least expensive option
    - solid state: solid-state drives; more expensive, faster, and more power efficient
    - balanced: a balance of performance and cost
  - PD file directory
    - RStudio = /home/rstudio
    - Jupyter = /home/jupyter
  - the PD automatically syncs the analysis files themselves with the workspace bucket; user-managed packages, input files, and generated outputs are not automatically synced with the workspace bucket

Anything saved outside this directory is not saved on the PD and will be lost when the cloud environment is deleted

PDs save time and reduce errors:
- they save time because you don’t have to reinitialize the cloud environment each time you use an app that takes a long time to install packages or load input files
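The PD price quoted above can be checked with a little arithmetic. This is a minimal sketch, assuming the $0.04 per GB per month rate from these notes scales linearly with disk size; the function name is made up for illustration.

```python
def pd_monthly_cost(size_gb, rate_per_gb_month=0.04):
    """Estimate the monthly cost (USD) of a persistent disk.

    The default rate is the figure from these notes, not a live price.
    """
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    return size_gb * rate_per_gb_month


print(f"${pd_monthly_cost(50):.2f}/month")   # the default 50 GB disk
print(f"${pd_monthly_cost(200):.2f}/month")  # a larger 200 GB disk
```

The 50 GB case reproduces the $2.00/month figure given for the default disk size.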

Interactive analysis costs: 3 separate billable components
- CPU = most expensive resource
- Memory = add monitoring to workflows to know how much to use
- Disk size = add monitoring to workflows

Costs are charged per hour

Built-in auto-pause

Can pause for a nominal fee (< $0.01/hr)
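To see how the three billable components and the pause fee might combine, here is a rough sketch of an hourly estimate. It assumes each component is billed linearly per hour and that a paused environment costs a flat hourly fee. The function name is hypothetical, and every rate in the usage example is a placeholder, not a Terra or Google Cloud price.

```python
def environment_cost(hours_running, hours_paused, cpus, memory_gb, disk_gb,
                     cpu_rate, memory_rate, disk_rate, paused_rate):
    """Estimate the cost (USD) of a cloud environment over some period.

    cpu_rate is per CPU per hour; memory_rate and disk_rate are per GB
    per hour; paused_rate is a flat fee per hour while paused.
    """
    running_hourly = (cpus * cpu_rate
                      + memory_gb * memory_rate
                      + disk_gb * disk_rate)
    return hours_running * running_hourly + hours_paused * paused_rate


# Placeholder rates for illustration only; NOT real prices.
estimate = environment_cost(
    hours_running=8, hours_paused=16, cpus=4, memory_gb=15, disk_gb=50,
    cpu_rate=0.03, memory_rate=0.004, disk_rate=0.0001, paused_rate=0.01,
)
print(f"Estimated cost for one day: ${estimate:.2f}")
```

Swapping in actual rates from the billing console would show how much of a day's cost comes from running hours versus paused hours.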

Data stored in a table in Terra:
- primary data in tabular format
- input data file locations
- input data file metadata

Data tables are hosted in a relational database owned and managed by Terra

Reasons to move data from the PD to the workspace Google bucket:
- share data with a colleague
- use generated data as workflow input
- archive data generated in a cloud environment

To move a larger number of files or a larger amount of data, you can use gsutil in the terminal

Google Cloud Storage:
- object store
- objects are stored in buckets (which are like folders or directories)
- storage has a cost
- moving data out has a cost (egress)
- primary cost drivers
  - how much data is stored
  - where the data is stored
    - different storage classes
      - nearline: $0.010/GB/month storage; $0.01/GB access
      - coldline: $0.004/GB/month storage; $0.05/GB access
    - multi-regional vs. regional storage (multi-regional is not needed unless, for example, website or gaming efficiency matters)
      - regional = $0.020 to $0.023/GB per month
  - how frequently the data is accessed
    - less frequently accessed data = “cold storage” (cheaper)
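The trade-off between storage price and access price is easier to see with numbers. The sketch below uses only the prices listed in these notes. It assumes the low end of the regional range and, because the notes list no access fee for regional storage, treats that fee as zero. The function and dictionary names are made up for illustration.

```python
# Prices (USD) copied from the notes above; check current Google Cloud
# pricing before relying on them.
STORAGE_CLASSES = {
    # Low end of the $0.020 to $0.023 range; access fee assumed to be zero.
    "regional": {"storage": 0.020, "access": 0.00},
    "nearline": {"storage": 0.010, "access": 0.01},
    "coldline": {"storage": 0.004, "access": 0.05},
}


def monthly_cost(storage_class, stored_gb, accessed_gb):
    """Monthly cost of storing stored_gb and reading accessed_gb."""
    prices = STORAGE_CLASSES[storage_class]
    return stored_gb * prices["storage"] + accessed_gb * prices["access"]


# 1000 GB stored: compare light access (100 GB) with heavy access (500 GB).
for accessed_gb in (100, 500):
    for name in STORAGE_CLASSES:
        cost = monthly_cost(name, 1000, accessed_gb)
        print(f"{accessed_gb:4d} GB accessed  {name:9s} ${cost:.2f}")
```

Under these assumptions coldline is cheapest when little data is read, but its access fee makes it the most expensive option once about half of the data is read each month.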

Data transfer example:
- copying data stored in one region to a compute VM or cloud environment PD in another region costs ~$0.01/GB
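As a quick worked example of the transfer figure, the sketch below multiplies a data size by the approximate $0.01/GB cross-region rate from these notes. The rate is approximate and the function name is hypothetical.

```python
def transfer_cost(size_gb, rate_per_gb=0.01):
    """Approximate cost (USD) of copying size_gb between regions."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    return size_gb * rate_per_gb


# Copying a 500 GB dataset to a cloud environment in another region.
print(f"~${transfer_cost(500):.2f}")
```

For a 500 GB dataset this comes to roughly $5 per copy, and the charge recurs each time the data is copied across regions.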

How to control storage costs:
- compress files (e.g. bgzip)
- use regional storage
- move to cold storage
- clean up intermediate files promptly

The default cloud environment region is us-central1

Best practice is to keep the workspace bucket and the cloud environment in the same location to minimize egress costs

URI = Uniform Resource Identifier; a close cousin of the URL (Uniform Resource Locator)
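Google Cloud Storage objects are addressed with URIs of the form gs://bucket/object. A minimal sketch of splitting one into its parts with Python's standard library follows; the bucket and file names are invented for illustration.

```python
from urllib.parse import urlparse


def split_gs_uri(uri):
    """Split a gs:// URI into (bucket, object_path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # netloc holds the bucket name; path holds the object name.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical bucket and object names.
bucket, obj = split_gs_uri("gs://example-bucket/results/sample1.vcf.gz")
print(bucket)  # example-bucket
print(obj)     # results/sample1.vcf.gz
```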

Use the gsutil command line tool to move data from the PD to workspace storage
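One way to script that copy is to build the gsutil command in Python and print it for review before running it. This is a sketch only: the bucket name and the outputs folder are hypothetical, and the helper function is made up for illustration. The `-m` option runs the copy in parallel and `-r` copies a directory recursively.

```python
import shlex


def gsutil_copy_command(source, destination, recursive=True, parallel=True):
    """Build a gsutil cp command as a list of arguments."""
    cmd = ["gsutil"]
    if parallel:
        cmd.append("-m")  # top-level option: parallel transfers
    cmd.append("cp")
    if recursive:
        cmd.append("-r")  # cp option: copy directories recursively
    cmd += [source, destination]
    return cmd


# Hypothetical PD folder and workspace bucket.
cmd = gsutil_copy_command(
    "/home/jupyter/outputs",
    "gs://example-workspace-bucket/outputs",
)
print(shlex.join(cmd))
# gsutil -m cp -r /home/jupyter/outputs gs://example-workspace-bucket/outputs

# To run it from the cloud environment, where gsutil is available:
# import subprocess
# subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids shell quoting problems if a path contains spaces.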