Arshi Arora


Research Biostatistician

Memorial Sloan Kettering Cancer Center

About me

I am a statistician with strong programming skills in R, Perl, and shell, and formal training in Computational Biology. I work at Memorial Sloan Kettering Cancer Center as a Biostatistician and dabble in Cancer Genomics. A typical day involves designing simple yet compelling data analyses to answer critical biological questions.

The right hemisphere of my brain is into ceramics, painting and DIY crafts. I also co-host a podcast on Computational Biology called Computationally Yours!


  • MS Biostatistics, 2017

    Columbia University

  • MS Computational Biology, 2010

    Carnegie Mellon University

  • B.Tech Biotechnology, 2008

    Amity University





Potter (wizarding and muggle)






Package to wrangle and visualize genomic data in R

iCluster and TCGA

Integrative clustering of TCGA datasets


Visualization tool for clustered groups


An outcome weighted supervised clustering algorithm

Recent & Upcoming Talks

panelmap at WSDS, 2020
BIRSBIO 2020 Hackathon
ISMCO (2019)
survClust poster TCGA Legacy: Multi-Omics studies in cancer (2018)

Recent Posts

A brief primer on scientific and mathematical notations

As I finished writing the final draft of my first first-author paper, survClust, there were a lot of other firsts! In my opinion, writing the methods and a crisp conclusion and discussion were the hardest parts.

Academic Hugo Theme via Blogdown: Few more details and deployment (part 2)

This is a continuation of a post I wrote earlier: Academic Hugo Theme via Blogdown: Where to start? After setting up a basic website with About, Skills and Experience pages.

Academic Hugo Theme via Blogdown: Where to start?

Setting up a personal website is fun and a great way to gain visibility. Whether it's your work, skills, or other hobbies, they can all see the light of day on one platform!

Journey so far


Principal Investigator


Feb 2022 – Present Wilmington, DE
Responsibilities include: Check back later!

Research Biostatistician II

Memorial Sloan Kettering Cancer Center

Jan 2018 – Present New York
Responsibilities include:

  • Developed survClust, a semi-supervised classification algorithm that maximizes survival differences in molecularly profiled cohorts across various platforms. It is implemented as a weighted clustering model using multidimensional scaling, and the robustness of the resulting groups is assessed by cross-validation.
  • Applied survClust to a pan-cancer cohort of patients treated with immune checkpoint blockade therapies to identify patients with the worst prognosis.
  • Lead genomics analyst for the International consortium of Melanoma (InterMEL), working on predictive models based on multi-omic data and on identifying germline calls from a tumor-only somatic mutation calling pipeline.

Research Biostatistician

Memorial Sloan Kettering Cancer Center

Dec 2014 – Jan 2018 New York
Responsibilities include:

  • Performed integrated analyses of several cancer types as part of The Cancer Genome Atlas (TCGA) consortium, including Liver Hepatocellular Carcinoma (LIHC), Prostate Adenocarcinoma (PRAD) and Skin Cutaneous Melanoma (SKCM), using the joint latent variable model implemented in iCluster to identify molecularly distinct subtypes.
  • Provided genomics and analytical support to faculty members of the Department of Epidemiology and Biostatistics at Memorial Sloan Kettering Cancer Center on a broad range of analyses, such as copy number and clonal evolution, mutational signatures, and statistical models to identify prognostic molecular features in exome sequencing and mutation panel datasets.

Assistant Research Biostatistician

Memorial Sloan Kettering Cancer Center

May 2012 – Dec 2014 New York
Responsibilities include:

  • Developed a somatic mutation caller based on a random effects model, tested on publicly available and in-house datasets.
  • Studied etiological tumor heterogeneity across molecular assays such as gene expression, mutation, copy number, and epigenetic data, using known clinical risk factors to characterize distinct risk groups.
  • Developed a validated prognostic gene risk score for patients with colorectal cancer liver metastases.

Graduate Research Assistant

University of Pittsburgh

Oct 2009 – Apr 2012 Pittsburgh

Benos Lab

  • Developed an Illumina miRNA-seq pipeline in Perl covering read quality filtering, adapter trimming, and alignment to the genome, producing raw reads for downstream analysis, including analysis of potential miRNA target genes. Studied miRNA targets and, using thermodynamics, the effect of miRNA cooperativity on CLIP-seq data.

Faeder Lab

  • Wrote a translator between two rule-based modeling languages, from BioNetGen (BNG) to Stochastic Simulation Compiler (SSC).
  • Managed code testing, bug fixing, and the release of BioNetGen (BNG) distributions.
  • Modeled the Toll-like receptor 4 (TLR4) pathway with rule-based models, mimicking preconditioning through repeated simulations to identify the origins of the memory effect in the TLR4 response to bacterial infection.