Arshi Arora

Research Biostatistician

Memorial Sloan Kettering Cancer Center

About me

I am a statistician with strong programming skills in R, Perl and shell, and formal training in Computational Biology. I work at Memorial Sloan Kettering Cancer Center as a Biostatistician and dabble in Cancer Genomics. A typical day involves coming up with simple yet compelling data analyses to answer critical biological questions.

The right hemisphere of my brain is into ceramics, painting and DIY crafts. I also co-host a podcast on Computational Biology called Computationally Yours!


  • MS Biostatistics, 2017

    Columbia University

  • MS Computational Biology, 2010

    Carnegie Mellon University

  • B.Tech Biotechnology, 2008

    Amity University





Potter (wizarding and muggle)





iCluster and TCGA

Integrative clustering of TCGA datasets


Visualization tool for clustered groups


An outcome weighted supervised clustering algorithm

Recent Posts

A brief primer on scientific and mathematical notations

As I finished writing the final draft of my first first-author paper, survClust, there were a lot of other firsts! In my opinion, writing the methods and a crisp conclusion and discussion were the difficult parts.

Academic Hugo Theme via Blogdown: Few more details and deployment (part 2)

This is a continuation of a post I wrote, Academic Hugo Theme via Blogdown: Where to start? After setting up a basic website with About, Skills and Experience pages.

Academic Hugo Theme via Blogdown: Where to start?

Setting up a personal website is fun and a great way to gain visibility. Whether it's your work, skills, or other hobbies, they can all see the light of day on one platform!

Journey so far


Research Biostatistician II

Memorial Sloan Kettering Cancer Center

Jan 2018 – Present New York
Responsibilities include:

  • Developing a semi-supervised classification algorithm that maximizes survival differences in molecularly profiled cohorts on various platforms. This is implemented through a weighted clustering model, utilizing multidimensional scaling. The robustness of the resulting classification groups is assessed via cross-validation.
  • Providing genomics and analytical support to faculty members of the Department of Epidemiology and Biostatistics at Memorial Sloan Kettering Cancer Center on a broad range of analyses, such as copy number and clonal evolution analysis, mutational signature analysis, and building statistical models to identify prognostic molecular features in exome sequencing and mutation panel testing datasets.

Research Biostatistician

Memorial Sloan Kettering Cancer Center

Dec 2014 – Jan 2018 New York
Responsibilities include:

  • Analyzing and summarizing genomic data with statistical models and survival analysis, followed by biological pathway analysis, to arrive at well-rounded answers to biological questions.
  • Integrated analysis of various cancer types as part of The Cancer Genome Atlas (TCGA) consortium, such as Liver Hepatocellular Carcinoma (LIHC), Prostate Adenocarcinoma (PRAD) and Skin Cutaneous Melanoma (SKCM), using the joint latent variable model implemented in iCluster to arrive at molecularly distinct subtypes.

Assistant Research Biostatistician

Memorial Sloan Kettering Cancer Center

May 2012 – Dec 2014 New York
Responsibilities include:

  • Developed a somatic mutation caller by applying a random effects model. Tested on publicly available and in-house datasets.
  • Understanding etiological tumor heterogeneity across various molecular assays, such as gene expression, mutation, copy number, and epigenetic data, through known clinical risk factors to characterize distinct risk groups.
  • Developed a validated prognostic gene risk score for patients with colorectal cancer liver metastasis.

Medical Scientist

University of Pittsburgh

Nov 2010 – Apr 2012 Pittsburgh
Development of a miRNA-seq Illumina pipeline in Perl, involving quality filtering of reads, adapter trimming, and alignment to the genome to obtain raw reads for downstream analysis, including the analysis of potential miRNA-targetable genes. Understanding miRNA targets and the effect of miRNA co-operativity via thermodynamics on CLIPSeq data.

Health Sciences Fellow

University of Pittsburgh

Oct 2009 – Oct 2010 Pittsburgh
Responsibilities include:

  • Writing a BioNetGen (BNG) to Stochastic Simulation Compiler (SSC) translator in Perl.
  • Understanding and modeling the Toll-like receptor 4 (TLR4) pathway.
  • Resolving bugs and updating the BioNetGen wiki.