Saonli Basu

1. What are your current research interests?


I have a broad background and expertise in statistical genomics. I develop analytic and computational tools to integrate genomic research into precision public health. My current research focuses on dimension reduction approaches for high-dimensional genomic data analysis; heritability estimation and genomic risk prediction in multi-ethnic datasets; and detection of gene-environment and gene-gene interactions in longitudinal and family-based association studies.
 
2. How do you define Data Science?


Data Science is a deeply interdisciplinary field that draws on the principles and tools of many disciplines to gain insights and extract knowledge from vast datasets. It involves developing and implementing statistically sound and computationally efficient techniques to parse and process those datasets, and interpreting the findings to support data-driven decisions and predictions.
 
3. Can you share an interesting or surprising result you’ve found in your data?


I’m always amazed by how much the diversity of genetic data, which stems from differences in genotyping or sequencing technologies and in data cleaning or preprocessing methods, can affect analysis findings. Recently, we participated in a data challenge competition (https://www.cmi-pb.org/blog/prediction-challenge-overview/) to develop computational models for prediction with multi-omics data. When we compared notes, we found that many teams had used similar computational models for prediction. Our meticulous quality control steps to harmonize and standardize the different datasets likely contributed to our first-place finish in the competition.


4. Are there any interesting new tools or libraries you or your students have been using?


My new initiative, Genomic Data Commons (https://www.sph.umn.edu/research/centers/genomic-data-commons/), focuses on standardizing multiple sources of genomic data and developing tools and algorithms to facilitate reproducible genomic research. We are developing several tools, primarily in R and Python, using Visual Studio Code (VSCode). Developing these workflows in VSCode has been a great experience: it is flexible enough to accommodate different programming languages, it integrates with GitHub Copilot, and it makes debugging and troubleshooting on remote MSI servers straightforward.


5. What are you most excited about in the field of data science in the next 5 years?


My field of genomics is broadening its horizons. Increasingly, we're seeing a wealth of datasets that span diverse ethnicities and many types of omics data. Integrating these multi-omic datasets holds promise for advancing precision health, and the genetic diversity across ethnic populations will strengthen personalized medicine efforts. Navigating these diverse, heterogeneous, and enormous datasets to extract meaningful insights is extremely challenging, but it presents an exciting opportunity to advance data science techniques and to develop tools that support responsible data science practices.

Catch Up on the Latest News at DSI

WiADS 2024 Conference: A Day of Inspiration and Connection for Women in Data Science

On November 4th, 2024, the University of Minnesota hosted the highly anticipated Women in AI and Data Science (WiADS) Conference at the McNamara Alumni Center, organized in partnership with MinneAnalytics. This sold-out event brought together more than 1,000 registrants, including more than 550 in-person attendees, and showcased the work and voices of women, non-binary, and gender-diverse people in data science.