Tapas Bhadra

College Park, MD · (240) 259-1247 · tapas.bhadra@marylandsmith.umd.edu

Hi there! I am Tapas, and I'm currently pursuing my master's in Information Systems at the University of Maryland with a focus on Data Science and Business Intelligence. I am a Data Scientist with over 3 years of experience and a solid foundation in programming with Python and R, and I have completed various projects in the data science domain.

Recently, I worked as a Data Scientist at Cogno AI, building chatbots and co-browsing solutions for our clients. I worked cross-functionally with the customer success team, where we focused on developing machine learning models for our clients. This, along with time series data analysis, helped increase the usage of our products by 67% within 4 months. At Popshot, as a Data Analyst intern, I worked on increasing customer reach and satisfaction by developing a machine learning model to conduct product analysis.

I'm really interested in learning about new technologies and how they might be applied to my work. I have worked on a wide range of projects that have exposed me to a spectrum of machine learning and AI-based technologies. My analytical skills, combined with the quantitative skills I’m developing through my Smith program, will help me be successful in a variety of client assignments.

I truly believe that Artificial Intelligence will shape the future of various industries, freeing people from repetitive tasks and, in turn, helping them focus on progress through data-driven management, and I aspire to be a part of this change.


Experience

Data Scientist Intern

  • Collaborating with lead data scientists to understand end-user sentiment and make recommendations to an e-commerce client using user reviews.
  • Performing NLP-based tokenization and computing similarity measures such as Levenshtein distance on over 2.5 million reviews; preprocessing text features such as n-grams, frequency, POS tags, and similarity; and assessing model performance.
  • Optimizing the client’s search engine results by 29% by creating product embeddings and calculating their similarity with keywords.

September 2022 - Present

Data Science Intern

  • Collaborated with marketing teams to improve business strategy and pricing; analyzed commodity volume and retail price data and used causal inference to identify an opportunity with projected business revenue of $3.5 million over 1 year.
  • Conducted trend analysis on freight cost and shipping time series data to identify customer price sensitivity for 5 products, and created data visualizations and dashboards.
  • Designed a database with the operations team to automate inventory management procedures across the company.

June 2022 - August 2022

Teaching Assistant, Data Mining and Predictive Analytics

  • Worked with Prof. Gah-Yi Ban to execute lesson plans and to create and grade assignments.
  • Conducted weekly office hours to answer students' questions on machine learning.

July 2022 - August 2022

Research Assistant

  • Evaluated the extent to which websites track users by analyzing third-party cookies shared across domains, and visualized the results using Social Network Analysis (SNA) to provide the team with valuable insights.
  • Assessed similarity between 10 privacy bills using TF-IDF and cosine similarity to understand trends across US states.

June 2022 - August 2022

Data Scientist

  • Led the restructuring of 10+ client-based Salesforce CRM tools with the engineering team and conducted requirements specification to incorporate an in-house co-browsing solution that enhances the efficiency of service reps.
  • Worked cross-functionally with the customer success team to increase the efficiency of 35+ products by analyzing the generated data and customizing each product based on client infrastructure and usage.
  • Set and managed timelines, holding internal and external stakeholders accountable while accommodating the flexible nature of client deadlines, and boosted efficiency in the deployment process by 40%.
  • Increased product efficiency by 70% by analyzing product-generated data.

June 2020 - June 2021

Data Science Intern

  • Analyzed and extracted features from a database of images containing handwritten texts.
  • Used clustering algorithms for classification and delivered a comprehensive comparative study of the accuracy of the algorithms.
  • Developed a model using Python (NumPy, pandas, TensorFlow) and MATLAB to implement K-nearest neighbors, t-SNE, and other clustering algorithms on handwritten digits data, achieving an accuracy of 97%.

July 2019 - September 2019

Data Analyst

  • Created an information system to include reviews and feedback from various users of the application.
  • To improve user experience, developed a machine learning application using a Random Forest Classifier and carried out product analysis that improved customer retention by 15%.

October 2019 - December 2019

Education

University of Maryland | College Park, MD

Master of Science
Information Systems - Data Science

Relevant coursework: Data Models and Decisions, Data Processing and Analysis, Database Management Systems, Managing Digital Business Markets, Digital Transformation in Business

GPA: 4.0

August 2021 - December 2022

University of Mumbai

Bachelor of Technology
Computer Science

Relevant coursework: Data Structures, Object-Oriented Programming, Analysis of Algorithms, Database Management Systems, Advanced Databases, Operating Systems, Data Mining, Artificial Intelligence, Neural Network & Fuzzy Logic.

GPA: 3.70

July 2016 - October 2020

Key Skills

  • Programming Languages: Python, R, Java, C, C++, HTML5, CSS3, JavaScript
  • Modeling: Linear Regression, Logistic Regression, SVM, Decision Trees, Random Forests, K-means, KNN, Sentiment Analysis, Lasso Regression, Naive Bayes, Deep Neural Networks, Convolutional Neural Networks, ARIMA, SARIMA, Boosting
  • Databases: MySQL, MongoDB, SSMS, SSIS, PostgreSQL
  • Visualization Tools: Tableau, PowerBI
  • Libraries: NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn, TensorFlow, PyTorch, Statsmodels, NLTK, ggplot, Keras
  • Operating Systems: Windows, Linux, Ubuntu
  • Tools: AWS SageMaker, AWS Lambda, AWS EC2, Git, MS Excel

Projects

Content-based Song Recommender System using Spotify API

MLE
May 2022 - June 2022

Technologies used: Pandas, NLTK, TF-IDF, Cosine Similarity, Spotipy, Python

  • Scraped and extracted song data containing 25 features using the Spotify API, creating a dataset of 2000 tracks from 10 genres
  • Created TF-IDF vectors of the song genres and artist information
  • Performed feature engineering and normalization
  • Calculated the cosine similarity between the training data and the user playlist and recommended 10 songs based on user features

Here is the GitHub Repo

Kickstarter Data Analysis to predict Successful Projects for Funding

Data Scientist
March 2022 - May 2022

Technologies used: R, Tidyverse, ggplot2, TF-IDF, XGBoost, RStudio

  • Performed exploratory data analysis, normalization, and correlation analysis using R to select relevant features from a set of 60 features
  • Created fitting curves, calculated AUC for XGBoost, random forest, and GBM, and selected the best model.

Here is the GitHub Repo

MovieBusters: Film Data Analysis

Data Analyst
October 2021 - December 2021

Technologies used: Pandas, NLTK, Logistic Regression, BeautifulSoup, Seaborn

  • Scraped movie data from the IMDb and The Numbers websites.
  • Developed a machine learning model to identify and rate actors based on genre to maximize revenue.
  • Performed natural language processing (NLP) on film descriptions to filter important screenplay keywords of successful films and visualized the results using a word cloud.

Here is the GitHub Repo

Song Genre Detection System

Data Scientist
September 2020 - October 2020

Technologies used: Python, RCNN, Django, AWS

  • Built a deep learning model using CNN to identify song genre.
  • Created a model to classify songs into 10 different genres
  • Designed and configured the backend application using Django and hosted on AWS (Amazon Web Services) cloud.

Prediction of Mortality Rate Post Thoracic Surgery

Team Lead
June 2019 - March 2020

Technologies used: Python, Logistic Regression, Random Forest, Flask, Pandas, NumPy, Seaborn, Matplotlib, SQL

  • Built a web application using Flask to give doctors statistical backing for more accurate thoracic surgery predictions
  • Analyzed patient history and created a Random Forest Classifier to predict life expectancy with 98% accuracy

Here is the GitHub Repo for the web app
Here is the GitHub Repo for the pre-processing

Customer Behavior Analysis in Apps to Drive Subscriptions

Data Analyst
April 2020 - June 2020

Technologies used: Python, Logistic Regression, Pandas, NumPy, Seaborn, XML, MySQL

  • Classified customers as likely or unlikely to subscribe to a paid product service based on their app usage
  • Created a model by extracting relevant features using correlation analysis and hypothesis testing to increase the customer retention rate

Here is the GitHub Repo

Predicting the Variety of Wine

Data Analyst
June 2020 - July 2020

Technologies used: Python, Seaborn, NLTK, Natural Language Processing, TensorFlow, Sequential Model, Neural Networks

  • Preprocessed user reviews using tokenization, lemmatization, stemming, and vectorization.
  • Implemented Latent Dirichlet Allocation (Natural Language Processing) to extract the main topics from the documents.
  • Implemented a Sequential model to predict the variety of wine with 97% accuracy.

Here is the GitHub Repo

QNA (Question-Answer WebApp)

Software Developer
September 2018 - November 2018

Technologies used: PHP, HTML, JavaScript, CSS, MySQL, AJAX, AWS

  • Built a peer-to-peer doubt-solving platform to help students resolve their queries anonymously.
  • Added an upvote-downvote feature to filter inappropriate posts, which helped improve the quality of content.
  • Hosted the application on the Amazon Web Services cloud, where it was used by 10,000 students.

Here is the GitHub Repo