POSITION SUMMARY for the Associate Data Scientist (Raleigh, NC, USA): The data scientist trains, validates, and manages machine learning solutions to advance Carpenter Technology’s digital transformation. Mines and analyzes complex and unstructured data sets using advanced statistical methods to support data-driven decision making. Performs research, analysis, and modeling on organizational data. Develops and applies algorithms or models to key business metrics with the goal of improving operations or answering business questions. The position is responsible for the entire suite of ML models that address production quality issues across all business lines. Builds ML simulation models to support R&D in product development.

  • Apply data science techniques to massive structured / unstructured data sets across multiple environments to discover patterns and solve strategic / tactical business problems in process improvement, yield improvement, and product development.
  • Build statistical and machine learning models for detecting root causes of process and yield variability. Machine learning algorithms to be used include logit, probit, and complementary log-log regression; Random Forest; GBMs such as XGBoost, AdaBoost, CatBoost, LightGBM, RUSBoost, AveBoost, ORBoost, SMOTEBoost, etc.; Support Vector Machines; KNN; MLP neural networks; and convolutional neural networks. Statistical models to be used include the General Linear Model, Generalized Linear Model, Multivariate Regression, Survival Models, Stepwise Logistic Regression, and Non-Parametric Models.
  • Develop prescriptions from model parameters: actionable, controllable recipes for critical process variables, reported with baseline performance and the estimated performance once the prescriptions are implemented.
  • Design and conduct experiments on observational data to identify the factors associated with cost of poor quality and process variability, using Randomized, Randomized Block, Latin Square, and Full Factorial designs, and apply appropriate general linear models such as fixed-effect, random-effect, and mixed-effect models to derive ANOVA, ANCOVA, MANOVA, and MANCOVA.
  • Build process simulation models to identify the optimal critical process path using both chaotic dynamic and stochastic process simulation, such as hidden Gauss-Markov and Monte Carlo simulation.
  • Develop anomaly detection models such as Isolation Forest (iForest), Local Outlier Factor, GMM, one-class SVM, etc. to identify anomalous behavior in critical process inputs for both batch and stream processing.
  • Manage the machine learning model life cycle through documentation, version control, model presentation, model audit, back testing, forward testing, and benchmarking against performance metrics.
  • Communicate and democratize model findings clearly and precisely to stakeholders such as market leads, metallurgists, R&D, and senior business leaders.
  • Drive the collection, cleansing, processing, and analysis of new and existing data sources.
  • Report findings through appropriate outputs and visualizations tailored for the intended audiences.
  • Learn and stay current on analytics developments in one or more business domains: Internet of Things, Manufacturing, Supply Chain, Forecasting, Marketing and Sales, Pricing, etc.
  • Learn and stay current on developments in one or more analytics domains: Operations Research, Machine Learning, Deep Learning / AI, Simulation, etc.
  • Generate innovative ideas, establish new research directions, and shape the information strategy in support of technical projects and new product developments.
  • Collaborate with new, cross-functional teams on accelerated projects to scale data architecture, build digital products, and execute data science solutions.
  • Perform all other duties and special projects as assigned.

  • BA in computer science, mathematics, statistics, operations research or related field required. MS/PhD preferred.
  • 0-3 years of experience in data science, analytics, and model building roles.
  • Experience programming in Python, R, Julia, MATLAB, and SAS.
  • Knowledge of other programming languages and analysis tools (e.g. Scala, Java, Ruby, JavaScript, shell scripting).
  • Experience using big data frameworks and tools (e.g. Hadoop, Spark, MapReduce, Hive, Pig, Luigi/Airflow, Kafka, data streaming, NoSQL, SQL).
  • Familiarity with cloud-based solutions (e.g. Azure, AWS EMR).
  • Knowledge of analytical techniques and methodologies (e.g. machine learning, segmentation, mix and time series modeling, response modeling, lift modeling, experimental design, neural networks, data mining, optimization techniques).
  • Experience with data profiling and data cleansing techniques.

  • Deep working knowledge of and experience in probability theory, linear programming, and optimization.
  • 1-3 years of experience in statistical inference, predictive modeling, and machine learning.
  • Strong programming skills in Python, SQL, SAS, and R.
  • Working experience with data visualization tools such as Power BI and Tableau.
  • Excellent skills in communicating data-driven findings to business leaders.
  • Strong written and verbal communication skills with all levels of the organization.
  • Strong collaboration skills and comfortable working in a team environment.
  • Self-motivated with the ability to prioritize, meet deadlines and manage changing priorities.
  • Proven ability to be flexible and work hard independently, in a team environment, and under direct supervision.
  • Possess beginning / working knowledge of subject matter.
  • Natural curiosity and passion for empirical research and problem solving.

Carpenter Technology Company offers a competitive salary and a comprehensive benefits package including life, medical, dental, and vision insurance, flexible spending accounts, disability coverage, and a 401(k) with company contributions, as well as many other options for employees.

