IT Lead Data Engineer - Remote

Mayo Clinic

Job Description

Responsibilities

As a member of the Data and Analytics organization, you will be responsible for building and delivering best-in-class clinical data initiatives. You will collaborate with analytic and business partners from product strategy, program management, IT, data strategy, and predictive analytics teams to develop effective solutions for our partners.

  • Lead data design, prototyping, and development of data pipeline architecture.
  • Lead implementation of internal process improvements: automating manual processes, optimizing data delivery, and re-designing infrastructure for greater scalability.
  • Lead root cause analysis on external and internal processes and data to identify opportunities for improvement and answer questions.
  • Apply excellent analytic skills to work on unstructured datasets.
  • Understand the architecture, be a team player, lead technical discussions, and communicate their outcomes.
  • Serve as a senior individual contributor on the Data or Software Engineering teams.
  • Serve on the Technical Review Board along with the Manager and Principal Engineer.
  • Act as a technical liaison between the Manager, Software Engineers, and Principal Engineers.
  • Collaborate with software engineers to analyze, develop, and test functional requirements.
  • Write clean, maintainable code roughly 30% of the time and perform peer code reviews.
  • Mentor and coach engineers.
  • Work with team members to investigate design approaches, prototype new technology, and evaluate technical feasibility.
  • Work in an Agile/SAFe/Scrum environment to deliver high-quality software.
  • Establish architectural principles, select design patterns, and mentor team members on their appropriate application.
  • Facilitate and drive communication between front-end, back-end, data, and platform engineers.
  • Play a formal engineering lead role in the area of expertise.
  • Keep up to date with industry trends and developments.

Job Responsibilities:

  • Act as Product Owner for data platforms and lead the Executive, Product, Data, and Design teams in resolving data-related technical issues and supporting their data infrastructure needs.
  • Evaluate the full technology stack of services required, including PaaS, IaaS, SaaS, DataOps, operations, availability, and automation.
  • Research, design, and develop public and private data solutions, including impacts to enterprise architecture.
  • Build high-performing clinical data processing frameworks leveraging Google Cloud Platform shared services such as the Cloud Healthcare API, BigQuery, and the HL7 FHIR store.
  • Participate in evaluation of supporting technologies and industry best practices with our cloud partners and peer teams.
  • Lead modern data warehouse solution and sizing efforts to create defined plans and work estimates for customer proposals and statements of work.
  • Conduct full technical discovery, identifying pain points, business and technical requirements, and "as is" and "to be" scenarios.
  • Design and develop clinical data pipelines integrating ingestion, harmonization, and consumption frameworks for onboarding clinical data from various sources formatted in various industry standards (FHIR, C-CDA, HL7 V2, JSON, XML, etc.); see the sketch after this list.
  • Build state-of-the-art data pipelines supporting both batch and real-time streams to enable clinical data collection, storage, processing, transformation, aggregation, and dissemination through heterogeneous channels.
  • Build design specifications for healthcare data objects and the surrounding data processing logic.
  • Lead innovation and research, building proofs of concept for complex transformations, notification engines, analytical engines, and self-service analytics.
  • Bring a DevOps mindset to enable big data and batch/real-time analytical solutions that leverage emerging technologies.
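The ingestion and harmonization bullets above center on flattening standards-formatted clinical resources into analytics-friendly records. As a minimal sketch only (the row schema and function name are illustrative assumptions, not part of this posting), here is how FHIR R4 Patient resources in a searchset bundle might be flattened into warehouse-ready rows in Python:

```python
import json
from typing import Iterator

def patient_rows(bundle: dict) -> Iterator[dict]:
    """Yield one flat row per Patient resource in a FHIR R4 searchset bundle.

    The output columns below are a hypothetical warehouse schema chosen for
    illustration; real pipelines would map many more FHIR fields.
    """
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Patient":
            continue
        name = (resource.get("name") or [{}])[0]  # first recorded name, if any
        yield {
            "patient_id": resource.get("id"),
            "family_name": name.get("family"),
            "given_name": " ".join(name.get("given", [])),
            "gender": resource.get("gender"),
            "birth_date": resource.get("birthDate"),
        }

if __name__ == "__main__":
    # Tiny inline bundle matching the public FHIR R4 Patient structure.
    sample = {
        "resourceType": "Bundle",
        "type": "searchset",
        "entry": [{
            "resource": {
                "resourceType": "Patient",
                "id": "example",
                "name": [{"family": "Chalmers", "given": ["Peter", "James"]}],
                "gender": "male",
                "birthDate": "1974-12-25",
            }
        }],
    }
    for row in patient_rows(sample):
        print(json.dumps(row))  # one JSON line per row, ready for bulk load
```

Emitting newline-delimited JSON like this keeps the harmonization step decoupled from the warehouse: the same rows can be bulk-loaded into BigQuery or replayed through a streaming channel.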

Qualifications

Bachelor’s Degree in Computer Science/Engineering or related field with 6 years of experience OR an Associate’s degree in Computer Science/Engineering or related field with 8 years of experience.

Knowledge of professional software engineering practices and best practices for the full software development life cycle (SDLC), including coding standards, code reviews, source control management, build processes, testing, and operations. In-depth knowledge of data engineering and building data pipelines, with a minimum of 5 years of experience in data engineering, data science, or analytical modeling and basic knowledge of related disciplines. Has worked on and led data engineering teams in a Continuous Integration/Continuous Delivery model. Has built and led highly resilient data products, as well as test automation suites, unit testing coverage, data quality, and monitoring and observability efforts. A minimum of 5 years of experience using relational and NoSQL databases. Experience with cloud platforms such as GCP, Azure, and AWS.
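As a hedged illustration of the test automation and data quality coverage described above, a small pytest-style sketch (the row shape and validation rules are assumptions for a hypothetical harmonized schema, not requirements from the posting):

```python
from datetime import date

def validate_patient_row(row: dict) -> list[str]:
    """Return data-quality violations for one harmonized row (hypothetical rules)."""
    errors = []
    if not row.get("patient_id"):
        errors.append("missing patient_id")
    try:
        date.fromisoformat(row.get("birth_date") or "")
    except ValueError:
        errors.append("unparseable birth_date")
    return errors

def test_valid_row_passes():
    assert validate_patient_row({"patient_id": "p1", "birth_date": "1974-12-25"}) == []

def test_missing_id_is_flagged():
    assert "missing patient_id" in validate_patient_row(
        {"patient_id": "", "birth_date": "1974-12-25"}
    )
```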

Continuous integration using Jenkins, GitHub Actions, or Azure Pipelines. Experience with cloud technologies, development, and deployment. Experience with tools like Jira, GitHub, SharePoint, and Azure Boards. Experience using advanced data processing solutions such as Apache Spark, Hive, Airflow, Kafka, and GCP Dataflow. Experience using big data and statistics, and knowledge of the data-related aspects of machine learning. Experience with Google BigQuery, FHIR APIs, and Vertex AI. Knowledge of how workflow scheduling solutions such as Apache Airflow and Google Cloud Composer relate to data systems. Knowledge of using infrastructure as code and containers (Kubernetes, Docker) in a cloud environment.
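Apache Airflow and Cloud Composer appear throughout these qualifications; the following is a minimal sketch of the kind of scheduled pipeline meant, assuming Airflow 2.4+ (the DAG id, tasks, and schedule are illustrative assumptions):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract() -> None:
    # Placeholder: pull raw clinical files from a landing zone.
    print("extracting raw clinical files")

def load() -> None:
    # Placeholder: load harmonized rows into the warehouse.
    print("loading harmonized rows")

with DAG(
    dag_id="clinical_ingest_daily",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # the 'schedule' keyword requires Airflow 2.4+
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # run extract before load each day
```

On Cloud Composer, a DAG file like this is deployed by copying it into the environment's DAGs bucket, where the scheduler picks it up automatically.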

  • Hands-on experience in architecture, design, and development of enterprise data applications and analytics solutions within the healthcare domain
  • Experience in Google Cloud Platform shared services such as Cloud Dataflow, Cloud Storage, Pub/Sub, Cloud Composer, BigQuery, and the Cloud Healthcare API (FHIR store)
  • Ability to deliver an ingestion framework for relational data sources, understand the layers and rules of a data lake, and carry out all tasks needed to operationalize data pipelines
  • Experience in Python, Java, Spark, Airflow, and Kafka development
  • Hands-on experience working with “Big Data” technologies and experience with traditional RDBMS, Python, Unix Shell scripting, JSON, and XML
  • Experience working with tools to automate CI/CD pipelines (e.g., Jenkins, Git)
  • Must have great articulation and communication skills.
  • Working in a fluid environment, defining and owning priorities that adapt to our larger goals. You can bring clarity to ambiguity while remaining open-minded to new information that might change your mind.
  • Should have a strong understanding of healthcare data, including clinical data in proprietary and industry-standard formats.
  • Participate in architectural discussions and perform system analysis, which involves reviewing existing systems and operating methodologies. Participate in the analysis of the newest technologies and suggest optimal solutions best suited to satisfying current requirements while simplifying future modifications.
  • Design appropriate data models for use in transactional and big data environments as input into machine learning processing.
  • Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability
  • Design and build the necessary infrastructure for optimal ETL from a variety of data sources for use on GCP services (see the sketch after this list)
  • Collaborate with multiple stakeholders including Product Teams, Data Domain Owners, Infrastructure, Security and Global IT
  • Identify, implement, and continuously enhance the data automation process
  • Develop proper data governance and data security practices
  • Demonstrate strategic thinking and strong planning skills to establish a long-term roadmap and business plan
  • Work with stakeholders to establish and meet data quality requirements, SLAs, and SLOs for data ingestion
  • Experience in self-service analytics/visualization tools like Power BI, Looker, and Tableau
  • Proven knowledge of implementing security and IAM requirements
  • Experience building and maintaining a data lake with Delta Lake
  • Experience with ETL/ELT/data mesh frameworks
  • Experience with GCP Dataplex (Data Catalog, Clean Rooms)
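For the ETL-to-GCP bullet referenced above, a minimal hedged sketch of loading harmonized newline-delimited JSON from Cloud Storage into BigQuery with the google-cloud-bigquery client (the project, dataset, table, and bucket names are hypothetical):

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

table_id = "my-project.clinical.patients"          # hypothetical target table
uri = "gs://my-bucket/harmonized/patients/*.json"  # hypothetical source files

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # infer schema here; production pipelines usually pin one
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # block until the load job completes or raises
print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}")
```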

Compensation Detail

$138,236.80 - $200,408.00 / year

Company Info

Mayo Clinic

The Mayo Clinic is a nonprofit American academic medical center focused on integrated health care, education, and research.
