Senior Data Engineer
FosterThomas, a Mid-Atlantic staffing and recruiting firm, is leading the search for a Senior Data Engineer for our client. This position is available remotely.
Our client is seeking a Senior Data Engineer (PySpark, Cloud) to lead the development of a greenfield cloud data application: a cloud data pipeline, with dashboards, that processes Protected Health Information (PHI). The application is being developed first on GCP and uses a number of cloud-native services, including Dataproc as a data science platform.
You will lead data infrastructure engineering in support of a data science team that will develop hundreds of transformation rules using PySpark in GCP Dataproc notebooks. You will develop and implement functional requirements around reliability, robustness, performance, optimization, parallelism across multiple tenants, and scalability to millions of records. You will also develop Python abstractions to help the data scientists write production-quality code, or otherwise turn the data science team’s notebooks into a production-grade automated pipeline that is performant, robust, and fault-tolerant.
- Cloud data engineering experience
- Strong Python coding skills and deep software engineering experience
- Advanced working knowledge of databases, data pipelines, and stream processing
Some technologies we are interested in:
- GCP, Delta Lake, Spark, PySpark, Databricks, Dataproc, Airflow, Snowflake, Anthos