Data Engineer - GCP
Liminal
Date: 1 week ago
City: Porto, Porto
Contract type: Full-time
Remote

Liminal is a global market intelligence and strategic advisory firm specializing in digital identity, financial crime and compliance, and IT security technology solutions across industries while also catering to the private equity and venture capital community. Founded in 2016, Liminal offers strategic and analytical services supporting executive decision-making at all product and business lifecycle stages. We advise some of the world’s most prominent business leaders, investors, and policymakers on building, acquiring, and investing in the next generation of solutions and technologies. We provide access to proprietary data and analysis, strategic frameworks, and integrated insights on the industry’s only market intelligence platform.
Every major company in the world has started focusing on the next generation of digital identity technologies as a necessity for continued growth and security. Our team works with a myriad of organizations, from Fortune 100s to startups, across industries including financial services, technology, telecommunications, and the P2P economy. At Liminal, we help businesses build solutions, execute strategies, invest intelligently, and connect with key decision-makers. We know that it’s in the sharing of discovery and insights that groundwork is laid, problems are solved, and entire sectors advance at the speed of light. Keeping information to ourselves delays progress for all. At Liminal, we don't just respond to the market; we define it.
About The Role
This role focuses on building and maintaining robust data architectures, pipelines, and systems that support the effective collection, storage, and processing of data across multiple departments. Our Data Engineer will play a pivotal role in ensuring the scalability, reliability, and performance of our data systems. Drawing on a strong background in data engineering, cloud infrastructure, and data pipeline automation, the Data Engineer will take projects from initial design through deployment and support the smooth integration of data workflows into product and operational teams.
What You'll Do
Cross-Department Data Solutions:
- Collaborate with various departments to understand data needs, assess technical feasibility, and design efficient data engineering solutions to support organizational initiatives.
- Implement scalable data workflows that optimize data availability, quality, and accessibility for AI, business analytics, and other internal teams.
- Support product teams in transitioning mature data pipelines and systems to ensure alignment with product goals and technical requirements.
Data Pipeline Development:
- Design, implement, and maintain data pipelines that ingest, process, and transform large-scale datasets for internal applications, including AI and machine learning models.
- Build efficient ETL (Extract, Transform, Load) processes that streamline the movement of data between systems, databases, and analytics platforms.
- Optimize data flows to ensure high performance, low latency, and scalability, adapting pipelines to handle both batch and real-time processing.
Cloud Infrastructure:
- Develop and maintain cloud-based data infrastructure on platforms such as Google Cloud Platform (GCP), ensuring data systems are robust, cost-effective, and performant.
- Implement data storage solutions, including BigQuery, Cloud Storage, and distributed databases, ensuring seamless integration with other internal systems.
- Leverage cloud services for scalable data processing and storage, ensuring that infrastructure can support growing datasets and organizational demands.
Data Quality & Governance:
- Establish data validation processes to ensure data quality, consistency, and integrity across all pipelines and systems.
- Collaborate with data scientists and analysts to ensure data is structured and formatted for optimal use in analytics and AI applications.
- Ensure compliance with data governance policies and best practices for data privacy, security, and auditability.
Automation & Monitoring:
- Implement automation for data processing workflows, reducing manual intervention and ensuring consistent delivery of high-quality data.
- Set up monitoring and alerting systems for pipeline health, performance metrics, and data anomalies to proactively address any issues.
- Continuously optimize existing data systems and pipelines to improve performance, reduce errors, and enhance reliability.
Documentation & Collaboration:
- Maintain comprehensive documentation of data architectures, data pipeline designs, and system integrations to facilitate clear communication and collaboration.
- Document technical workflows, processes, and system configurations to ensure smooth handoffs and enable other teams to leverage data assets effectively.
- Collaborate with cross-functional teams, including data scientists, product developers, and business stakeholders, to ensure data solutions align with organizational goals.
What You'll Bring
- Strong background in data engineering, data architecture, and system design, with extensive experience building and optimizing large-scale data systems.
- Proficiency in Google Cloud Platform (GCP), including key tools such as:
  - BigQuery for data warehousing and analytics.
  - Cloud Dataflow for stream and batch data processing.
  - Cloud Storage (GCS) for scalable object storage solutions.
  - Pub/Sub for real-time messaging and event-driven architecture.
  - Cloud Composer (based on Apache Airflow) for orchestrating and automating workflows.
  - Dataproc for big data processing with Apache Hadoop and Spark.
  - Data Fusion for data integration and ETL pipeline management.
- Proficiency in Python for scripting and automation of data processing tasks.
- Solid understanding of SQL and experience with database management systems (e.g., PostgreSQL, MySQL, or NoSQL solutions), specifically within GCP's ecosystem.
- Experience with GCP Identity and Access Management (IAM) for securing data access and managing roles and permissions across cloud services.
- Experience with data lakes and warehouses on GCP, particularly BigQuery and Cloud Storage, and familiarity with data modeling and optimization techniques within these tools.
- Familiarity with containerization technologies, including Docker, and container orchestration with Google Kubernetes Engine (GKE).
- Experience in data pipeline development and optimization, ensuring pipelines are scalable, high-performing, and fault-tolerant.
- Knowledge of CI/CD pipelines and version control (e.g., Git), and how to integrate GCP services into these workflows for automated deployments.
- Experience implementing data security and privacy best practices within GCP, such as encryption, access controls, and data governance.
- Strong problem-solving skills, with a demonstrated ability to debug and optimize data pipelines and cloud-based architectures in production environments.
- Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams and with both technical and non-technical stakeholders.
- Familiarity with monitoring tools such as Google Cloud Monitoring and Cloud Logging (formerly Stackdriver) for tracking pipeline health, performance, and error reporting in GCP.