Head of Infrastructure and Reliability

Block Labs

Date: 14 hours ago

City: Lisbon, Lisbon

Contract type: Full-time

Remote

Title: Head of Infrastructure & Reliability

Department: Engineering

Location: Remote within EU

About Block Labs

Block Labs is a leading force in the Web3 space, incubating, investing in, and accelerating top-tier fintech, crypto and iGaming projects. With a mission to shape the future of decentralized technology, we partner with visionary startups to raise funding, refine product-market fit, and grow their audiences. Our diverse team drives innovation, using deep industry expertise and an extensive network to empower the next wave of blockchain-driven companies. At Block Labs, we’re passionate about turning bold ideas into breakthrough success.

About The Role

As Head of Infrastructure & Reliability, you will lead a cross-functional area responsible for ensuring the stability, scalability, and resilience of our cloud platform and services. This includes oversight of technical leadership across both infrastructure engineering and operational reliability.

You will manage both strategic initiatives and day-to-day execution, working closely with engineering, product, and executive leadership to ensure that systems are robust, incidents are handled effectively, and the platform evolves to meet growing demands. Your mission is to uphold operational excellence, champion service reliability, and drive continuous improvement across all infrastructure and support systems.

You will also own infrastructure capacity planning, monitor and optimize cloud spend, and keep infrastructure costs aligned with organizational budget goals. In collaboration with security stakeholders, you will lead efforts to define and enforce information security practices and policies across the organization.

Key Responsibilities

Own and evolve the strategy, execution, and performance of the Infrastructure & Reliability area, including both platform operations and incident response.
Provide hands-on leadership to team leads within the area, setting clear goals and fostering alignment and collaboration across functions.
Ensure our cloud infrastructure (AWS) is scalable, secure, and cost-efficient, aligned with current and future business needs.
Guide platform architecture decisions and cloud-native best practices, with an emphasis on automation, observability, and developer experience.
Oversee the Incident and Problem Management lifecycle, ensuring rapid response, effective communication, and rigorous root cause analysis.
Lead platform reliability and resilience efforts, defining and monitoring SLAs and SLOs across all services.
Drive infrastructure capacity planning to meet evolving product and operational needs.
Monitor and manage cloud spend, ensuring cost optimization and ownership of the infrastructure budget.
Define and enforce information security policies and collaborate across teams to embed security best practices in infrastructure and operations.
Partner with Engineering and Product teams to ensure platform capabilities meet the needs of internal and external stakeholders.
Drive continuous improvement initiatives, from infrastructure modernization to operational process enhancements.
Represent the Infrastructure & Reliability area in strategic planning discussions and company-wide leadership forums.

About You

10+ years of experience managing major incidents in mission-critical or always-on environments.
Experience with iGaming and fintech platforms is required; familiarity with Web3 projects is a strong plus.
Proven ability to independently lead multiple incidents concurrently with minimal support.
Strong understanding of application development, system architectures, and cloud environments.
Familiarity with infrastructure concepts, including physical, virtual, and containerized compute platforms.
Practical experience with modern monitoring and telemetry tools such as Splunk, Prometheus, or Grafana.
Basic data analysis skills using SQL or similar tools.
Excellent task management and communication skills, with the ability to remain composed under pressure.
Experience handling diverse incident types, such as technical, security, and privacy incidents, as well as crisis management.
Familiarity with distributed architectures and system interdependencies in a cloud environment.
Proven experience in managing public-facing communications, including status pages and social media updates during incidents.
A proactive, ownership-driven mindset with a commitment to continuous improvement in incident management processes.
