Data Engineering/Hadoop

Docker Compose 로 Hadoop 클러스터와 Presto 엔진 구축하기

minjiwoo 2024. 6. 16. 16:59
728x90

 

 

기본이 되는 hadoop cluster docker-compose.yml 은 아래의 repository 를 참고하였으며, presto 엔진을 사용하기 위해 수정하였다. 

 

https://github.com/big-data-europe/docker-hadoop

 

GitHub - big-data-europe/docker-hadoop: Apache Hadoop docker image

Apache Hadoop docker image. Contribute to big-data-europe/docker-hadoop development by creating an account on GitHub.

github.com

 

docker-compose.yml 파일은 다음과 같다. 

version: "3"

services:
  namenode:
    image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
    container_name: namenode
    restart: always
    ports:
      - 9870:9870
      - 9000:9000
    volumes:
      - hadoop_namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop.env

  datanode:
    image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
    container_name: datanode
    restart: always
    volumes:
      - hadoop_datanode:/hadoop/dfs/data
    environment:
      SERVICE_PRECONDITION: "namenode:9870"
    env_file:
      - ./hadoop.env
  
  resourcemanager:
    image: bde2020/hadoop-resourcemanager:2.0.0-hadoop3.2.1-java8
    container_name: resourcemanager
    restart: always
    environment:
      SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864"
    env_file:
      - ./hadoop.env

  nodemanager1:
    image: bde2020/hadoop-nodemanager:2.0.0-hadoop3.2.1-java8
    container_name: nodemanager
    restart: always
    environment:
      SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864 resourcemanager:8088"
    env_file:
      - ./hadoop.env
  
  historyserver:
    image: bde2020/hadoop-historyserver:2.0.0-hadoop3.2.1-java8
    container_name: historyserver
    restart: always
    environment:
      SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864 resourcemanager:8088"
    volumes:
      - hadoop_historyserver:/hadoop/yarn/timeline
    env_file:
      - ./hadoop.env

  prestocoordinator:
    image: prestodb/presto:latest
    container_name: prestocoordinator
    restart: always
    ports:
      - 8080:8080
    environment:
      PRESTO_CONFIG: |
        coordinator=true
        node-scheduler.include-coordinator=true
        http-server.http.port=8080
        query.max-memory=5GB
        query.max-memory-per-node=1GB
        discovery-server.enabled=true
        discovery.uri=http://prestocoordinator:8080
    volumes:
      - presto_coordinator:/var/presto/data
    depends_on:
      - namenode
      - datanode
      - resourcemanager
      - nodemanager1
      - historyserver

  prestoworker:
    image: prestodb/presto:latest
    container_name: prestoworker
    restart: always
    environment:
      PRESTO_CONFIG: |
        coordinator=false
        http-server.http.port=8080
        query.max-memory=5GB
        query.max-memory-per-node=1GB
        discovery.uri=http://prestocoordinator:8080
    volumes:
      - presto_worker:/var/presto/data
    depends_on:
      - prestocoordinator
  
  prestocli:
    image: prestodb/presto:latest
    container_name: prestocli
    

volumes:
  hadoop_namenode:
  hadoop_datanode:
  hadoop_historyserver:
  presto_coordinator:
  presto_worker:

Presto 가 제공하는 web ui 에 접속하기 위하여 localhost:8080 으로 접속했다. 

Presto ui

 

SQL Client 에서  system.runtime schema 에서 SHOW TABLES; 쿼리를 실행시켜보았다. 

SELECT 쿼리를 실행시켜보았다.

728x90