Skip to content

1-部署数据湖

DANGER

似乎存在错误,虽然在 DockerCompose 内部 NocoDB 连接到 postgres 启用 SSL 没问题。但为啥我在家中的内网环境下死活没法启用 SSL,我试图从公网域名连也无法启用 SSL ?是 GFW 干的,还是我的配置文件问题?

本文描述了如何部署 LibianDatalake 数据湖。

1. 创建环境变量文件

第一步,先创建好 .env 文件。它之后会被 shell 命令与 docker-compose.yml 读取。

WARNING

⚠️ 将以下 .env 文件中的值改为你自己的!并且不要泄漏。

POSTGRES_HOSTNAME=libian-datalake-postgres.yourhostname.com
POSTGRES_DB=libian-datalake
POSTGRES_USERNAME=postgres
POSTGRES_PASSWORD=xxxxxxxxxx
PGADMIN_MY_EMAIL=xxxxxxxxxx@gmail.com
PGADMIN_MY_PASSWORD=xxxxxxxxxx
NC_PUBLIC_URL=https://libian-datalake-nocodb.yourhostname.com
NC_ADMIN_EMAIL=xxxxxxxxxx@gmail.com
NC_ADMIN_PASSWORD=xxxxxxxxxx
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=465
SMTP_SSL_PY_BOOL=True
SMTP_SSL_JS_BOOL=true
SMTP_USERNAME=smtpxxxxsmtp
SMTP_PASSWORD=smtpxxxxsmtp
SMTP_SENDER=smtpxxxxsmtp@gmail.com
#
# The internal address( Schema + Hostname + Port )
# should be same as the public address.
#
# Because nocodb backend uses the internal address, 
# and the internal address will be exposed in the nocodb table data.
#
MINIOSNSD_HOSTNAME=libian-datalake-miniosnsd.yourhostname.com
# The public network port is the same as the port in the container, 
# (but the mapped port has nothing to do with it)
MINIOSNSD_BOTH_PORT=443
MINIO_ROOT_USER=myminioadmin
MINIO_ROOT_PASSWORD=minio-secret-key-change-me
MINIO_BROWSER_REDIRECT_URL=https://libian-datalake-minioconsole.yourhostname.com

然后使用以下命令读入并检查 .env 中的环境变量。

shell
export $(cat .env | xargs) && echo $PGADMIN_MY_EMAIL

2. 检查 SSL 证书和私钥

使用 TLS 的必要性

启用 TLS 加密传输,可以保护各个模块与 PostgreSQL 数据库之间的通信安全。

  • ⚠️ 否则,你的 PostgreSQL 的通信报文将在公网裸奔。

由于 MinIO Server 在 Docker Compose 的 内部网络 和 公网网络 中的协议(https)应当一致,因此此处需要在 MinIO 服务启动时提供 SSL 证书。

  • ⚠️ 如果在内网使用 http 而在公网使用 https,nocodb 将不能区分内网和公网,将会在数据表中保留 http:// 开头的 MinIO 链接,这会导致图片和附件的链接无效。

第一步,请先申请自己的 SSL证书,推荐使用通配符域名证书,以覆盖多个子域。例如,*.yourdomain.com 可以同时保护 libian-datalake-miniosnsd.yourdomain.comlibian-datalake-postgres.yourdomain.com 等多个服务。

第二步,将申请到的 fullchain.crt 和 private.key 文件复制到指定目录

shell
# 创建 ca 目录(如果不存在)
mkdir -p ./ca

# 复制证书文件
cp /path/to/your/fullchain.crt ./ca/server.crt
cp /path/to/your/private.key ./ca/server.key

第三步,确保 SSL 证书和密钥的权限配置正确,防止未经授权的访问:

按照以下步骤确保证书和密钥的权限配置正确

这个示例中的 ./ca/server.* 是通配符证书,如果你的证书不是通配符证书,则自行修改。

shell
chown 999:999 ./ca/server.crt && \
  chown 999:999 ./ca/server.key && \
  chmod 600 ./ca/server.crt && \
  chmod 600 ./ca/server.key && \
  openssl pkcs8 -topk8 -in ./ca/server.key -out ./ca/minio-private.key -nocrypt && \
  cp ./ca/server.crt ./ca/minio-public.crt && \
  chown 0:0 ./ca/minio-private.key && \
  chown 0:0 ./ca/minio-public.crt && \
  chmod 600 ./ca/minio-private.key && \
  chmod 600 ./ca/minio-public.crt && \
  ls -la ./ca

3. 创建 MinIO 配置文件

MinIO 的单节点单磁盘(SNSD, Single Node Single Drive)部署模式需要通过配置文件来指定运行环境参数。为此,我们需要创建并导入必要的环境变量。

根据 MinIO 官方文档,请按照以下步骤操作:

shell
export $(cat .env | xargs) && \
  echo "
# MINIO_ROOT_USER and MINIO_ROOT_PASSWORD sets the root account for the MinIO server.
# This user has unrestricted permissions to perform S3 and administrative API operations on any resource in the deployment.
# Omit to use the default values 'minioadmin:minioadmin'.
# MinIO recommends setting non-default values as a best practice, regardless of environment

MINIO_ROOT_USER=$MINIO_ROOT_USER
MINIO_ROOT_PASSWORD=$MINIO_ROOT_PASSWORD

# MINIO_VOLUMES sets the storage volume or path to use for the MinIO server.

MINIO_VOLUMES=\"/mnt/data\"
" > ./minio.config.env && \
  cat ./minio.config.env && \
  chmod 600 ./minio.config.env && \
  ls -la

4. 创建 pgadmin4 数据目录并设置用户权限

为了确保pgAdmin4容器能够正确运行并管理数据库,需要在启动容器之前,预先创建必要的存储卷目录,并设置正确的用户和权限。

为什么需要预先创建存储卷

pgAdmin4容器通常以非root用户身份运行(例如UID:5050),因此宿主机上的存储卷目录必须与容器内部的用户环境保持一致,以确保容器内的进程能够正确访问和修改这些目录下的文件。如果不预先设置正确的用户组和权限,可能会导致容器无法写入数据或出现其他权限相关的问题。

可参考 pgAdmin 官方文档 了解更多。

shell
export $(cat .env | xargs) && \
  echo $POSTGRES_HOSTNAME && \
  echo $PGADMIN_MY_EMAIL && \
  mkdir -p ./volume/pgadmin_data && \
  mkdir -p ./volume/pgadmin_config && \
  echo "{
    \"Servers\": {
        \"1\": {
            \"Name\": \"Datalake Postgres\",
            \"Group\": \"Libian\",
            \"Username\": \"$POSTGRES_USERNAME\",
            \"Host\": \"$POSTGRES_HOSTNAME\",
            \"Port\": 5432,
            \"SSLMode\": \"verify-full\",
            \"SSLRootCert\": \"system\",
            \"MaintenanceDB\": \"$POSTGRES_DB\",
            \"PassFile\": \"~/.pgpass\"
        }
    }
}" > ./volume/pgadmin_config/servers.json && \
  echo "$POSTGRES_HOSTNAME:5432:*:$POSTGRES_USERNAME:$POSTGRES_PASSWORD" > ./volume/pgadmin_config/pgpass && \
  echo "import logging

# Switch between server and desktop mode
SERVER_MODE = True

#Change pgAdmin config DB path
CONFIG_DATABASE_URI='postgresql://$POSTGRES_USERNAME:$POSTGRES_PASSWORD@$POSTGRES_HOSTNAME:5432/$POSTGRES_DB?application_name=libian-datalake-pgadmin-config&sslmode=verify-full&sslrootcert=system'

#Setup SMTP
MAIL_SERVER = '$SMTP_SERVER'
MAIL_PORT = $SMTP_PORT
MAIL_USE_SSL = $SMTP_SSL_PY_BOOL
MAIL_USERNAME = '$SMTP_USERNAME'
MAIL_PASSWORD = '$SMTP_PASSWORD'
SECURITY_EMAIL_SENDER = '$SMTP_SENDER'

# Change log level
CONSOLE_LOG_LEVEL = logging.INFO
FILE_LOG_LEVEL = logging.INFO

" > ./volume/pgadmin_config/config_local.py && \
  chown -R 5050:5050 ./volume/pgadmin_data && \
  chown -R 5050:5050 ./volume/pgadmin_config && \
  chown -R 5050:5050 ./volume/pgadmin_config/servers.json && \
  chown -R 5050:5050 ./volume/pgadmin_config/pgpass && \
  chown -R 5050:5050 ./volume/pgadmin_config/config_local.py && \
  cat ./volume/pgadmin_config/servers.json && \
  cat ./volume/pgadmin_config/pgpass && \
  cat ./volume/pgadmin_config/config_local.py && \
  chmod 644 ./volume/pgadmin_config/servers.json && \
  chmod 644 ./volume/pgadmin_config/config_local.py && \
  chmod 600 ./volume/pgadmin_config/pgpass && \
  ls -la volume/*

5. Docker Compose 部署

5.1 创建 docker-compose.yml 文件

第四步,创建 docker-compose.yml

你可以根据需要来决定是否启动 MinIO SNSD

如果你已经拥有自己的 MinIO 或 S3 存储,则可以使用自己的 MinIO 存储,并修改此文件内容。

shell
echo '
name: "libian-datalake"

x-env: &env
  GENERIC_TIMEZONE: Asia/Shanghai
  TZ: Asia/Shanghai

services:
  postgres-db:
    image: postgres:17
    restart: always
    hostname: ${POSTGRES_HOSTNAME}
    env_file:
      - .env
    ports:
      - "18191:5432"
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_USER: ${POSTGRES_USERNAME}
      PGSSLMODE: verify-full
      POSTGRES_HOST_AUTH_METHOD: md5
      <<: *env
    healthcheck:
      interval: 10s
      retries: 10
      test: "pg_isready -U \"$$POSTGRES_USER\" -d \"$$POSTGRES_DB\""
      timeout: 2s
    volumes:
      - ./volume/db_data:/var/lib/postgresql/data
      - ./ca/server.crt:/var/lib/postgresql/server.crt:ro
      - ./ca/server.key:/var/lib/postgresql/server.key:ro
    command:
      - "-c"
      - "ssl=on"
      - "-c"
      - "ssl_cert_file=/var/lib/postgresql/server.crt"
      - "-c"
      - "ssl_key_file=/var/lib/postgresql/server.key"
  miniosnsd:
    image: "minio/minio:latest"
    # The internal domain name and port should be 
    # the same as the public domain name and port.
    #
    # Because nocodb backend uses the internal port, 
    # and the internal port will be exposed in the nocodb table data.
    hostname: ${MINIOSNSD_HOSTNAME}
    restart: always
    env_file:
      - .env
    ports:
      - "18194:${MINIOSNSD_BOTH_PORT}"
      - "18195:9001"
    environment:
      MINIO_CONFIG_ENV_FILE: /etc/config.env
      MINIO_BROWSER_REDIRECT_URL: ${MINIO_BROWSER_REDIRECT_URL}
      <<: *env
    volumes:
      - "./minio.config.env:/etc/config.env"
      - "./volume/miniosnsd_data:/mnt/data"
      - "./ca/minio-private.key:/root/.minio/certs/private.key:ro"
      - "./ca/minio-public.crt:/root/.minio/certs/public.crt:ro"
    healthcheck:
      test: ["CMD", "curl", "-k", "--silent", "--fail", "https://localhost:9001"]
      interval: 1m
      timeout: 20s
      retries: 5
      start_period: 5m
      start_interval: 30s
    command:
      - "server"
      - "--address"
      - ":${MINIOSNSD_BOTH_PORT}"
      - "--console-address"
      - ":9001"
  nocodb:
    image: "nocodb/nocodb:latest"
    restart: always
    ports:
      - "18193:8080"
    env_file:
      - .env
    depends_on:
      postgres-db:
        condition: service_healthy
      miniosnsd:
        condition: service_healthy
    environment:
      NC_DB: "pg://${POSTGRES_HOSTNAME}:5432?u=${POSTGRES_USERNAME}&p=${POSTGRES_PASSWORD}&d=${POSTGRES_DB}&application_name=libian-datalake-nocodb&sslmode=verify-full"
      NC_PUBLIC_URL: ${NC_PUBLIC_URL}
      NC_ADMIN_EMAIL: ${NC_ADMIN_EMAIL}
      NC_ADMIN_PASSWORD: ${NC_ADMIN_PASSWORD}
      NC_INVITE_ONLY_SIGNUP: true
      NC_DISABLE_TELE: true
      NC_SMTP_FROM: ${SMTP_SENDER}
      NC_SMTP_HOST: ${SMTP_SERVER}
      NC_SMTP_PORT: ${SMTP_PORT}
      NC_SMTP_USERNAME: ${SMTP_USERNAME}
      NC_SMTP_PASSWORD: ${SMTP_PASSWORD}
      NC_SMTP_SECURE: ${SMTP_SSL_JS_BOOL}
      <<: *env
    volumes:
      - "./volume/nc_data:/usr/app/data"
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:8080"]
      interval: 1m
      timeout: 20s
      retries: 5
      start_period: 5m
      start_interval: 30s
  pgadmin:
    env_file:
      - .env
    ports:
      - "18192:80"
    depends_on:
      nocodb:
        condition: service_healthy
    image: dpage/pgadmin4:9.1
    restart: always
    environment:
      PGADMIN_DEFAULT_EMAIL: ${PGADMIN_MY_EMAIL}
      PGADMIN_DEFAULT_PASSWORD: ${PGADMIN_MY_PASSWORD}
      PGADMIN_CONFIG_WTF_CSRF_ENABLED: "False"
      PGADMIN_CONFIG_WTF_CSRF_CHECK_DEFAULT: "False"
      PGADMIN_CONFIG_ENHANCED_COOKIE_PROTECTION: "False"
      PGADMIN_CONFIG_MAX_LOGIN_ATTEMPTS: 15
      PGADMIN_CONFIG_CHECK_EMAIL_DELIVERABILITY: "True"
      <<: *env
    volumes:
      - "./volume/pgadmin_data:/var/lib/pgadmin"
      - "./volume/pgadmin_config/servers.json:/pgadmin4/servers.json"
      - "./volume/pgadmin_config/pgpass:/home/pgadmin/.pgpass"
      - "./volume/pgadmin_config/config_local.py:/pgadmin4/config_local.py"
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:80"]
      interval: 1m
      timeout: 20s
      retries: 5
      start_period: 30m
      start_interval: 30s
' > docker-compose.yml
yml
name: "libian-datalake"

x-env: &env
  GENERIC_TIMEZONE: Asia/Shanghai
  TZ: Asia/Shanghai

services:
  postgres-db:
    image: postgres:17
    restart: always
    hostname: ${POSTGRES_HOSTNAME}
    env_file:
      - .env
    ports:
      - "18191:5432"
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_USER: ${POSTGRES_USERNAME}
      PGSSLMODE: verify-full
      POSTGRES_HOST_AUTH_METHOD: md5
      <<: *env
    healthcheck:
      interval: 10s
      retries: 10
      test: "pg_isready -U \"$$POSTGRES_USER\" -d \"$$POSTGRES_DB\""
      timeout: 2s
    volumes:
      - ./volume/db_data:/var/lib/postgresql/data
      - ./ca/server.crt:/var/lib/postgresql/server.crt:ro
      - ./ca/server.key:/var/lib/postgresql/server.key:ro
    command:
      - "-c"
      - "ssl=on"
      - "-c"
      - "ssl_cert_file=/var/lib/postgresql/server.crt"
      - "-c"
      - "ssl_key_file=/var/lib/postgresql/server.key"
  miniosnsd:
    image: "minio/minio:latest"
    # The internal domain name and port should be 
    # the same as the public domain name and port.
    #
    # Because nocodb backend uses the internal port, 
    # and the internal port will be exposed in the nocodb table data.
    hostname: ${MINIOSNSD_HOSTNAME}
    restart: always
    env_file:
      - .env
    ports:
      - "18194:${MINIOSNSD_BOTH_PORT}"
      - "18195:9001"
    environment:
      MINIO_CONFIG_ENV_FILE: /etc/config.env
      MINIO_BROWSER_REDIRECT_URL: ${MINIO_BROWSER_REDIRECT_URL}
      <<: *env
    volumes:
      - "./minio.config.env:/etc/config.env"
      - "./volume/miniosnsd_data:/mnt/data"
      - "./ca/minio-private.key:/root/.minio/certs/private.key:ro"
      - "./ca/minio-public.crt:/root/.minio/certs/public.crt:ro"
    healthcheck:
      test: ["CMD", "curl", "-k", "--silent", "--fail", "https://localhost:9001"]
      interval: 1m
      timeout: 20s
      retries: 5
      start_period: 5m
      start_interval: 30s
    command:
      - "server"
      - "--address"
      - ":${MINIOSNSD_BOTH_PORT}"
      - "--console-address"
      - ":9001"
  nocodb:
    image: "nocodb/nocodb:latest"
    restart: always
    ports:
      - "18193:8080"
    env_file:
      - .env
    depends_on:
      postgres-db:
        condition: service_healthy
      miniosnsd:
        condition: service_healthy
    environment:
      NC_DB: "pg://${POSTGRES_HOSTNAME}:5432?u=${POSTGRES_USERNAME}&p=${POSTGRES_PASSWORD}&d=${POSTGRES_DB}&application_name=libian-datalake-nocodb&sslmode=verify-full"
      NC_PUBLIC_URL: ${NC_PUBLIC_URL}
      NC_ADMIN_EMAIL: ${NC_ADMIN_EMAIL}
      NC_ADMIN_PASSWORD: ${NC_ADMIN_PASSWORD}
      NC_INVITE_ONLY_SIGNUP: true
      NC_DISABLE_TELE: true
      NC_SMTP_FROM: ${SMTP_SENDER}
      NC_SMTP_HOST: ${SMTP_SERVER}
      NC_SMTP_PORT: ${SMTP_PORT}
      NC_SMTP_USERNAME: ${SMTP_USERNAME}
      NC_SMTP_PASSWORD: ${SMTP_PASSWORD}
      NC_SMTP_SECURE: ${SMTP_SSL_JS_BOOL}
      <<: *env
    volumes:
      - "./volume/nc_data:/usr/app/data"
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:8080"]
      interval: 1m
      timeout: 20s
      retries: 5
      start_period: 5m
      start_interval: 30s
  pgadmin:
    env_file:
      - .env
    ports:
      - "18192:80"
    depends_on:
      nocodb:
        condition: service_healthy
    image: dpage/pgadmin4:9.1
    restart: always
    environment:
      PGADMIN_DEFAULT_EMAIL: ${PGADMIN_MY_EMAIL}
      PGADMIN_DEFAULT_PASSWORD: ${PGADMIN_MY_PASSWORD}
      PGADMIN_CONFIG_WTF_CSRF_ENABLED: "False"
      PGADMIN_CONFIG_WTF_CSRF_CHECK_DEFAULT: "False"
      PGADMIN_CONFIG_ENHANCED_COOKIE_PROTECTION: "False"
      PGADMIN_CONFIG_MAX_LOGIN_ATTEMPTS: 15
      PGADMIN_CONFIG_CHECK_EMAIL_DELIVERABILITY: "True"
      <<: *env
    volumes:
      - "./volume/pgadmin_data:/var/lib/pgadmin"
      - "./volume/pgadmin_config/servers.json:/pgadmin4/servers.json"
      - "./volume/pgadmin_config/pgpass:/home/pgadmin/.pgpass"
      - "./volume/pgadmin_config/config_local.py:/pgadmin4/config_local.py"
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:80"]
      interval: 1m
      timeout: 20s
      retries: 5
      start_period: 30m
      start_interval: 30s

5.2 运行

运行此命令以部署:

shell
docker compose up
shell
docker compose up -d && docker compose logs -t -f -n 100

6. 手动初始化 NocoDB 和 MinIO 配置

注意事项

  • pgAdmin 安装时间较长:在 CPU 性能较低的服务器上,pgAdmin 的安装可能需要约 20 分钟,请耐心等待。
  • 高效利用时间:建议在等待 pgAdmin 安装完成的同时,先手动初始化 NocoDB 和 MinIO 的相关配置。

具体步骤

  1. NocoDB 配置

    • 打开 NocoDB 应用程序。
      • Integrations 栏目中,新增您的数据源连接。
      • 进入 Teams & Settings > Setup > Configure E-mail,测试并验证 SMTP 邮件服务的配置是否正确。
      • 在用户界面的右上角下拉菜单中,选择 Language 以设置您 preferred 的显示语言。
  2. MinIO 配置

    • 登录 MinIO 管理界面。
      • 创建所需的存储桶(Bucket)。
      • 生成新的访问密钥(Access Key)和秘密密钥(Secret Key)。
  3. NocoDB 中配置 MinIO 存储

    • 在 NocoDB 的 Integrations 栏目中,找到并选择 MinIO 作为存储服务。
    • 输入之前在 MinIO 中创建的存储桶名称、访问密钥和秘密密钥,完成存储服务的集成配置。

通过以上步骤,您可以在等待 pgAdmin 安装的同时,高效地完成 NocoDB 和 MinIO 的基础配置工作。

其他

NginX 反向代理 MinIO 配置文件参考

nginx
server {
    listen      443 ssl;
    listen      [::]:443 ssl;
    server_name libian-datalake-miniosnsd.yourhostname.com;
    ssl_certificate /etc/nginx/certificates/your/fullchain.cer;
    ssl_certificate_key /etc/nginx/certificates/your/cert.key;

    location / {
        proxy_pass https://localhost:18194/;

        proxy_ssl_server_name on;


        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Protocol $scheme;
        proxy_set_header X-Forwarded-Host $http_host;
        proxy_set_header X-NginX-Proxy true;

        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
        send_timeout 60s;
        proxy_headers_hash_max_size 2048;
        proxy_headers_hash_bucket_size 128;
    }
}
nginx
server {
    listen      443 ssl;
    listen      [::]:443 ssl;
    server_name libian-datalake-minioconsole.yourhostname.com;
    ssl_certificate /etc/nginx/certificates/your/fullchain.cer;
    ssl_certificate_key /etc/nginx/certificates/your/cert.key;

    location / {
        proxy_pass https://localhost:18195/;
    
        proxy_ssl_server_name on;
        proxy_http_version 1.1;
        
        proxy_set_header Host $http_host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Protocol $scheme;
        proxy_set_header X-Forwarded-Host $http_host;
        proxy_set_header X-NginX-Proxy true;

        proxy_connect_timeout 600;
        proxy_send_timeout 600;
        proxy_read_timeout 600;
        send_timeout 600;
        proxy_headers_hash_max_size 2048;
        proxy_headers_hash_bucket_size 128;
    }
}