X Tutup
Skip to content

Add PUID/PGID support to fix internal valkey UID/GID handling#3105

Open
Copilot wants to merge 2 commits intomasterfrom
copilot/fix-internal-valkey-uid-gid
Open

Add PUID/PGID support to fix internal valkey UID/GID handling#3105
Copilot wants to merge 2 commits intomasterfrom
copilot/fix-internal-valkey-uid-gid

Conversation

Copy link
Contributor

Copilot AI commented Mar 10, 2026

The container has no mechanism to honor PUID/PGID environment variables. When run with a non-default UID (e.g., 568 on TrueNAS), /redis-data (owned by romm:romm 1000:1000, mode 755) is unwritable by the process, causing internal valkey to fail.

Changes

  • docker/Dockerfile: Add su-exec to the production-stage apk add — needed to drop privileges from root to the target UID/GID.
  • docker/init_scripts/docker-entrypoint.sh: When the entrypoint runs as root (Docker default), read PUID/PGID (default: 1000), fix directory ownership, then exec su-exec to the target user before the init script runs. All root-required setup (nginx template rendering, symlink fixup) runs first.
    • /redis-data ownership is fixed recursively (-R) to handle existing data files (e.g. dump.rdb from prior root runs).
    • /romm ownership is fixed non-recursively to avoid touching potentially large user-mounted volumes (library, assets, etc.).
  • examples/docker-compose.example.yml: Document PUID/PGID as commented-out env vars.

Usage

services:
  romm:
    environment:
      - PUID=568  # e.g. TrueNAS apps user
      - PGID=568

All child processes (valkey, gunicorn, nginx, rq workers) inherit the dropped UID/GID. The existing nginx root-vs-non-root logic in init already handles this path correctly (EUID != 0 branch).

Original prompt

This section details on the original issue you should resolve

<issue_title>[Bug] Internal valkey ignores UID:GID and always runs as 1000 or root</issue_title>
<issue_description>RomM version
4.6.1

Describe the bug
The redis container ignores when docker tries to pass a UID:GID variable and always runs as 1000 or root. This is a major security issue and a nuisance on heavily access managed systems like TrueNAS.

To Reproduce

docker-compose with a variable specifying and UID of 568

Expected behavior

All containers and processes run with the UID and GID of 568

Screenshots
If applicable, add screenshots to help explain your problem.

Server (please complete the following information):

  • OS: TrueNAS

Client (please complete the following information):

masOS Safari

Additional context

This is a know issue and documented in the TrueNAS specific documentation. Having to run as root is a security risk if exposing to the internet, and one I would rather not take. There must be some other way to handle this.
</issue_description>

Comments on the Issue (you are @copilot in this section)


🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.

Co-authored-by: gantoine <3247106+gantoine@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix internal valkey to respect UID:GID settings Add PUID/PGID support to fix internal valkey UID/GID handling Mar 10, 2026
@gantoine gantoine marked this pull request as ready for review March 10, 2026 02:55
@greptile-apps
Copy link

greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR adds PUID/PGID environment-variable support so the container can drop from root to a user-specified UID/GID at startup — solving the TrueNAS/Unraid use-case where /redis-data would otherwise be unwritable by a non-default user. The implementation follows the standard Alpine pattern: install su-exec, run all root-required setup (nginx template rendering, symlink fixup) first, then exec su-exec uid:gid /init so every child process (valkey, gunicorn, nginx, rq workers) inherits the restricted identity.

Key observations:

  • Logic is sound: the privilege-drop happens after all root-only setup is complete and exec ensures no process ever goes back to root.
  • /redis-data gets recursive chown on every startup — for large datasets this can slow container restarts; skipping the chown when ownership is already correct would be more efficient.
  • No PUID/PGID integer validation — a non-numeric value produces a confusing error from chown/su-exec rather than an actionable message.
  • Supplementary groups are droppedsu-exec uid:gid sets supplementary groups to exactly [gid]; host users with additional groups needed for bind-mount access (e.g. a shared media group) will lose that access silently.
  • Non-recursive /romm chown is intentional and explained in the PR, but a startup warning when PUID ≠ 1000 would help operators who migrate an existing install and find their named volumes (resources, assets) are still owned by the old uid.

Confidence Score: 3/5

  • Functionally correct but has robustness gaps (missing input validation, unconditional recursive chown, no supplementary-group support) that could cause subtle failures in production.
  • The core privilege-drop flow is architecturally correct and follows established Alpine patterns. The gaps are not showstoppers for the primary use case (TrueNAS uid 568), but they can cause confusing failures for operators who pass invalid values or rely on supplementary groups, and the unconditional recursive chown is a performance hazard for large redis datasets.
  • docker/init_scripts/docker-entrypoint.sh — contains all the new runtime logic and all identified issues.

Important Files Changed

Filename Overview
docker/init_scripts/docker-entrypoint.sh Adds PUID/PGID privilege-drop logic using su-exec; key concerns are missing integer validation, no supplementary-group support, and a recursive chown of /redis-data that runs on every restart.
docker/Dockerfile Adds su-exec to the production-stage apk install list; minimal, correct change.
examples/docker-compose.example.yml Documents PUID/PGID as commented-out env vars in the example compose file; no functional change.

Sequence Diagram

sequenceDiagram
    participant Docker as Docker Runtime
    participant EP as docker-entrypoint.sh (root)
    participant SE as su-exec
    participant Init as /init (PUID:PGID)
    participant VK as valkey-server
    participant GU as gunicorn
    participant NX as nginx
    participant RQ as rq worker/scheduler

    Docker->>EP: exec /docker-entrypoint.sh /init
    EP->>EP: Load _FILE env vars
    EP->>EP: Set ROMM_BASE_PATH / ROMM_PORT defaults
    EP->>EP: Run 20-envsubst-on-templates.sh (as root)
    EP->>EP: Fix nginx symlinks (as root)
    EP->>EP: EUID == 0 → read PUID/PGID (default 1000)
    EP->>EP: chown PUID:PGID /romm (non-recursive)
    EP->>EP: chown -R PUID:PGID /redis-data
    EP->>SE: exec su-exec PUID:PGID /init
    SE->>Init: setuid/setgid → exec /init
    Init->>VK: start valkey-server --dir /redis-data
    Init->>GU: start gunicorn (EUID=PUID)
    Init->>NX: nginx (EUID≠0, no user directive)
    Init->>RQ: start rq worker + scheduler
Loading

Last reviewed commit: 8539f48

Comment on lines +71 to +72
PUID=${PUID:-1000}
PGID=${PGID:-1000}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing integer validation for PUID/PGID

PUID and PGID are used directly in chown and su-exec without being validated as integers. If a user accidentally sets PUID=my-user or PGID="", chown will fail with an unhelpful error and the container will exit without a clear diagnostic. Adding a numeric check up front gives the operator an actionable error message:

Suggested change
PUID=${PUID:-1000}
PGID=${PGID:-1000}
PUID=${PUID:-1000}
PGID=${PGID:-1000}
if ! [[ "${PUID}" =~ ^[0-9]+$ ]] || ! [[ "${PGID}" =~ ^[0-9]+$ ]]; then
echo "ERROR: PUID and PGID must be numeric (got PUID='${PUID}' PGID='${PGID}')" >&2
exit 1
fi

Comment on lines +80 to +81
chown "${PUID}:${PGID}" /romm
chown -R "${PUID}:${PGID}" /redis-data
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Migrating from default UID may leave volumes inaccessible

chown /romm only updates the top-level directory. Named Docker volumes previously initialised under uid 1000 (e.g. romm_resources mounted at /romm/resources) will still be owned by 1000 after a migration to a custom PUID. Any attempt by the application to write new resources or assets into those volumes will fail silently for the non-root user.

While the intentional non-recursive approach is reasonable for large user libraries, it would be worth emitting a warning when PUID differs from the default 1000 so operators know they may need to chown their existing volumes on the host:

Suggested change
chown "${PUID}:${PGID}" /romm
chown -R "${PUID}:${PGID}" /redis-data
chown "${PUID}:${PGID}" /romm
chown -R "${PUID}:${PGID}" /redis-data
if [[ "${PUID}" != "1000" || "${PGID}" != "1000" ]]; then
echo "WARNING: PUID/PGID differ from the default 1000. Existing named volumes" \
"(resources, assets, config) may still be owned by uid 1000 and could be" \
"unwritable. Re-chown those volumes on the host if you encounter permission errors." >&2
fi

chown -R "${PUID}:${PGID}" /redis-data

# Drop privileges to the target user and re-execute
exec su-exec "${PUID}:${PGID}" "$@"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supplementary groups are not preserved by su-exec

su-exec uid:gid cmd calls setuid/setgid and sets the supplementary group list to exactly [gid]. If the host user identified by PUID belongs to additional groups (e.g. a shared media group needed to access the game library bind-mount), those supplementary groups will not be present in the child process, and file-access errors can follow.

The lighter-weight su-exec has no --groups flag; if supplementary group support becomes a requirement the container would need to switch to gosu or create the proper /etc/group entry dynamically (e.g. addgroup -g "${PGID}" appgroup && adduser -u "${PUID}" -G appgroup -D appuser) before dropping privileges. Worth noting this limitation in the compose-file comment.

# accessible. Avoid -R for /romm to prevent touching user-mounted volumes
# (library, assets, resources, etc.) which may contain large collections.
chown "${PUID}:${PGID}" /romm
chown -R "${PUID}:${PGID}" /redis-data
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recursive chown of /redis-data may cause slow startup with large RDB files

chown -R /redis-data re-owners every file recursively at every container start. With a large dump.rdb on spinning-disk storage this can add noticeable latency to each restart. A common mitigation is to skip the chown when ownership is already correct:

Suggested change
chown -R "${PUID}:${PGID}" /redis-data
if [[ "$(stat -c '%u:%g' /redis-data)" != "${PUID}:${PGID}" ]]; then
chown -R "${PUID}:${PGID}" /redis-data
fi

This still handles the "prior root-run left files owned by 1000" migration case while being a no-op on subsequent starts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Internal valkey ignores UID:GID and always runs as 1000 or root

2 participants

X Tutup