X Tutup
Skip to content

feat: add an endpoint to manually resume a coder task#21948

Merged
SasSwart merged 6 commits intomainfrom
jjs/internal-1262
Feb 12, 2026
Merged

feat: add an endpoint to manually resume a coder task#21948
SasSwart merged 6 commits intomainfrom
jjs/internal-1262

Conversation

@SasSwart
Copy link
Contributor

@SasSwart SasSwart commented Feb 5, 2026

Closes coder/internal#1262.

This PR adds:

  • the POST /api/experimental/tasks/{user}/{task}/resume endpoint
  • follows conventions from Tasks: Manual pause endpoint internal#1261
  • sets the build reason to task_resume
  • a task that is not paused (ie. is already running), cannot be resumed.

the endpoint is now functional. however, there are a few non-functional changes to still make. 1. The pause and resume endpoint are nearly identical. It might be good to deduplicate them. 2. The authz tests need to be generalized
@SasSwart SasSwart marked this pull request as ready for review February 9, 2026 14:27
@SasSwart
Copy link
Contributor Author

SasSwart commented Feb 9, 2026

@mafredri @DanielleMaywood I still plan to make improvements to the testing based on the feedback in #21889, and I’m actively working through the best approach. I don’t want that to block progress on the broader change, though. While I sort out the testing updates, could you please review this PR? Aside from the non-functional testing concerns, it’s otherwise ready for review.

Copy link
Contributor

@DanielleMaywood DanielleMaywood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks fine to me other than a few issues with the tests but they mirror the prior PR so I understand there is a plan to rectify them.

I'm happy with this change but delegating hitting the green button to @mafredri as I don't fully have enough context on the workings of pause/resume

Comment on lines +2820 to +2822
ctx := testutil.Context(t, testutil.WaitLong)
client := coderdtest.New(t, &coderdtest.Options{IncludeProvisionerDaemon: true})
user := coderdtest.CreateFirstUser(t, client)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this is a systemic issue already existing in our tests but we should define ctx just before first use (I'll only leave a comment on this one to save having multiple comments of the same thing).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great that you flagged this Danielle. And for context, premature ctx initialization leads to flakes due to test setup not being accounted for (coderdtest.New, among others).

Comment on lines +2953 to +2956
workspaceBuild := dbfake.WorkspaceBuild(t, db, database.WorkspaceTable{
OrganizationID: user.OrganizationID,
OwnerID: user.UserID,
}).Do()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use dbfake.TemplateVersion here? Not a blocker just felt weird seeing a WorkspaceBuild on a test called "No Workspace for Task" 😄

Comment on lines +2971 to +2995
t.Run("Workspace not found", func(t *testing.T) {
t.Parallel()

ctx := testutil.Context(t, testutil.WaitShort)
db, ps := dbtestutil.NewDB(t)
var workspaceID uuid.UUID
wrapped := aiTaskStoreWrapper{
Store: db,
getWorkspaceByID: func(ctx context.Context, id uuid.UUID) (database.Workspace, error) {
if id == workspaceID && id != uuid.Nil {
return database.Workspace{}, sql.ErrNoRows
}
return db.GetWorkspaceByID(ctx, id)
},
}
client := setupClient(t, wrapped, ps, nil)
user := coderdtest.CreateFirstUser(t, client)
task, workspaceIDValue := setupWorkspaceTask(t, db, user)
workspaceID = workspaceIDValue

_, err := client.ResumeTask(ctx, codersdk.Me, task.ID)
var apiErr *codersdk.Error
require.ErrorAs(t, err, &apiErr)
require.Equal(t, http.StatusNotFound, apiErr.StatusCode())
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by this test.

Just re-read the prior PR and saw the discussion around this test style so ignore this

Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the endpoint is fairly simple, works for the common case, and has good test coverage. I'm fine unblocking here.

I'd still like to see tickets tracking the follow-up work (test refactoring, and task state handling). Let me know if you want help creating those.

codersdk.ProvisionerJobStatus(job.JobStatus),
codersdk.WorkspaceTransition(latestBuild.Transition),
)
if workspaceStatus == codersdk.WorkspaceStatusRunning {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest that the state handling here needs to be re-thought a bit. I know postWorkspaceBuildsInternal has its own protections against creating a new build, but that's not the only signal we care about for tasks. We also want to help the user resolve state transition issues, whereas workspace running/status does not translate to task and may lead to confusion.

This can be a follow-up task, and some of it is applicable to the pause endpoint too.

Some context:

This error could mask some things about the task state and we're repeating a lot of the work already covered by task.Status. I think we should primarily handle it, and fall back to looking at the workspace job.

If task status is queued, initializing, or active. It means we can't resume (this covers job pending/running/completed and workspace queued/starting/running).

Then there's the gnarly task status error, which can mean any number of things. We could potentially make use of task.StatusDebug field to output more detail about what's wrong. I assume we'd still want workspace running to indicate that it needs to be paused before it can be (re-)resumed.

The unknown status also means something is wrong with the build. I'm unsure of the top of my head if canceling a build results in error or unknown.

Some questions to consider:

  • Which states allow resume (and consequently pause) transition?
  • How do we convey the resolution path to the user?
  • If it's an unhappy path, do we need to suggest the user needs to manually control the workspace?
  • Future proofing? Separation between task status and workspace status may become more relevant in the future.

Comment on lines +2820 to +2822
ctx := testutil.Context(t, testutil.WaitLong)
client := coderdtest.New(t, &coderdtest.Options{IncludeProvisionerDaemon: true})
user := coderdtest.CreateFirstUser(t, client)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great that you flagged this Danielle. And for context, premature ctx initialization leads to flakes due to test setup not being accounted for (coderdtest.New, among others).

@SasSwart
Copy link
Contributor Author

@maf I'm tracking follow-up work specifically related to the testing directly in the description of the follow-up PR: #21889. I can write up an issue specifically if you'd like, but I don't consider it necessary to write an issue specifically. This is part of coder/internal#1262 imo.

@SasSwart SasSwart merged commit 47b8ca9 into main Feb 12, 2026
54 checks passed
@SasSwart SasSwart deleted the jjs/internal-1262 branch February 12, 2026 07:59
@github-actions github-actions bot locked and limited conversation to collaborators Feb 12, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tasks: Resume endpoint

3 participants

X Tutup