Skip to content

Conversation

@brendan-kellam
Copy link
Contributor

@brendan-kellambrendan-kellam commented Nov 11, 2025

This PR resolves a couple of issues:

  • During termination, we call worker.close() to terminate any in progress jobs. This takes a gracefulTimeout param (defaults 30 seconds). If there are any jobs still in progress after that duration, there is a bug in groupmq where the job's group lock will not be released (see [bug] Group lock is not released after graceful shutdown, resulting in stalling jobs Openpanel-dev/groupmq#8). This resulted in jobs (especially long running connection jobs) being left in PENDING or IN_PROGRESS indefinitely, and new jobs not being queued until the timeout expires. The current workaround is to explicitly free the locks when calling dispose.
  • Improved the graceful shutdown handling in index.ts

Summary by CodeRabbit

Release Notes

Bug Fixes

  • Implemented timeout handling for background jobs; timed-out jobs are now properly marked as failed with status updates and metrics.
  • Added job state validation to prevent execution of jobs in invalid states.

Improvements

  • Enhanced graceful shutdown process with comprehensive error handling and error tracking integration.
  • Improved lock management during shutdown to prevent deadlocks.
  • Extended process signal handling for more reliable termination.

@coderabbitai
Copy link

coderabbitaibot commented Nov 11, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The pull request introduces graceful shutdown handling for job queues, implements job state validation before execution, enriches job payloads with metadata, adds a reusable signal listener for shutdown orchestration, and updates method signatures to pass AbortSignal instead of AbortController instances.

Changes

Cohort / File(s)Summary
Graceful timeout and job validation
packages/backend/src/connectionManager.ts, packages/backend/src/repoIndexManager.ts
Added graceful-timeout event handlers that mark timed-out jobs as FAILED with status updates and metric emissions. Implemented job state validation in runJob to fetch and verify job status is PENDING or IN_PROGRESS before execution. Added manual Redis lock cleanup during shutdown to release group locks for in-progress jobs. Extended constructors to store Redis instance as private field for lock management.
Shutdown signal handling
packages/backend/src/index.ts
Refactored cleanup flow into a new listenToShutdownSignals function that sequentially disposes managers, disconnects services, and handles errors with Sentry. Replaced hard-coded cleanup with signal handler registration loop over SHUTDOWN_SIGNALS and added uncaught exception/rejection handlers.
Shutdown constants
packages/backend/src/constants.ts
Added GROUPMQ_WORKER_STOP_GRACEFUL_TIMEOUT_MS (5 seconds) for worker shutdown timeout and SHUTDOWN_SIGNALS array listing Node.js termination signals.
Parameter type refinement
packages/backend/src/repoCompileUtils.ts
Changed compileGithubConfig signature to accept signal: AbortSignal instead of abortController: AbortController, forwarding the signal to getGitHubReposFromConfig.

Sequence Diagram(s)

sequenceDiagram participant Worker participant JobQueue participant DB as Prisma/Redis participant Handler rect rgb(200, 220, 255) note over Worker: Job Lifecycle with Graceful Timeout end Worker->>Worker: runJob called Worker->>DB: Fetch current job status DB-->>Worker: Job status alt Status is PENDING or IN_PROGRESS Worker->>Worker: Execute job alt Normal Completion Worker->>Handler: Job success else Graceful Timeout Handler->>Worker: onJobGracefulTimeout Worker->>DB: Mark job as FAILED Worker->>Handler: Update metrics & log end else Status is other Worker->>Handler: Throw error (skip invalid state) end 
Loading
sequenceDiagram participant System participant Index as index.ts participant Managers as ConnectionManager<br/>RepoIndexManager participant Services as Prisma<br/>Redis<br/>API<br/>Posthog participant Signals rect rgb(255, 220, 200) note over System: Graceful Shutdown Flow end Signals->>Index: SIGTERM/SIGINT/etc. Index->>Index: listenToShutdownSignals Index->>Managers: dispose() each manager par Manager Cleanup Managers->>Managers: Worker.gracefulStop(timeout) Managers->>Services: Delete Redis lock keys Managers->>Managers: Log lock release end Index->>Services: Disconnect each service Services-->>Index: Disconnected Index->>Index: Exit gracefully Index->>Signals: Remove handlers 
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:

  • Lock cleanup logic in connectionManager.ts and repoIndexManager.ts: Manual Redis key deletion during disposal—verify error handling is robust and doesn't prevent shutdown
  • Job state validation logic: Ensure the early-exit pattern for invalid job states doesn't suppress legitimate errors or cause jobs to be lost
  • Signal handler guard mechanism: Verify the guard against repeated signal processing works correctly and doesn't prevent graceful shutdown if triggered multiple times
  • Payload enrichment impact: Confirm job payload expansion (jobId, connectionId, connectionName, orgId) doesn't break existing job consumers or queue serialization
  • Parameter type change in repoCompileUtils.ts: Ensure all call sites pass AbortSignal correctly and the change is backward-compatible with consumers

Possibly related PRs

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check nameStatusExplanation
Title check✅ PassedThe title accurately summarizes the main change: implementing graceful shutdown fixes for workers, which is the core objective of the pull request.
Docstring Coverage✅ PassedNo functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check✅ PassedCheck skipped - CodeRabbit’s high-level summary is enabled.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

This comment has been minimized.

@brendan-kellam
Copy link
ContributorAuthor

@coderabbitai review

@coderabbitai
Copy link

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitaicoderabbitaibot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18fad64 and d43f35b.

📒 Files selected for processing (5)
  • packages/backend/src/connectionManager.ts (9 hunks)
  • packages/backend/src/constants.ts (1 hunks)
  • packages/backend/src/index.ts (3 hunks)
  • packages/backend/src/repoCompileUtils.ts (1 hunks)
  • packages/backend/src/repoIndexManager.ts (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*

📄 CodeRabbit inference engine (.cursor/rules/style.mdc)

Filenames should always be camelCase. Exception: if there are filenames in the same directory with a format other than camelCase, use that format to keep things consistent.

Files:

  • packages/backend/src/constants.ts
  • packages/backend/src/repoCompileUtils.ts
  • packages/backend/src/index.ts
  • packages/backend/src/repoIndexManager.ts
  • packages/backend/src/connectionManager.ts

@brendan-kellambrendan-kellam merged commit 903d15a into mainNov 12, 2025
9 checks passed
@brendan-kellambrendan-kellam deleted the bkellam/groupmq_workaround branch November 12, 2025 04:12
@github-actionsgithub-actionsbot mentioned this pull request Nov 12, 2025
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

@brendan-kellam