Skip to content

Conversation

@pcasaretto
Copy link

@pcasarettopcasaretto commented Nov 27, 2025

Changes in v2

  • Changed file naming: From foo.lock.pid to foo-pid.lock to avoid
    collision with refs namespace (where refs/foo.lock.pid would be valid)

  • Replaced environment variable with config: Now uses core.lockfilePid
    configuration instead of GIT_LOCK_PID_INFO environment variable

  • Added per-component configuration: Similar to core.fsync, users can
    enable PID files for specific components:

    • none (default), all
    • index, config, refs, commit-graph, midx, shallow, gc, other
  • Caller-specified components: Each call site specifies which component it
    belongs to, giving users fine-grained control. Please help by reviewing the categories
    specially the catch all OTHER

  • Always read existing PID files: When displaying lock errors, existing
    PID files are read regardless of config setting, ensuring helpful diagnostics

  • PID-only for now: Not adding hostname for simplicity.
    Should we maybe make the file contents extensible?

CC: Taylor Blau [email protected]
cc: "D. Ben Knoble" [email protected]
cc: Torsten Bögershausen [email protected]
cc: Jeff King [email protected]
cc: "Paulo Casaretto (Shopify)" [email protected]
cc: Patrick Steinhardt [email protected]

Copy link

@hcmaATshopifyhcmaATshopify left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue for using pid instead of holder everywhere.

Pros of pid

  • Precise: The only payload in the file is the PID. Using “pid” makes it 100% clear and avoids ambiguity.
  • Conventional: Many tools use .pid files or similar conventions, so the meaning is familiar.
  • Unambiguous Code: Variables like pid_file or read_lock_pid() are explicitly about PIDs, not other info.

lockfile.c Outdated
returnNULL;

strbuf_addf(&holder_path, "%s%s", lock_path, LOCK_HOLDER_SUFFIX);
fd=open(holder_path.buf, O_WRONLY | O_CREAT | O_TRUNC, mode);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure mode is restrictive enough (match whatever the .lock has and is readable by the creator).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does use the same mode parameter as the lock file:

// In lock_file():lk->tempfile=create_tempfile_mode(filename.buf, mode); // lock filelk->pid_tempfile=create_lock_pid_file(filename.buf, mode); // PID file - same mode

lockfile.c Outdated

if (lock_holder_info_enabled() &&
!read_lock_holder_pid(lock_path.buf, &holder_pid))
strbuf_addf(buf, _("Lock is held by process %"PRIuMAX".\n\n"),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lock (acquired via file '%s') is being held by process .

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this will print after the "Unable to create '/path/to/.git/index.lock': File exists." message, I think it would be redundant.

lockfile.c Outdated
strbuf_addf(buf, _("Lock is held by process %"PRIuMAX".\n\n"),
holder_pid);

strbuf_addstr(buf, _("Another git process seems to be running in this repository, e.g.\n"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the feature is enabled, we can actually improve this now and be assertive (for instance, if the pid is no longer around, we can say this "lock file is stale and can be removed" or "lock file is valid and held by process "

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turns out kill was already implement and cross platform. Added a liveness check!

return0;
}

introllback_lock_file(structlock_file*lk)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend renaming rollback_lock_file to abort_lock_file for clarity and consistency with Git’s conventions. abort_lock_file is likely the best fit for the Git codebase, as this matches the standard “commit/abort” terminology for transactional semantics.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not new, I just needed to change it from being an inline function to add new functionality.

lockfile.c Outdated
strbuf_addf(&holder_path, "%s%s", lock_path, LOCK_HOLDER_SUFFIX);
fd=open(holder_path.buf, O_WRONLY | O_CREAT | O_TRUNC, mode);
if (fd >= 0){
strbuf_addf(&content, "%"PRIuMAX"\n", (uintmax_t)getpid());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe strbuf_addf(&content, "%ld\n", (long)getpid()); since pid_t is generally int or long; cast to long for portability

Are we sure this will work for non-Unix platforms that might be supported by git? In other words, do we need to ifdef this whole thing so we don't break non-Unix builds?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking around the codebase it seems (uintmax_t)getpid() + PRIuMAX pattern is the established Git convention for portable PID handling:
Examples:

  1. daemon.c:1458 - writes PID to file:

    write_file(pid_file, "%"PRIuMAX, (uintmax_t) getpid());
  2. builtin/gc.c:789-790 - writes PID to lockfile:

    strbuf_addf(&sb, "%"PRIuMAX" %s", (uintmax_t) getpid(), my_host);

In terms of cross platform compatibility, I think I got lucky because it was already a first class citizen.
Did a little research and it works across the board.

  • getpid() is available on all Git-supported platforms (provided by libc on Unix, emulated on Windows)
  • pid_t is defined everywhere (native on Unix, typedef int on Windows via mingw-posix.h)

@hcmaATshopify
Copy link

Thanks for doing this @pcasaretto - LGTM!

@pcasarettopcasaretto marked this pull request as ready for review December 2, 2025 14:14
@pcasaretto
Copy link
Author

/preview

@gitgitgadget
Copy link

Preview email sent as [email protected]

@pcasaretto
Copy link
Author

/submit

@gitgitgadget
Copy link

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2011/pcasaretto/pid-holder-file-v1 

To fetch this version to local tag pr-2011/pcasaretto/pid-holder-file-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2011/pcasaretto/pid-holder-file-v1 

@gitgitgadget
Copy link

On the Git mailing list, "D. Ben Knoble" wrote (reply to this):

On Tue, Dec 2, 2025 at 10:07 AM Paulo Casaretto via GitGitGadget <[email protected]> wrote: >> From: Paulo Casaretto <[email protected]>>> When a lock file is held, it can be helpful to know which process owns> it, especially when debugging stale locks left behind by crashed> processes. Add an optional feature that creates a companion .lock.pid> file alongside each lock file, containing the PID of the lock holder.>> The .lock.pid file is created when a lock is acquired (if enabled), and> automatically cleaned up when the lock is released (via commit or> rollback). The file is registered as a tempfile so it gets cleaned up> by signal and atexit handlers if the process terminates abnormally.>> When a lock conflict occurs, the code checks if the PID from the .pid> file is still running using kill(pid, 0). This allows providing> context-aware error messages. With PID info enabled:>> Lock is held by process 12345. Wait for it to finish, or remove> the lock file to continue.>> Or for a stale lock:>> Lock was held by process 12345, which is no longer running.> Remove the stale lock file to continue.>> Without PID info (default):>> Another git process seems to be running in this repository.> Wait for it to finish, or remove the lock file to continue.>> The feature is opt-in via GIT_LOCK_PID_INFO=1 environment variable.>> Signed-off-by: Paulo Casaretto <[email protected]> Sounds interesting. I think by the time I wish I knew what else was using the lockfile, it's too late for me to alter my environment. Perhaps (in addition to allowing the environment opt-in) we could opt-in via configuration? Or is this really only useful, say, on the server side where the environment is carefully controlled? I don't relish putting this variable into my environment to take advantage of something that looks very useful. Are there downsides that make it necessary to be opt-in? I also imagine this could be a useful default; occasionally folks at work hit something similar and ask "what's up with that?" Only other thing is: just because a process X is running doesn't mean it was the one holding the lock, right? Since PIDs can be reused. --  D. Ben Knoble

@gitgitgadget
Copy link

User "D. Ben Knoble" <[email protected]> has been added to the cc: list.

@gitgitgadget
Copy link

On the Git mailing list, Torsten Bögershausen wrote (reply to this):

On Tue, Dec 02, 2025 at 03:07:27PM +0000, Paulo Casaretto via GitGitGadget wrote: > From: Paulo Casaretto <[email protected]>>> When a lock file is held, it can be helpful to know which process owns> it, especially when debugging stale locks left behind by crashed> processes. Add an optional feature that creates a companion .lock.pid> file alongside each lock file, containing the PID of the lock holder.>> The .lock.pid file is created when a lock is acquired (if enabled), and> automatically cleaned up when the lock is released (via commit or> rollback). The file is registered as a tempfile so it gets cleaned up> by signal and atexit handlers if the process terminates abnormally.>> When a lock conflict occurs, the code checks if the PID from the .pid> file is still running using kill(pid, 0). This allows providing> context-aware error messages. With PID info enabled:>> Lock is held by process 12345. Wait for it to finish, or remove> the lock file to continue.>> Or for a stale lock:>> Lock was held by process 12345, which is no longer running.> Remove the stale lock file to continue.>> Without PID info (default):>> Another git process seems to be running in this repository.> Wait for it to finish, or remove the lock file to continue.>> The feature is opt-in via GIT_LOCK_PID_INFO=1 environment variable. [] I think that this makes sense. However, as a frequent user of Git repos hosted on an NFS server (without any problems in my setup): Does it make sense to add the hostname here ? We already have xgethostname() in Git, so that we can diagnose who/which machine really left a lock. []

@gitgitgadget
Copy link

User Torsten Bögershausen <[email protected]> has been added to the cc: list.

@gitgitgadget
Copy link

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Dec 02, 2025 at 03:07:27PM +0000, Paulo Casaretto via GitGitGadget wrote: > The .lock.pid file is created when a lock is acquired (if enabled), and> automatically cleaned up when the lock is released (via commit or> rollback). The file is registered as a tempfile so it gets cleaned up> by signal and atexit handlers if the process terminates abnormally. I'm sympathetic to the goal of this series, and the implementation looks cleanly done. But I wonder if there might be some system-level side effects that make these .pid files awkward. Temporarily having an extra .git/index.lock.pid file is probably not a big deal. But for other namespaces, like refs, we're colliding with names that have other meanings. So if we want to update refs/heads/foo, for example, we'll create refs/heads/foo.lock now. And after your patch, also refs/heads/foo.lock.pid. The ".lock" suffix is special, in that we disallow it in a refname and know to skip it when iterating over loose refs. But for the ".pid" variant, we run the risk of colliding with a real branch named "foo.lock.pid", both for reading and writing. On the writing side, creating foo.lock.pid may accidentally overwrite an existing branch. This can be mitigated by using O_EXCL when creating the pid file. But we can see the writes in the opposite order, which I think can also lead to data loss. Something like: - process A wants to write branch "foo", so it holds refs/heads/foo.lock and now also the matching foo.lock.pid - process B wants to write branch "foo.lock.pid", so it holds refs/heads/foo.lock.pid.lock (and the matching pid) - process B finishes its write and atomically renames foo.lock.pid.lock into foo.lock.pid. It's expected to overwrite the existing file there, so now process A's pid file is gone. - process A finishes its write and tries to clean up its pid file. So it deletes foo.lock.pid, which contains the actual data from process B's write. And now process B's write has been lost (and maybe even the entire existence of the foo.lock.pid branch, if it was not also present in the packed-refs file). On the reading side, anybody iterating refs/heads/ may racily see the foo.lock.pid file and think it is supposed to be an actual ref. So they open it, find it contains garbage, and then flag it as an error. I think both could be mitigated if we disallowed ".lock.pid" as a suffix in refnames, but that is a big user-facing change. In the long run, alternate ref stores like reftable won't suffer from this issue. It's possible there are other spots where we use lockfiles alongside arbitrarily-named files, though I couldn't think of any. So I dunno what that means for your patch. I notice that the user has to enable the feature manually. But it feels more like it should be selective based on which subsystem is using the lockfile (so refs would never want it, but other lockfiles/tempfiles might). -Peff

@gitgitgadget
Copy link

User Jeff King <[email protected]> has been added to the cc: list.

@gitgitgadget
Copy link

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <[email protected]> writes: > So I dunno what that means for your patch. I notice that the user has to> enable the feature manually. But it feels more like it should be> selective based on which subsystem is using the lockfile (so refs would> never want it, but other lockfiles/tempfiles might). Or perhaps the way to opt into the feature is to create an empty file $GIT_DIR/lockfile-audit, and the lockfile subsystem will append to it every time a lock is taken? We need to ensure that a PID and pathname formatted into a single record is small enough and O_APPEND would relieve us from worrying about multi writer races, which may introduce different kind of complications, though. 

@gitgitgadget
Copy link

On the Git mailing list, Jeff King wrote (reply to this):

On Wed, Dec 03, 2025 at 02:21:11PM -0800, Junio C Hamano wrote: > Jeff King <[email protected]> writes:>> > So I dunno what that means for your patch. I notice that the user has to> > enable the feature manually. But it feels more like it should be> > selective based on which subsystem is using the lockfile (so refs would> > never want it, but other lockfiles/tempfiles might).>> Or perhaps the way to opt into the feature is to create an empty> file $GIT_DIR/lockfile-audit, and the lockfile subsystem will append> to it every time a lock is taken? We need to ensure that a PID and> pathname formatted into a single record is small enough and O_APPEND> would relieve us from worrying about multi writer races, which may> introduce different kind of complications, though. I like a single log much better from a management perspective. I agree that atomicity is a potential issue, though. I think that even if we kept it small, network filesystems like NFS do not provide great guarantees for atomic appends. Something like flock() can work there, but that's not something we've relied on before. It also raises questions about reading (do we find pid files in the log in order to provide more directed advice?) and maintenance (do we ever clean it up, or just let it grow without bound?). -Peff

@gitgitgadget
Copy link

On the Git mailing list, Taylor Blau wrote (reply to this):

On Wed, Dec 03, 2025 at 04:16:10PM -0500, Jeff King wrote: > On Tue, Dec 02, 2025 at 03:07:27PM +0000, Paulo Casaretto via GitGitGadget wrote:>> > The .lock.pid file is created when a lock is acquired (if enabled), and> > automatically cleaned up when the lock is released (via commit or> > rollback). The file is registered as a tempfile so it gets cleaned up> > by signal and atexit handlers if the process terminates abnormally.>> I'm sympathetic to the goal of this series, and the implementation looks> cleanly done. But I wonder if there might be some system-level side> effects that make these .pid files awkward.>> Temporarily having an extra .git/index.lock.pid file is probably not a> big deal. But for other namespaces, like refs, we're colliding with> names that have other meanings. So if we want to update refs/heads/foo,> for example, we'll create refs/heads/foo.lock now. And after your patch,> also refs/heads/foo.lock.pid.>> The ".lock" suffix is special, in that we disallow it in a refname and> know to skip it when iterating over loose refs. But for the ".pid"> variant, we run the risk of colliding with a real branch named> "foo.lock.pid", both for reading and writing. Good point. I don't have a strong opinion on whether or not we should use an append-only log of which PIDs grabbed which lockfiles when versus tracking them on a per-lock basis. But I wonder if this would be mitigated by either: - Keeping the ".lock" suffix as-is, so that holding a lockfile at path "$GIT_DIR/index.lock" would create "$GIT_DIR/index-pid.lock" or something similar. - Introducing a new reference name constraint that treats ".lock.pid" as a reserved in a manner identical to how we currently treat ".lock". Between the two, I vastly prefer the former, but see below for more on why. > But we can see the writes in the opposite order, which I think can also> lead to data loss. Something like:>> - process A wants to write branch "foo", so it holds> refs/heads/foo.lock and now also the matching foo.lock.pid>> - process B wants to write branch "foo.lock.pid", so it holds> refs/heads/foo.lock.pid.lock (and the matching pid) Changing the naming scheme as above would cause us to hold "foo.pid.lock" in addition to "foo.lock". That would allow process B here to write branch "foo.lock.pid" (as is the case today). But if the scenario were instead "process B wants to write branch foo.pid.lock", it would fail immediately since the ".lock" suffix is reserved. > I think both could be mitigated if we disallowed ".lock.pid" as a suffix> in refnames, but that is a big user-facing change. Yeah, I don't think that we should change the refname constraints here, especially in a world where reftable deployments are more common. In that world I think we should err on the side of removing constraints, not adding them ;-). > So I dunno what that means for your patch. I notice that the user has to> enable the feature manually. But it feels more like it should be> selective based on which subsystem is using the lockfile (so refs would> never want it, but other lockfiles/tempfiles might). Yeah, I think that something similar to the "which files do we fsync() and how?" configuration we have today would be a nice complement here. (As an aside, I wonder if that interface, too, could be slightly improved. Right now we have a comma-separated list of values in the "core.fsync" configuration for listing different "components", and then a global core.fsyncMethod to either issue a fsync(), or a pagecache writeback, or writeout-only flushes in batches. It might be nice to have something like: [fsync "loose-object"] method = fsync [fsync "packfile"] method = writeout , so the analog here would be something like: [lockfile "refs"] pidfile = false [lockfile "index"] pidfile = true or similar. That could also be represented as core.lockfile=index, omitting "refs" to avoid tracking it. It may be that people don't really care to ever use different fsync methods for different fsync-able components, so perhaps the analogy doesn't hold up perfectly.) Thanks, Taylor

@gitgitgadget
Copy link

On the Git mailing list, Taylor Blau wrote (reply to this):

Hi Paulo, On Tue, Dec 02, 2025 at 03:07:27PM +0000, Paulo Casaretto via GitGitGadget wrote: > The .lock.pid file is created when a lock is acquired (if enabled), and> automatically cleaned up when the lock is released (via commit or> rollback). The file is registered as a tempfile so it gets cleaned up> by signal and atexit handlers if the process terminates abnormally. (For the purposes of this review, I'll ignore the naming conventions that are discussed elsewhere in the thread, which I think can be resolved separately of any technical concerns.) > diff --git a/Documentation/git.adoc b/Documentation/git.adoc> index ce099e78b8..6fdd509d34 100644> --- a/Documentation/git.adoc> +++ b/Documentation/git.adoc> @@ -1010,6 +1010,16 @@ be in the future).> the background which do not want to cause lock contention with> other operations on the repository. Defaults to `1`.>> +`GIT_LOCK_PID_INFO`::> + If this Boolean environment variable is set to `1`, Git will create> + a `.lock.pid` file alongside each lock file containing the PID of the> + process that created the lock. This information is displayed in error> + messages when a lock conflict occurs, making it easier to identify> + stale locks or debug locking issues. The PID files are automatically> + cleaned up via signal and atexit handlers; however, if a process is> + terminated abnormally (e.g., SIGKILL), the file may remain as a stale> + indicator. Disabled by default. Regardless of whether or not we expose this functionality behind an environment variable or configuration, I think it would be nice to be able to turn PID tracking on and off for different components (e.g., for scenarios where you care about who is holding open, say, $GIT_DIR/index, but not who is creating a lock during ref creation). If we determined this through an environment variable, I think it would be reasonable to adopt the convention from the "core.fsync" configuration and use a comma-separated list. Alternatively, we could adopt that same convention for a configuration variable, say, "core.lockfile". But I think that having this exposed as a per-component setting via configuration is preferable than a global switch, since callers don't have to remember to set this variable in their environment to get the desired effect. Callers that want to opt-in or out of this feature on a one-off basis can still override the configuration via the top-level "-c" flag. > `GIT_REDIRECT_STDIN`::> `GIT_REDIRECT_STDOUT`::> `GIT_REDIRECT_STDERR`::> diff --git a/lockfile.c b/lockfile.c> index 1d5ed01682..4a694b9c3d 100644> --- a/lockfile.c> +++ b/lockfile.c> @@ -6,6 +6,9 @@> #include "abspath.h"> #include "gettext.h"> #include "lockfile.h"> +#include "parse.h"> +#include "strbuf.h"> +#include "wrapper.h">> /*> * path = absolute or relative path name> @@ -71,6 +74,62 @@ static void resolve_symlink(struct strbuf *path)> strbuf_reset(&link);> }>> +/*> + * Lock PID file functions - write PID to a .lock.pid file alongside> + * the lock file for debugging stale locks. The PID file is registered> + * as a tempfile so it gets cleaned up by signal/atexit handlers.> + */> +> +static int lock_pid_info_enabled(void)> +{ With the above suggestion, I think this function would get a little more complicated to instead take a component argument. That's not a huge deal in and of itself, but callers that *create* a lockfile will have to somehow pass in the component name when acquiring the lock. > + return git_env_bool(GIT_LOCK_PID_INFO_ENVIRONMENT, 0);> +}> +> +static struct tempfile *create_lock_pid_file(const char *lock_path, int mode)> +{> + struct strbuf pid_path = STRBUF_INIT;> + struct strbuf content = STRBUF_INIT;> + struct tempfile *pid_tempfile = NULL;> + int fd;> +> + if (!lock_pid_info_enabled())> + return NULL;> +> + strbuf_addf(&pid_path, "%s%s", lock_path, LOCK_PID_SUFFIX);> + fd = open(pid_path.buf, O_WRONLY | O_CREAT | O_TRUNC, mode); I'm not sure whether or not we should pass O_EXCL here, but I think it depends on the naming convention we pick. > + if (fd >= 0){ This is a style preference, but I'd suggest handling the happy path outside of this conditional if possible by inverting it. Perhaps something like: if (fd < 0) goto out; strbuf_addf(&content, ...); if (write_in_full(fd, content.buf, content.len) < 0) warning_errno(...); close(fd); pid_tempfile = register_tempfile(pid_path.buf); out: strbuf_release(...); return pid_tempfile; > +static int read_lock_pid(const char *lock_path, uintmax_t *pid_out)> +{> + struct strbuf pid_path = STRBUF_INIT;> + struct strbuf content = STRBUF_INIT;> + int ret = -1;> +> + strbuf_addf(&pid_path, "%s%s", lock_path, LOCK_PID_SUFFIX);> + if (strbuf_read_file(&content, pid_path.buf, LOCK_PID_MAXLEN) > 0){> + char *endptr;> + *pid_out = strtoumax(content.buf, &endptr, 10);> + if (*pid_out > 0 && (*endptr == '\n' || *endptr == '\0'))> + ret = 0; Same note here. I'd suggest setting "ret = 0" during initialization, and inverting this conditional to: if (*pid_out <= 0 || (*endptr != '\n' && *endptr != '\0')){warning(...); ret = -1; goto out} > + else> + warning(_("malformed lock pid file '%s'"), pid_path.buf);> + }> + strbuf_release(&pid_path);> + strbuf_release(&content);> + return ret;> +}> +> /* Make sure errno contains a meaningful value on error */> static int lock_file(struct lock_file *lk, const char *path, int flags,> int mode)> @@ -80,9 +139,12 @@ static int lock_file(struct lock_file *lk, const char *path, int flags,> strbuf_addstr(&filename, path);> if (!(flags & LOCK_NO_DEREF))> resolve_symlink(&filename);> -> strbuf_addstr(&filename, LOCK_SUFFIX);> + These look like stray whitespace changes that were left behind from development. > + if (pid_status == 1)> + strbuf_addf(buf, _("Lock is held by process %"PRIuMAX". "> + "Wait for it to finish, or remove the lock file to continue"),> + pid);> + else if (pid_status == -1)> + strbuf_addf(buf, _("Lock was held by process %"PRIuMAX", "> + "which is no longer running. Remove the stale lock file to continue"),> + pid);> + else> + strbuf_addstr(buf, _("Another git process seems to be running in this repository. "> + "Wait for it to finish, or remove the lock file to continue")); On one hand I prefer the new "Another git process" message for when we don't have a PID lockfile. But on the other hand, I think the "If it still fails, a git process may have crashed..." message is useful for users who may not be immediately aware of the consequences of simply removing the lockfile to continue. I do think the original message is somewhat verbose, so maybe the change here in the non-PID case is worth doing. What are your thoughts? > +> + strbuf_release(&lock_path);> } else> strbuf_addf(buf, _("Unable to create '%s.lock': %s"),> absolute_path(path), strerror(err));> @@ -207,6 +292,8 @@ int commit_lock_file(struct lock_file *lk)>{> char *result_path = get_locked_file_path(lk);>> + delete_tempfile(&lk->pid_tempfile);> + Do we want to wait to delete the PID file until after we know that we successfully committed the lockfile? > +/* Maximum length for PID file content */> +#define LOCK_PID_MAXLEN 32 Makes sense ;-). This should be plenty of space, since on my system maximum PID value is 2^22: $ cat /proc/sys/kernel/pid_max 4194304 > +test_expect_success 'PID info file cleaned up on successful operation when enabled' '> + git init repo4 &&> + (> + cd repo4 &&> + echo content >file &&> + env GIT_LOCK_PID_INFO=1 git add file &&> + # After successful add, no lock or PID files should exist> + ! test -f .git/index.lock &&> + ! test -f .git/index.lock.pid These should be: test_path_is_missing .git/index.lock && test_path_is_missing .git/index.lock.pid instead of bare "! test -f"'s. Thanks, Taylor

When a lock file is held, it can be helpful to know which process owns it, especially when debugging stale locks left behind by crashed processes. Add an optional feature that creates a companion PID file alongside each lock file, containing the PID of the lock holder. For a lock file "foo.lock", the PID file is named "foo-pid.lock". The PID file is created when a lock is acquired (if enabled), and automatically cleaned up when the lock is released (via commit or rollback). The file is registered as a tempfile so it gets cleaned up by signal and atexit handlers if the process terminates abnormally. When a lock conflict occurs, the code checks for an existing PID file and, if found, uses kill(pid, 0) to determine if the process is still running. This allows providing context-aware error messages: Lock is held by process 12345. Wait for it to finish, or remove the lock file to continue. Or for a stale lock: Lock was held by process 12345, which is no longer running. Remove the stale lock file to continue. The feature is controlled via core.lockfilePid configuration, which accepts per-component values similar to core.fsync: - none: Disable for all components (default) - all: Enable for all components - index, config, refs, commit-graph, midx, shallow, gc, other: Enable for specific components Multiple components can be combined with commas: git config core.lockfilePid index,config Each lock file call site specifies which component it belongs to, allowing users fine-grained control over which locks create PID files. Existing PID files are always read when displaying lock errors, regardless of the core.lockfilePid setting. This ensures helpful diagnostics even when the feature was previously enabled and later disabled. Signed-off-by: Paulo Casaretto <[email protected]>
@pcasaretto
Copy link
Author

/preview

@gitgitgadget
Copy link

Preview email sent as [email protected]

@pcasaretto
Copy link
Author

/submit

@gitgitgadget
Copy link

Error: 7068ec0 was already submitted

@gitgitgadget
Copy link

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Wed, Dec 03, 2025 at 06:19:46PM -0500, Taylor Blau wrote: > On Wed, Dec 03, 2025 at 04:16:10PM -0500, Jeff King wrote:> > On Tue, Dec 02, 2025 at 03:07:27PM +0000, Paulo Casaretto via GitGitGadget wrote:> >> > > The .lock.pid file is created when a lock is acquired (if enabled), and> > > automatically cleaned up when the lock is released (via commit or> > > rollback). The file is registered as a tempfile so it gets cleaned up> > > by signal and atexit handlers if the process terminates abnormally.> >> > I'm sympathetic to the goal of this series, and the implementation looks> > cleanly done. But I wonder if there might be some system-level side> > effects that make these .pid files awkward.> >> > Temporarily having an extra .git/index.lock.pid file is probably not a> > big deal. But for other namespaces, like refs, we're colliding with> > names that have other meanings. So if we want to update refs/heads/foo,> > for example, we'll create refs/heads/foo.lock now. And after your patch,> > also refs/heads/foo.lock.pid.> >> > The ".lock" suffix is special, in that we disallow it in a refname and> > know to skip it when iterating over loose refs. But for the ".pid"> > variant, we run the risk of colliding with a real branch named> > "foo.lock.pid", both for reading and writing.>> Good point. I don't have a strong opinion on whether or not we should> use an append-only log of which PIDs grabbed which lockfiles when versus> tracking them on a per-lock basis. But I wonder if this would be> mitigated by either:>> - Keeping the ".lock" suffix as-is, so that holding a lockfile at path> "$GIT_DIR/index.lock" would create "$GIT_DIR/index-pid.lock" or> something similar.>> - Introducing a new reference name constraint that treats ".lock.pid"> as a reserved in a manner identical to how we currently treat> ".lock".>> Between the two, I vastly prefer the former, but see below for more on> why. Yeah, I agree that this is not really a feasible change for the "files" reference backend. Besides all the issues mentioned in this thread, we also have to consider alternative implementations of Git and old versions. Those wouldn't know that the ".lock.pid" files are special, so they will misbehave if Git started to write them now. We could of course work around that issue by introducing a new repository extension. But I doubt that this is something we want to pursue in this context. The provided benefit just isn't high enough. The overall idea still has merit though. So if we still want to pursue it we can likely work around the issue by introducing a toggle that allows specific callers to opt out of creating the PID files. That'd raise the question though when we are most likely to need the PID files for debugging stuff. From my own experience I only ever had issues with stale locks in the ref subsystem, so if we disable the mechanism in exactly that subsystem it may be way less useful. If references are the main culprit one could also think about a slightly ugly alternative approach: loose refs handle it just fine if they contain additional lines. So in theory, we could write a second line for each lock file that contains the PID. We do have an fsck check that warns about this, but in theory this should just work. Whether we want to go there is a different question though. Patrick

@gitgitgadget
Copy link

User Patrick Steinhardt <[email protected]> has been added to the cc: list.

@gitgitgadget
Copy link

On the Git mailing list, Jeff King wrote (reply to this):

On Wed, Dec 03, 2025 at 06:19:46PM -0500, Taylor Blau wrote: > Changing the naming scheme as above would cause us to hold> "foo.pid.lock" in addition to "foo.lock". That would allow process B> here to write branch "foo.lock.pid" (as is the case today). But if the> scenario were instead "process B wants to write branch foo.pid.lock", it> would fail immediately since the ".lock" suffix is reserved. I agree that this gets rid of any corruption or race issues, since all versions understand how to handle ".lock" specially (and assuming we create foo.pid.lock with O_EXCL). But there is still a namespace collision; I cannot write refs "foo" and "foo.pid" at the same time. > > So I dunno what that means for your patch. I notice that the user has to> > enable the feature manually. But it feels more like it should be> > selective based on which subsystem is using the lockfile (so refs would> > never want it, but other lockfiles/tempfiles might).>> Yeah, I think that something similar to the "which files do we fsync()> and how?" configuration we have today would be a nice complement here. I'm not sure it makes sense for the user to configure this. I more meant that the ref files-backend code would set a flag for "no, do not create a pid file for me ever" (or inversely, other bits of the code would add a flag for "yes, it's OK to do so"). -Peff

Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

@pcasaretto@hcmaATshopify