Skip to content

Conversation

@joyeecheung
Copy link
Member

@joyeecheungjoyeecheung commented Nov 30, 2023

src: return non-empty data in context data serializer

For pointer values in the context data, we need to return
non-empty data in the serializer so that V8 does not
serialize them verbatim, making the snapshot unreproducible.

src: make sure that memcpy-ed structs in snapshot have no padding

To make the snapshots reproducible, this patch updates the size
of a few types and adds some static assertions to ensure that
there are no padding in the memcpy-ed structs.

src: add utilities to help debugging reproducibility of snapshots

  • Print offsets in blob serializer
  • Add a special node:generate_default_snapshot ID to generate
    the built-in snapshot.
  • Improve logging
  • Add a test to check the reproducibilty of the snapshot

Refs: nodejs/build#3043

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/startup
  • @nodejs/v8-update

@nodejs-github-botnodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Nov 30, 2023
@haraldh
Copy link

great work! keep it up!

@joyeecheungjoyeecheungforce-pushed the context-slot branch 2 times, most recently from 6df2ede to 8dc86b6CompareMarch 22, 2024 19:07
@joyeecheungjoyeecheung changed the title WIP: reproducible snapshotbootstrap: make snapshot reproducibleMar 22, 2024
@joyeecheungjoyeecheung marked this pull request as ready for review March 22, 2024 19:08
@joyeecheungjoyeecheung added request-ci Add this label to start a Jenkins CI on a PR. and removed request-ci Add this label to start a Jenkins CI on a PR. labels Mar 22, 2024
@joyeecheungjoyeecheungforce-pushed the context-slot branch 2 times, most recently from c92d201 to 2d911beCompareMarch 22, 2024 19:11
@joyeecheungjoyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Mar 22, 2024
@github-actionsgithub-actionsbot removed the request-ci Add this label to start a Jenkins CI on a PR. label Mar 22, 2024
@nodejs-github-bot
Copy link
Collaborator

@joyeecheung
Copy link
MemberAuthor

Somehow this is reproducible locally (macOS + x64 & arm64), and in the CI, on FreeBSD and SmartOS, but not on others. There are still 3-4 words of differences in the snapshot on those platforms. I'll investigate.

@joyeecheungjoyeecheungforce-pushed the context-slot branch 2 times, most recently from 57b8d25 to f03a7baCompareMay 20, 2024 07:38
@joyeecheung
Copy link
MemberAuthor

Split the context data callback part to #53217 - at least that makes the context data slots serialized with some non-gibberish (empty values), otherwise V8 might run into issues serializing the gibberish pointer values.

@legendecas
Copy link
Member

legendecas commented May 31, 2024

Is this PR still useful after #53217 being splitted?

@joyeecheung
Copy link
MemberAuthor

joyeecheung commented May 31, 2024

Yes, in this PR are still some utilities for checking reproducibility and a test for that (but it's not currently passing because a few other bytes are still not reproducible, I will work on those bytes here).

@joyeecheung
Copy link
MemberAuthor

joyeecheung commented Jun 12, 2024

Some discoveries:

  1. We need to return non-empty data for pointer values in context data otherwise V8 would still serialize them verbatim (that was what this PR used to do, in src: use v8::(Des|S)erializeInternalFieldsCallback #53217 I changed it to return empty data and that added back some more variance)
  2. The MemoryMode in EmbedderTypeInfo and the EmbedderObjectType in InternalFieldInfoBase are padded. We need to get rid of the padding - the easiest way to do so is to just make the enums uint64_t (a harder way would be to implement custom serialization for all the wrapper structs but I feel that's an overkill for a couple of bytes of overhead for now).

After fixing the two on Linux I am getting reproducible snapshots. Will update the branch after I clean it up..

For pointer values in the context data, we need to return non-empty data in the serializer so that V8 does not serialize them verbatim, making the snapshot unreproducible.
To make the snapshots reproducible, this patch updates the size of a few types and adds some static assertions to ensure that there are no padding in the memcpy-ed structs.
- Print offsets in blob serializer - Add a special node:generate_default_snapshot ID to generate the built-in snapshot. - Improve logging - Add a test to check the reproducibilty of the snapshot
@joyeecheungjoyeecheung added commit-queue Add this label to land a pull request using GitHub Actions. commit-queue-rebase Add this label to allow the Commit Queue to land a PR in several commits. labels Jun 14, 2024
@nodejs-github-botnodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Jun 14, 2024
nodejs-github-bot pushed a commit that referenced this pull request Jun 14, 2024
For pointer values in the context data, we need to return non-empty data in the serializer so that V8 does not serialize them verbatim, making the snapshot unreproducible. PR-URL: #50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
@nodejs-github-bot
Copy link
Collaborator

Landed in 922feb1...1872167

nodejs-github-bot pushed a commit that referenced this pull request Jun 14, 2024
To make the snapshots reproducible, this patch updates the size of a few types and adds some static assertions to ensure that there are no padding in the memcpy-ed structs. PR-URL: #50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
nodejs-github-bot pushed a commit that referenced this pull request Jun 14, 2024
- Print offsets in blob serializer - Add a special node:generate_default_snapshot ID to generate the built-in snapshot. - Improve logging - Add a test to check the reproducibilty of the snapshot PR-URL: #50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
Copy link
Member

@legendecaslegendecas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

w.Debug("Write code_cache\n");
w.Debug("0x%x: Write CodeCacheInfo\n", w.sink.size());
written_total += w.WriteVector<builtins::CodeCacheInfo>(code_cache);
w.Debug("SnapshotData::ToBlob() Wrote %d bytes\n", written_total);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth check that DCHECK_EQ(written_total, w.sink.size()).

targos pushed a commit that referenced this pull request Jun 20, 2024
For pointer values in the context data, we need to return non-empty data in the serializer so that V8 does not serialize them verbatim, making the snapshot unreproducible. PR-URL: #50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
targos pushed a commit that referenced this pull request Jun 20, 2024
To make the snapshots reproducible, this patch updates the size of a few types and adds some static assertions to ensure that there are no padding in the memcpy-ed structs. PR-URL: #50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
targos pushed a commit that referenced this pull request Jun 20, 2024
- Print offsets in blob serializer - Add a special node:generate_default_snapshot ID to generate the built-in snapshot. - Improve logging - Add a test to check the reproducibilty of the snapshot PR-URL: #50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
ebouye pushed a commit to ebouye/node that referenced this pull request Jun 20, 2024
For pointer values in the context data, we need to return non-empty data in the serializer so that V8 does not serialize them verbatim, making the snapshot unreproducible. PR-URL: nodejs#50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
ebouye pushed a commit to ebouye/node that referenced this pull request Jun 20, 2024
To make the snapshots reproducible, this patch updates the size of a few types and adds some static assertions to ensure that there are no padding in the memcpy-ed structs. PR-URL: nodejs#50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
ebouye pushed a commit to ebouye/node that referenced this pull request Jun 20, 2024
- Print offsets in blob serializer - Add a special node:generate_default_snapshot ID to generate the built-in snapshot. - Improve logging - Add a test to check the reproducibilty of the snapshot PR-URL: nodejs#50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
bmeck pushed a commit to bmeck/node that referenced this pull request Jun 22, 2024
For pointer values in the context data, we need to return non-empty data in the serializer so that V8 does not serialize them verbatim, making the snapshot unreproducible. PR-URL: nodejs#50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
bmeck pushed a commit to bmeck/node that referenced this pull request Jun 22, 2024
To make the snapshots reproducible, this patch updates the size of a few types and adds some static assertions to ensure that there are no padding in the memcpy-ed structs. PR-URL: nodejs#50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
bmeck pushed a commit to bmeck/node that referenced this pull request Jun 22, 2024
- Print offsets in blob serializer - Add a special node:generate_default_snapshot ID to generate the built-in snapshot. - Improve logging - Add a test to check the reproducibilty of the snapshot PR-URL: nodejs#50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
@targostargos mentioned this pull request Jun 25, 2024
nodejs-github-bot pushed a commit that referenced this pull request Jul 6, 2024
…ding" This reverts commit 4e58cde. PR-URL: #53563 Refs: #50983 Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
nodejs-github-bot pushed a commit that referenced this pull request Jul 6, 2024
To prevent padding from making the snapshot unreproducible, zero-initialize the data that are copied into the snapshot so that the padding copied are all zeros. This is better than enlarging the enums to align the fields since it doesn't make the snapshot bigger than necessary, and it removes the need of using static assertions to ensure alignment. PR-URL: #53563 Refs: #50983 Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
aduh95 pushed a commit that referenced this pull request Jul 12, 2024
…ding" This reverts commit 4e58cde. PR-URL: #53563 Refs: #50983 Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
aduh95 pushed a commit that referenced this pull request Jul 12, 2024
To prevent padding from making the snapshot unreproducible, zero-initialize the data that are copied into the snapshot so that the padding copied are all zeros. This is better than enlarging the enums to align the fields since it doesn't make the snapshot bigger than necessary, and it removes the need of using static assertions to ensure alignment. PR-URL: #53563 Refs: #50983 Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
aduh95 pushed a commit that referenced this pull request Jul 16, 2024
…ding" This reverts commit 4e58cde. PR-URL: #53563 Refs: #50983 Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
aduh95 pushed a commit that referenced this pull request Jul 16, 2024
To prevent padding from making the snapshot unreproducible, zero-initialize the data that are copied into the snapshot so that the padding copied are all zeros. This is better than enlarging the enums to align the fields since it doesn't make the snapshot bigger than necessary, and it removes the need of using static assertions to ensure alignment. PR-URL: #53563 Refs: #50983 Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
@marco-ippolitomarco-ippolito added the dont-land-on-v20.x PRs that should not land on the v20.x-staging branch and should not be released in v20.x. label Jul 19, 2024
marco-ippolito pushed a commit that referenced this pull request Aug 19, 2024
To prevent padding from making the snapshot unreproducible, zero-initialize the data that are copied into the snapshot so that the padding copied are all zeros. This is better than enlarging the enums to align the fields since it doesn't make the snapshot bigger than necessary, and it removes the need of using static assertions to ensure alignment. PR-URL: #53563 Refs: #50983 Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
marco-ippolito pushed a commit that referenced this pull request Aug 19, 2024
To prevent padding from making the snapshot unreproducible, zero-initialize the data that are copied into the snapshot so that the padding copied are all zeros. This is better than enlarging the enums to align the fields since it doesn't make the snapshot bigger than necessary, and it removes the need of using static assertions to ensure alignment. PR-URL: #53563 Refs: #50983 Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: James M Snell <[email protected]>
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

commit-queue-rebaseAdd this label to allow the Commit Queue to land a PR in several commits.dont-land-on-v20.xPRs that should not land on the v20.x-staging branch and should not be released in v20.x.lib / srcIssues and PRs related to general changes in the lib or src directory.needs-ciPRs that need a full CI run.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants

@joyeecheung@nodejs-github-bot@haraldh@legendecas@lemire@jasnell@marco-ippolito