@anonrig (Member) commented Dec 9, 2022

simdutf provides fast UTF-8 operations using SIMD instructions. The @nodejs/undici team was looking for a way to validate UTF-8 input, and this dependency makes that possible.

Edit: I'm proposing exposing the following functionality either through a new module (like node:encoding) or through util.types or buffer:

  • validate_ascii(string)
  • validate_utf8(string)
  • count_utf8(string)

PS: simdutf supports more features. Depending on the need, it makes more sense to expose them through a new module instead of through util.types or buffer.
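
As a rough illustration of what the three proposed functions would compute, here is a minimal JavaScript sketch. The names validateAscii, validateUtf8, and countUtf8 are hypothetical stand-ins that mirror the list above; they are not Node.js or simdutf APIs, and this byte-at-a-time version has none of the SIMD speedups. It assumes count_utf8 means counting code points in input that is already valid UTF-8.

```javascript
// Uses only Node.js globals (Buffer, TextDecoder), so no imports are needed.

// With `fatal: true`, decode() throws a TypeError on malformed input.
const strictDecoder = new TextDecoder('utf-8', { fatal: true });

// Hypothetical: true when every byte is in the 7-bit ASCII range.
function validateAscii(bytes) {
  for (const byte of bytes) {
    if (byte > 0x7f) return false;
  }
  return true;
}

// Hypothetical: true when the bytes form well-formed UTF-8.
function validateUtf8(bytes) {
  try {
    strictDecoder.decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical: number of code points, assuming the input is valid UTF-8.
// Each code point contributes exactly one byte that is not a
// continuation byte (continuation bytes look like 0b10xxxxxx).
function countUtf8(bytes) {
  let count = 0;
  for (const byte of bytes) {
    if ((byte & 0xc0) !== 0x80) count++;
  }
  return count;
}

// "héllo" is 6 bytes in UTF-8 but 5 code points.
const sample = Buffer.from('h\u00e9llo', 'utf8');
console.log(validateAscii(sample), validateUtf8(sample), countUtf8(sample));
// prints: false true 5
```

A native binding would replace the loops and the decoder with calls into simdutf; the JavaScript versions only pin down the expected results.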

@nodejs-github-bot (Collaborator)

Review requested:

  • @nodejs/gyp

@nodejs-github-bot added the build, dependencies, needs-ci, and tools labels Dec 9, 2022
@anonrig mentioned this pull request Dec 9, 2022
@anonrig force-pushed the deps/simdutf branch 2 times, most recently from 2c20c9a to 6daa546 December 9, 2022 21:09
@anonrig changed the title from "dep: add simdutf dependency" to "deps: add simdutf dependency" Dec 9, 2022
@KhafraDev (Member)

This would help speed up both ws and undici's WebSocket implementation (which is still WIP). When we receive a text frame, or a close frame with a reason, we need to validate that the buffer contains valid UTF-8.

There are currently a few ways of doing so: a JS implementation used by default in both undici and ws, and optionally a package such as utf-8-validate. Note that simdutf is many times faster than the C++ version of utf-8-validate in the benchmark above, and the JS fallback version is the slowest.

Here is a PR from @lpinca that shows massive speedups when using simdutf: websockets/utf-8-validate#101. Considering how widely ws is used, exposing a very fast way to validate UTF-8 would benefit a large part of the ecosystem.
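
To make the close-frame case concrete, here is a hedged JavaScript sketch, not code from ws or undici. It assumes the usual WebSocket close payload layout (a 2-byte big-endian status code followed by an optional UTF-8 reason) and uses buffer.isUtf8 (the API referenced as #45947 in the commits on this PR) when the running Node.js version has it, falling back to a fatal TextDecoder otherwise. The helper names isValidUtf8 and parseClosePayload are hypothetical.

```javascript
const nodeBuffer = require('node:buffer');

// Fallback validator: with `fatal: true`, decode() throws on malformed input.
const strictDecoder = new TextDecoder('utf-8', { fatal: true });

// Hypothetical helper: prefer buffer.isUtf8 when this Node.js version has it.
const isValidUtf8 = typeof nodeBuffer.isUtf8 === 'function'
  ? (bytes) => nodeBuffer.isUtf8(bytes)
  : (bytes) => {
      try {
        strictDecoder.decode(bytes);
        return true;
      } catch {
        return false;
      }
    };

// Hypothetical helper: returns { code, reason } for a well-formed close
// payload, or null when the payload is malformed (including an invalid
// UTF-8 reason). An empty payload carries no status code at all.
function parseClosePayload(payload) {
  if (payload.length === 0) return { code: null, reason: '' };
  if (payload.length === 1) return null; // a status code needs two bytes

  const code = payload.readUInt16BE(0);
  const reasonBytes = payload.subarray(2);
  if (!isValidUtf8(reasonBytes)) return null;

  return { code, reason: reasonBytes.toString('utf8') };
}

// Status code 1000 (0x03e8) followed by the reason "bye".
const closePayload = Buffer.concat([Buffer.from([0x03, 0xe8]), Buffer.from('bye', 'utf8')]);
console.log(parseClosePayload(closePayload));
```

The feature check keeps the sketch usable on Node.js versions that predate buffer.isUtf8, which is the same default-plus-optional-fast-path arrangement described above.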

@anonrig force-pushed the deps/simdutf branch 2 times, most recently from 5027cae to e94ba5f December 9, 2022 21:29
@richardlau added the request-ci label Dec 9, 2022
@github-actions bot removed the request-ci label Dec 9, 2022
@anonrig force-pushed the deps/simdutf branch 2 times, most recently from bed88cc to 4269faf December 9, 2022 23:26
@anonrig force-pushed the deps/simdutf branch 3 times, most recently from ced7ef2 to 5566c99 December 10, 2022 02:53
@anonrig added the request-ci label Dec 10, 2022
@github-actions bot removed the request-ci label Dec 10, 2022
@anonrig added the request-ci label Dec 10, 2022
@github-actions bot removed the request-ci label Dec 10, 2022
RafaelGSS added a commit that referenced this pull request Jan 2, 2023
Notable changes:

buffer:
  • (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947
deps:
  • disable avx512 for simutf on benchmark ci (Yagiz Nizipli) #45803
  • add simdutf dependency (Yagiz Nizipli) #45803
http:
  • (SEMVER-MINOR) improved timeout defaults handling (Paolo Insogna) #45778
net
  • add autoSelectFamily global getter and setter (Paolo Insogna) #45777
os:
  • (SEMVER-MINOR) add availableParallelism() (Colin Ihrig) #45895
util:
  • add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803

PR-URL: #46061
RafaelGSS pushed a commit that referenced this pull request Jan 4, 2023
PR-URL: #45803
Reviewed-By: Robert Nagy
Reviewed-By: Matteo Collina
Reviewed-By: Anna Henningsen
Reviewed-By: Michael Dawson
RafaelGSS pushed a commit that referenced this pull request Jan 4, 2023
Co-authored-by: Daniel Lemire
PR-URL: #45803
Reviewed-By: Robert Nagy
Reviewed-By: Matteo Collina
Reviewed-By: Anna Henningsen
Reviewed-By: Michael Dawson
RafaelGSS added a commit that referenced this pull request Jan 4, 2023
Notable changes:

buffer:
  • (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947
http:
  • (SEMVER-MINOR) improved timeout defaults handling (Paolo Insogna) #45778
net
  • add autoSelectFamily global getter and setter (Paolo Insogna) #45777
os:
  • (SEMVER-MINOR) add availableParallelism() (Colin Ihrig) #45895
util:
  • add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803

PR-URL: #46061
juanarbol pushed a commit that referenced this pull request Jan 26, 2023
PR-URL: #45803
Reviewed-By: Robert Nagy
Reviewed-By: Matteo Collina
Reviewed-By: Anna Henningsen
Reviewed-By: Michael Dawson
juanarbol pushed a commit that referenced this pull request Jan 26, 2023
Co-authored-by: Daniel Lemire
PR-URL: #45803
Reviewed-By: Robert Nagy
Reviewed-By: Matteo Collina
Reviewed-By: Anna Henningsen
Reviewed-By: Michael Dawson
juanarbol added a commit that referenced this pull request Jan 28, 2023
Notable changes:

buffer
  • (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947
deps:
  • disable avx512 for simutf on benchmark ci (Yagiz Nizipli) #45803
  • add simdutf dependency (Yagiz Nizipli) #45803
  • upgrade npm to 9.1.3 (npm team) #45693
util:
  • add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803

PR-URL: TBD
@juanarbol mentioned this pull request Jan 28, 2023
juanarbol added a commit that referenced this pull request Jan 28, 2023
Notable changes:

buffer
  • (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947
deps:
  • disable avx512 for simutf on benchmark ci (Yagiz Nizipli) #45803
  • add simdutf dependency (Yagiz Nizipli) #45803
  • upgrade npm to 9.1.3 (npm team) #45693
util:
  • add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803

PR-URL: #46396
juanarbol added a commit that referenced this pull request Jan 30, 2023
Notable changes:

buffer
  • (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947
deps:
  • add simdutf dependency (Yagiz Nizipli) #45803
  • upgrade npm to 9.1.3 (npm team) #45693
util:
  • add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803

PR-URL: #46396

Labels

  • author ready: PRs that have at least one approval, no pending requests for changes, and a CI started.
  • build: Issues and PRs related to build files or the CI.
  • commit-queue-rebase: Add this label to allow the Commit Queue to land a PR in several commits.
  • dependencies: Pull requests that update a dependency file.
  • needs-ci: PRs that need a full CI run.
  • notable-change: PRs with changes that should be highlighted in changelogs.
  • performance: Issues and PRs related to the performance of Node.js.
  • review wanted: PRs that need reviews.
  • tools: Issues and PRs related to the tools directory.

17 participants

@anonrig, @nodejs-github-bot, @KhafraDev, @lpinca, @ronag, @mscdex, @bnoordhuis, @richardlau, @jasnell, @targos, @lemire, @addaleax, @mhdawson, @mcollina, @Uzlopak, @daeyeon, @jclaudioandrade