Skip to content

Conversation

@addaleax
Copy link
Member

@addaleaxaddaleax commented Dec 5, 2017

  • src: minor cleanups to node_url.cc

    • Remove pointless pointers
    • Make WriteHost take a const argument so that it’s functionality
      is clear from the signature
    • Make FindLongestZeroSequence templated to accommodate the constness
      in WriteHost and because using uint16_t is an articifial,
      unnecessary restriction
    • Remove string copying when no copies are needed
    • Make PercentDecode just return its return value
    • Make ParseHost (string-only version) take its constant argument
      as a constant reference
  • src: move url internals into anonymous namespace

    This helps because static doesn’t work for C++ classes,
    but refactoring url_host into a proper C++ class seems the
    most reasonable soluation for the memory leak fixed by the next commit.

  • src: make url host a proper C++ class

    • Gives URLHost a proper destructor that clears memory
      depending on the type of the host (This fixes a memory leak)
    • Hide the host type enums and class layout as implementation details
    • Make the Parse methods members of URLHost
    • Turn WriteHost into a ToString() method on the URLHost class
    • Verify that at the beginning of a parse attempt, the type is set
      to “failed”
    • Remove a lot of gotos from the source code 🐢🚀

    Fixes: Why use new URL() may lead to memory leak ? #17448

  • src: use correct OOB check for IPv6 parsing

    last_piece pointed to the end of the 8×16 bit array,
    so piece_pointer == last_piece already means that the pointer
    is not writable any longer.

    Previously, this still worked most of the time but could
    result in an out-of-bounds-write.

    Also, rename last_piece to buffer_end to avoid this pitfall.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines
Affected core subsystem(s)

src/node_url.cc

Not sure how to write tests but it makes valgrind happy about @TimothyGu’s example from #17448

@addaleaxaddaleax added c++ Issues and PRs that require attention from people who are familiar with C++. whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Dec 5, 2017
@nodejs-github-botnodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. dont-land-on-v4.x whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Dec 5, 2017
- Remove pointless pointers - Make `WriteHost` take a const argument so that it’s functionality is clear from the signature - Make `FindLongestZeroSequence` templated to accommodate the constness in `WriteHost` and because using `uint16_t` is an articifial, unnecessary restriction - Remove string copying when no copies are needed - Make `PercentDecode` just return its return value - Make `ParseHost` (string-only version) take its constant argument as a constant reference
This helps because `static` doesn’t work for C++ classes, but refactoring `url_host` into a proper C++ class seems the most reasonable soluation for the memory leak fixed by the next commit.
@addaleax
Copy link
MemberAuthor

@addaleax
Copy link
MemberAuthor

Some of the test failures are definitely related & a bit curious … going to look into those later

- Gives `URLHost` a proper destructor that clears memory depending on the type of the host (This fixes a memory leak) - Hide the host type enums and class layout as implementation details - Make the `Parse` methods members of `URLHost` - Turn `WriteHost` into a `ToString()` method on the `URLHost` class - Verify that at the beginning of a parse attempt, the type is set to “failed” - Remove a lot of `goto`s from the source code 🐢🚀 Fixes: nodejs#17448
`last_piece` pointed to the end of the 8×16 bit array, so `piece_pointer == last_piece` already means that the pointer is not writable any longer. Previously, this still worked most of the time but could result in an out-of-bounds-write. Also, rename `last_piece` to `buffer_end` to avoid this pitfall.
@addaleax
Copy link
MemberAuthor

Copy link
Member

@TimothyGuTimothyGu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splendid work! Good to see this whole thing properly C++-ized, and the memory leak fixed. LGTM.

while (ch != kEOL){
if (piece_pointer > last_piece)
goto end;
if (piece_pointer >= buffer_end)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, nice catch!

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TimothyGu well, it was breaking CI for me, soooo... ;)

Wouldn't have seen it without the tests breaking.

@addaleaxaddaleax added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Dec 8, 2017
BridgeAR pushed a commit to BridgeAR/node that referenced this pull request Dec 12, 2017
- Remove pointless pointers - Make `WriteHost` take a const argument so that it’s functionality is clear from the signature - Make `FindLongestZeroSequence` templated to accommodate the constness in `WriteHost` and because using `uint16_t` is an articifial, unnecessary restriction - Remove string copying when no copies are needed - Make `PercentDecode` just return its return value - Make `ParseHost` (string-only version) take its constant argument as a constant reference PR-URL: nodejs#17470 Reviewed-By: Timothy Gu <[email protected]>
BridgeAR pushed a commit to BridgeAR/node that referenced this pull request Dec 12, 2017
This helps because `static` doesn’t work for C++ classes, but refactoring `url_host` into a proper C++ class seems the most reasonable soluation for the memory leak fixed by the next commit. PR-URL: nodejs#17470 Reviewed-By: Timothy Gu <[email protected]>
BridgeAR pushed a commit to BridgeAR/node that referenced this pull request Dec 12, 2017
- Gives `URLHost` a proper destructor that clears memory depending on the type of the host (This fixes a memory leak) - Hide the host type enums and class layout as implementation details - Make the `Parse` methods members of `URLHost` - Turn `WriteHost` into a `ToString()` method on the `URLHost` class - Verify that at the beginning of a parse attempt, the type is set to “failed” - Remove a lot of `goto`s from the source code 🐢🚀 PR-URL: nodejs#17470Fixes: nodejs#17448 Reviewed-By: Timothy Gu <[email protected]>
BridgeAR pushed a commit to BridgeAR/node that referenced this pull request Dec 12, 2017
`last_piece` pointed to the end of the 8×16 bit array, so `piece_pointer == last_piece` already means that the pointer is not writable any longer. Previously, this still worked most of the time but could result in an out-of-bounds-write. Also, rename `last_piece` to `buffer_end` to avoid this pitfall. PR-URL: nodejs#17470 Reviewed-By: Timothy Gu <[email protected]>
@BridgeAR
Copy link
Member

Landed in e8a5f7b, 9236dfe, 89b3746, 2050bab

@addaleaxaddaleax removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Dec 13, 2017
@addaleaxaddaleax deleted the url-refactor branch December 13, 2017 05:20
@MylesBorins
Copy link
Contributor

@TimothyGu does this need to be included in the backport?

@gibfahn
Copy link
Member

gibfahn commented Dec 20, 2017

@trygve-lie , in reply to #17774 (comment)

Normally we don't backport to LTS until things have been in a current release (9.x) for at least two weeks. This hasn't gone into a 9.x release yet, so we'd be talking 6 weeks before this could land in v8.x.

If this fixes a serious bug we should consider including it in the v8.9.4 release candidate build going out today, so that it goes out with v8.9.4 in two weeks.

What would be helpful is an idea of:

  1. How much pain this bug causes
  2. How many people are going to hit this
  3. How minor the fix is (how much risk is there in rushing it)

Thoughts @nodejs/lts ?

MylesBorins pushed a commit that referenced this pull request Feb 5, 2018
- Remove pointless pointers - Make `WriteHost` take a const argument so that it’s functionality is clear from the signature - Make `FindLongestZeroSequence` templated to accommodate the constness in `WriteHost` and because using `uint16_t` is an articifial, unnecessary restriction - Remove string copying when no copies are needed - Make `PercentDecode` just return its return value - Make `ParseHost` (string-only version) take its constant argument as a constant reference Backport-PR-URL: #18324 PR-URL: #17470 Reviewed-By: Timothy Gu <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 5, 2018
This helps because `static` doesn’t work for C++ classes, but refactoring `url_host` into a proper C++ class seems the most reasonable soluation for the memory leak fixed by the next commit. Backport-PR-URL: #18324 PR-URL: #17470 Reviewed-By: Timothy Gu <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 5, 2018
- Gives `URLHost` a proper destructor that clears memory depending on the type of the host (This fixes a memory leak) - Hide the host type enums and class layout as implementation details - Make the `Parse` methods members of `URLHost` - Turn `WriteHost` into a `ToString()` method on the `URLHost` class - Verify that at the beginning of a parse attempt, the type is set to “failed” - Remove a lot of `goto`s from the source code 🐢🚀 Backport-PR-URL: #18324 PR-URL: #17470Fixes: #17448 Reviewed-By: Timothy Gu <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 5, 2018
`last_piece` pointed to the end of the 8×16 bit array, so `piece_pointer == last_piece` already means that the pointer is not writable any longer. Previously, this still worked most of the time but could result in an out-of-bounds-write. Also, rename `last_piece` to `buffer_end` to avoid this pitfall. Backport-PR-URL: #18324 PR-URL: #17470 Reviewed-By: Timothy Gu <[email protected]>
@MylesBorinsMylesBorins mentioned this pull request Feb 7, 2018
MylesBorins pushed a commit that referenced this pull request Feb 11, 2018
- Remove pointless pointers - Make `WriteHost` take a const argument so that it’s functionality is clear from the signature - Make `FindLongestZeroSequence` templated to accommodate the constness in `WriteHost` and because using `uint16_t` is an articifial, unnecessary restriction - Remove string copying when no copies are needed - Make `PercentDecode` just return its return value - Make `ParseHost` (string-only version) take its constant argument as a constant reference Backport-PR-URL: #18324 PR-URL: #17470 Reviewed-By: Timothy Gu <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 11, 2018
This helps because `static` doesn’t work for C++ classes, but refactoring `url_host` into a proper C++ class seems the most reasonable soluation for the memory leak fixed by the next commit. Backport-PR-URL: #18324 PR-URL: #17470 Reviewed-By: Timothy Gu <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 11, 2018
- Gives `URLHost` a proper destructor that clears memory depending on the type of the host (This fixes a memory leak) - Hide the host type enums and class layout as implementation details - Make the `Parse` methods members of `URLHost` - Turn `WriteHost` into a `ToString()` method on the `URLHost` class - Verify that at the beginning of a parse attempt, the type is set to “failed” - Remove a lot of `goto`s from the source code 🐢🚀 Backport-PR-URL: #18324 PR-URL: #17470Fixes: #17448 Reviewed-By: Timothy Gu <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 11, 2018
`last_piece` pointed to the end of the 8×16 bit array, so `piece_pointer == last_piece` already means that the pointer is not writable any longer. Previously, this still worked most of the time but could result in an out-of-bounds-write. Also, rename `last_piece` to `buffer_end` to avoid this pitfall. Backport-PR-URL: #18324 PR-URL: #17470 Reviewed-By: Timothy Gu <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 12, 2018
- Remove pointless pointers - Make `WriteHost` take a const argument so that it’s functionality is clear from the signature - Make `FindLongestZeroSequence` templated to accommodate the constness in `WriteHost` and because using `uint16_t` is an articifial, unnecessary restriction - Remove string copying when no copies are needed - Make `PercentDecode` just return its return value - Make `ParseHost` (string-only version) take its constant argument as a constant reference Backport-PR-URL: #18324 PR-URL: #17470 Reviewed-By: Timothy Gu <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 12, 2018
This helps because `static` doesn’t work for C++ classes, but refactoring `url_host` into a proper C++ class seems the most reasonable soluation for the memory leak fixed by the next commit. Backport-PR-URL: #18324 PR-URL: #17470 Reviewed-By: Timothy Gu <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 12, 2018
- Gives `URLHost` a proper destructor that clears memory depending on the type of the host (This fixes a memory leak) - Hide the host type enums and class layout as implementation details - Make the `Parse` methods members of `URLHost` - Turn `WriteHost` into a `ToString()` method on the `URLHost` class - Verify that at the beginning of a parse attempt, the type is set to “failed” - Remove a lot of `goto`s from the source code 🐢🚀 Backport-PR-URL: #18324 PR-URL: #17470Fixes: #17448 Reviewed-By: Timothy Gu <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 12, 2018
`last_piece` pointed to the end of the 8×16 bit array, so `piece_pointer == last_piece` already means that the pointer is not writable any longer. Previously, this still worked most of the time but could result in an out-of-bounds-write. Also, rename `last_piece` to `buffer_end` to avoid this pitfall. Backport-PR-URL: #18324 PR-URL: #17470 Reviewed-By: Timothy Gu <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 13, 2018
- Remove pointless pointers - Make `WriteHost` take a const argument so that it’s functionality is clear from the signature - Make `FindLongestZeroSequence` templated to accommodate the constness in `WriteHost` and because using `uint16_t` is an articifial, unnecessary restriction - Remove string copying when no copies are needed - Make `PercentDecode` just return its return value - Make `ParseHost` (string-only version) take its constant argument as a constant reference Backport-PR-URL: #18324 PR-URL: #17470 Reviewed-By: Timothy Gu <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 13, 2018
This helps because `static` doesn’t work for C++ classes, but refactoring `url_host` into a proper C++ class seems the most reasonable soluation for the memory leak fixed by the next commit. Backport-PR-URL: #18324 PR-URL: #17470 Reviewed-By: Timothy Gu <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 13, 2018
- Gives `URLHost` a proper destructor that clears memory depending on the type of the host (This fixes a memory leak) - Hide the host type enums and class layout as implementation details - Make the `Parse` methods members of `URLHost` - Turn `WriteHost` into a `ToString()` method on the `URLHost` class - Verify that at the beginning of a parse attempt, the type is set to “failed” - Remove a lot of `goto`s from the source code 🐢🚀 Backport-PR-URL: #18324 PR-URL: #17470Fixes: #17448 Reviewed-By: Timothy Gu <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 13, 2018
`last_piece` pointed to the end of the 8×16 bit array, so `piece_pointer == last_piece` already means that the pointer is not writable any longer. Previously, this still worked most of the time but could result in an out-of-bounds-write. Also, rename `last_piece` to `buffer_end` to avoid this pitfall. Backport-PR-URL: #18324 PR-URL: #17470 Reviewed-By: Timothy Gu <[email protected]>
@TimothyGuTimothyGu mentioned this pull request Feb 27, 2018
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c++Issues and PRs that require attention from people who are familiar with C++.whatwg-urlIssues and PRs related to the WHATWG URL implementation.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Why use new URL() may lead to memory leak ?

8 participants

@addaleax@BridgeAR@MylesBorins@gibfahn@trygve-lie@TimothyGu@joyeecheung@nodejs-github-bot