Fix incorrect test set values in leave_k_out splits with sparse user rows#640
Uh oh!
There was an error while loading. Please reload this page.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes#639
This PR fixes a bug in the evaluation of the
leave_k_out_splitin which the produced test matrix would contain values that were many multiples of their original value. Tests are also added on static (non-random) matrices that otherwise fail in the un-corrected implementation.This bug resulted from a calculation that required an input array with sequential values - the fact that non-sequential values were provided led to an error in processing.
Specifically, the
arrargument in _take_tailswas being provided as
candidate_users, from which user indices falling below the threshold were removed, resulting in a list in which the ordered set of unique integers was not consecutive and therefore the provided array was invalid.