
@wz1000 (Collaborator) commented Sep 28, 2022

Remove GetDependencyInformation in favour of GetModuleGraph

Computing and storing GetDependencyInformation for each file individually means that we perform downsweep on each file individually, wasting a lot of work and using an excessive amount of memory to store all of these duplicated graphs.

However, we already have the GetModuleGraph rule, which we need to compute before compiling files anyway because it is depended on by NeedsCompilation, which needs to know whether any reverse dependency of the module we are compiling requires TH. This means that each file already depends on the results of downsweep for the whole project.

Instead, we can compute the whole graph once when we execute the GetModuleGraph rule and even use this inside HscEnv.hsc_mod_graph to avoid reconstructing the ModuleGraph on each invocation of GhcSessionDeps.
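
As a rough sketch of that reuse (illustration only, not the exact code in this PR, and assuming a GHC version where HscEnv and ModuleGraph live in these modules and hsc_mod_graph is an ordinary record field):

```haskell
import GHC.Driver.Env        (HscEnv (..))
import GHC.Unit.Module.Graph (ModuleGraph)

-- Store the graph computed once by the GetModuleGraph rule in the session,
-- so GHC does not have to reconstruct it for every GhcSessionDeps call.
setModuleGraph :: ModuleGraph -> HscEnv -> HscEnv
setModuleGraph mg env = env { hsc_mod_graph = mg }
```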

There may be concerns about excessive build churn due to any change to the result of GetModuleGraph invalidating the result of GhcSessionDeps too often, but note that this only happens when something in the header of a module changes, and it could be solved easily by re-introducing a version of GetDependencyInformation with early cutoff that essentially returns the result of GetModuleGraph but includes the hash of only the ModSummarys in the downward dependency closure of the file.
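
As a rough illustration of that early-cutoff idea (a minimal sketch with an invented graph representation, not the code in this PR), the cutoff hash would combine only the ModSummary fingerprints reachable below the file:

```haskell
import           GHC.Fingerprint (Fingerprint, fingerprintFingerprints)
import qualified Data.Map.Strict as M
import qualified Data.Set        as S

-- Hypothetical map from a file to its direct imports.
type ImportGraph file = M.Map file [file]

-- Everything reachable below a file (the file included); the visited set
-- makes this terminate even in the presence of import cycles.
downwardClosure :: Ord file => ImportGraph file -> file -> S.Set file
downwardClosure g = go S.empty
  where
    go seen f
      | f `S.member` seen = seen
      | otherwise         = foldl go (S.insert f seen) (M.findWithDefault [] f g)

-- Early-cutoff hash: only summaries in the downward closure contribute, so a
-- header edit elsewhere in the project does not invalidate this file's value.
closureHash :: Ord file => ImportGraph file -> (file -> Fingerprint) -> file -> Fingerprint
closureHash g summaryFingerprint =
  fingerprintFingerprints . map summaryFingerprint . S.toList . downwardClosure g
```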

@pepeiborra (Collaborator) commented Sep 28, 2022

How do you deal with the cycles issue? If there are cycles in the module graph, then you end up with cycles in the build graph, and hls-graph cannot handle them.

This has been discussed before: #1359, #2323 (comment)

@wz1000 (Collaborator, Author):

> How do you deal with the cycles issue? If there are cycles in the module graph, then you end up with cycles in the build graph, and hls-graph cannot handle them.
>
> This has been discussed before: #1359, #2323 (comment)

There is no rule cycle because we now construct the ModuleGraph a single time using all the known targets. A GetModuleGraph rule doesn't depend on other GetModuleGraph rules (in fact it is a defineNoFile rule, so there is only one instance). The rawDependencyInformation function already dealt with cycles by maintaining state for modules which have already been visited.

@pepeiborra (Collaborator) commented Oct 8, 2022

> There may be concerns about excessive build churn due to any change to the result of GetModuleGraph invalidating the result of GhcSessionDeps too often, but note that this only happens when something in the header of a module changes,

I have concerns about this. It's not just build churn, but the fact that to load a module I first need to compute the module graph for the entire codebase. How is this going to affect init times for large codebases with >10,000 modules?

> this could be solved easily by re-introducing a version of GetDependencyInformation with early cutoff that essentially returns the result of GetModuleGraph but includes the hash of only the ModSummarys in the downward dependency closure of the file.

By all means, include this in the PR. But it doesn't solve the startup cost issue.

@wz1000 force-pushed the wip/module-graph-mem branch 2 times, most recently from d286a48 to aea4709 on December 9, 2022 13:18
@wz1000 force-pushed the wip/module-graph-mem branch from aea4709 to 6ec2635 on December 20, 2022 09:17
@wz1000 (Collaborator, Author):

@pepeiborra I've replaced checkForImportCycles with a fullModuleGraph option that tries to restore the older behaviour.

@pepeiborra (Collaborator):

> @pepeiborra I've replaced checkForImportCycles with a fullModuleGraph option that tries to restore the older behaviour.

where is the option? I cannot see it

@wz1000 (Collaborator, Author):

> @pepeiborra I've replaced checkForImportCycles with a fullModuleGraph option that tries to restore the older behaviour.
>
> where is the option? I cannot see it

Check now, the push was still going when I made the comment.

@wz1000 force-pushed the wip/module-graph-mem branch 6 times, most recently from 2f059f8 to 616d328 on December 20, 2022 11:42
@wz1000 enabled auto-merge (rebase) on December 20, 2022 11:48
@pepeiborra (Collaborator):

> @pepeiborra I've replaced checkForImportCycles with a fullModuleGraph option that tries to restore the older behaviour.
>
> where is the option? I cannot see it
>
> Check now, the push was still going when I made the comment.

The option looks good, but I kind of want to see a benchmark in a large code base. Both with the full module graph and without it.

Do you still want to get this merged before the next release? If it cannot wait, then go ahead and merge it

@wz1000 disabled auto-merge on December 22, 2022 08:35
@wz1000 force-pushed the wip/module-graph-mem branch from fbc1bd5 to 0b64655 on December 22, 2022 09:06
@wz1000 (Collaborator, Author):

This needs #3391 to work because the eval plugin currently relies on the driver (it calls load) which inspects ms_hs_date which we unset for GetModSummaryWithoutTimestamps. It is not desirable to use GetModSummary for the module graph as then any change to any file will invalidate the module graph (which all files depend on), rather than just changes to the module header.
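
For context, the difference between the two summary rules boils down to zeroing out the volatile fields. A minimal sketch of the idea (assuming the pre-9.4 ModSummary fields ms_hs_date, ms_obj_date and ms_iface_date; newer GHCs track a content hash instead, and this is not the actual ghcide code):

```haskell
import GHC (ModSummary (..))
import Data.Time.Clock.POSIX (posixSecondsToUTCTime)

-- Zero the timestamp fields so that the summary (and anything keyed on it,
-- such as the module graph) only changes when the module header changes.
stripTimestamps :: ModSummary -> ModSummary
stripTimestamps ms = ms
  { ms_hs_date    = posixSecondsToUTCTime 0
  , ms_obj_date   = Nothing
  , ms_iface_date = Nothing
  }
```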

We shouldn't be using the GHC driver anyway as we want to perform loading and recompilation checking using our own logic. The rework of the eval plugin fixes this.

I will land them both after the release.

@pepeiborra (Collaborator):

@wz1000 I benchmarked this change in Bigcorp codebase and the results were not good. 6X longer startup times with full module graph and 5X without full module graph. This is compared to my O(imports) GetDependencies hack, which for reference is simply:

```haskell
getDependencies :: NormalizedFilePath -> Action TransitiveDependencies
getDependencies f = do
  imports <- use_ GetLocatedImports f
  let direct = [artifactFilePath | (_, Just ArtifactsLocation{..}) <- imports]
  transitive <- foldMap transitiveModuleDeps <$> uses_ GetDependencies direct
  let !final = HashSet.toList $ HashSet.fromList $ direct <> transitive
  return $ TransitiveDependencies final
```

Full results:

BEFORE

| name | success | samples | startup | setup | userTime | delayedTime | totalTime |
| --- | --- | --- | --- | --- | --- | --- | --- |
| edit | True | 10 | 2m10s | 0.00s | 13.82s | 0.00s | 13.83s |
| edit imports | True | 10 | 1m49s | 0.00s | 0.27s | 0.00s | 0.27s |

```
/data/users/pepeiborra/fbsource/fbcode/buck-out/opt/gen/sigma/ide/sigma-ide --lsp --test --cwd /home/pepeiborra/si_sigma +RTS -A32M -I0 -s/tmp/sigma-ide-benchmark-large.gcStats
   355,733,992,672 bytes allocated in the heap
    52,305,367,504 bytes copied during GC
    13,424,971,888 bytes maximum residency (10 sample(s))
       552,684,264 bytes maximum slop
             12803 MB total memory in use (0 MB lost due to fragmentation)

                                   Tot time (elapsed)  Avg pause  Max pause
  Gen  0       413 colls,   412 par   563.854s  14.237s     0.0345s    0.5817s
  Gen  1        10 colls,     8 par   290.148s  18.952s     1.8952s   10.9664s

  Parallel GC work balance: 89.61% (serial 0%, perfect 100%)

  TASKS: 198 (7 bound, 168 peak workers (191 total), using -N40)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time  378.871s  ( 89.268s elapsed)
  GC      time  854.002s  ( 33.189s elapsed)
  EXIT    time    0.043s  (  0.006s elapsed)
  Total   time 1232.917s  (122.464s elapsed)

  Alloc rate    938,932,822 bytes per MUT second

  Productivity  30.7% of total user, 72.9% of total elapsed
```

fullModuleGraph=true

| name | success | samples | startup | setup | userTime | delayedTime | totalTime |
| --- | --- | --- | --- | --- | --- | --- | --- |
| edit | True | 10 | 12m42s | 0.00s | 13.31s | 0.00s | 13.32s |

```
/data/users/pepeiborra/fbsource/fbcode/buck-out/opt/gen/sigma/ide/sigma-ide --lsp --test --cwd /home/pepeiborra/si_sigma +RTS -A32M -I0 -s/tmp/sigma-ide-benchmark-large.gcStats
 3,707,510,882,688 bytes allocated in the heap
    96,192,538,160 bytes copied during GC
    19,762,467,872 bytes maximum residency (14 sample(s))
       604,503,008 bytes maximum slop
             18846 MB total memory in use (0 MB lost due to fragmentation)

                                   Tot time (elapsed)  Avg pause  Max pause
  Gen  0      4080 colls,  4079 par  1592.894s  39.927s     0.0098s    0.8547s
  Gen  1        14 colls,    12 par   589.262s  45.278s     3.2342s   29.5445s

  Parallel GC work balance: 81.60% (serial 0%, perfect 100%)

  TASKS: 284 (7 bound, 270 peak workers (277 total), using -N40)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time 3939.385s  (722.147s elapsed)
  GC      time 2182.156s  ( 85.205s elapsed)
  EXIT    time    0.089s  (  0.008s elapsed)
  Total   time 6121.632s  (807.361s elapsed)

  Alloc rate    941,139,498 bytes per MUT second

  Productivity  64.4% of total user, 89.4% of total elapsed
```

fullModuleGraph=false

| name | success | samples | startup | setup | userTime | delayedTime | totalTime |
| --- | --- | --- | --- | --- | --- | --- | --- |
| edit | True | 10 | 9m46s | 0.00s | 12.12s | 0.00s | 12.12s |

```
/data/users/pepeiborra/fbsource/fbcode/buck-out/opt/gen/sigma/ide/sigma-ide --lsp --test --cwd /home/pepeiborra/si_sigma +RTS -A32M -I0 -s/tmp/sigma-ide-benchmark-large.gcStats
 3,536,421,762,320 bytes allocated in the heap
    86,187,539,816 bytes copied during GC
    17,900,403,312 bytes maximum residency (15 sample(s))
       547,153,296 bytes maximum slop
             17071 MB total memory in use (0 MB lost due to fragmentation)

                                   Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3636 colls,  3635 par  1222.932s  30.644s     0.0084s    0.4791s
  Gen  1        15 colls,    13 par   549.295s  34.563s     2.3042s   19.7335s

  Parallel GC work balance: 84.83% (serial 0%, perfect 100%)

  TASKS: 268 (7 bound, 255 peak workers (261 total), using -N40)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time 3415.233s  (554.057s elapsed)
  GC      time 1772.227s  ( 65.207s elapsed)
  EXIT    time    0.091s  (  0.016s elapsed)
  Total   time 5187.552s  (619.282s elapsed)

  Alloc rate  1,035,484,716 bytes per MUT second

  Productivity  65.8% of total user, 89.5% of total elapsed
```

@wz1000 (Collaborator, Author):

@pepeiborra do you mean your hack is separate from what lives in HLS master as of now? Is there any regression in fullModuleGraph=false compared to current HLS master? I don't think there should be, but maybe I made a mistake.

I think we could use something like your hack if we can augment hls-graph cycle detection to produce recoverable errors.

@pepeiborra (Collaborator):

> @pepeiborra do you mean your hack is separate from what lives in HLS master as of now?

Actually ignore the hack, I was confused, the BEFORE results are not using the hack.

Reason: we are still on ghcide 1.4, while upgrading to ghcide 1.8 I deleted the hack since GetDependencies no longer exists. Startup times with ghcide 1.4 + hack were around 1m20s, on vanilla 1.8 it's 2m10s.

> Is there any regression in fullModuleGraph=false compared to current HLS master? I don't think there should be, but maybe I made a mistake.

Yes, the regression is 2m10s -> 9m46s for startup. More specifically, by startup I mean loading the first module, which is handpicked to have a very large transitive module graph.

> I think we could use something like your hack if we can augment hls-graph cycle detection to produce recoverable errors.

I don't think this can be done cleanly.

@wz1000 force-pushed the wip/module-graph-mem branch 2 times, most recently from 299795a to 63d9ec6 on December 26, 2022 11:41
@pepeiborra (Collaborator):

With the strictness fix, the regression is less dramatic: 50% longer startup and 32% more total memory in use (17G instead of 12.8G). Not quite ready yet, but less bad.

| name | success | samples | startup | setup | userTime | delayedTime | totalTime |
| --- | --- | --- | --- | --- | --- | --- | --- |
| edit | True | 10 | 3m16s | 0.00s | 12.04s | 0.00s | 12.04s |

```
/data/users/pepeiborra/fbsource/fbcode/buck-out/opt/gen/sigma/ide/sigma-ide --lsp --test --cwd /home/pepeiborra/si_sigma +RTS -A32M -I0 -s/tmp/sigma-ide-benchmark-large.gcStats
   390,443,605,936 bytes allocated in the heap
    74,427,647,744 bytes copied during GC
    17,896,901,816 bytes maximum residency (14 sample(s))
       560,837,448 bytes maximum slop
             17067 MB total memory in use (0 MB lost due to fragmentation)

                                   Tot time (elapsed)  Avg pause  Max pause
  Gen  0       929 colls,   928 par   784.410s  19.680s     0.0212s    0.5344s
  Gen  1        14 colls,    12 par   478.783s  34.307s     2.4505s   21.2495s

  Parallel GC work balance: 86.81% (serial 0%, perfect 100%)

  TASKS: 284 (7 bound, 264 peak workers (277 total), using -N40)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.099s elapsed)
  MUT     time  718.973s  (177.425s elapsed)
  GC      time 1263.193s  ( 53.987s elapsed)
  EXIT    time    0.084s  (  0.009s elapsed)
  Total   time 1982.251s  (231.521s elapsed)

  Alloc rate    543,057,234 bytes per MUT second

  Productivity  36.3% of total user, 76.6% of total elapsed
```

@wz1000 force-pushed the wip/module-graph-mem branch from 63d9ec6 to a8c13f1 on July 5, 2023 12:07
@wz1000 (Collaborator, Author) commented Jul 12, 2023

I had another look at this and fixed a couple of leaks and it seems like this patch is a big win for performance in most common cases.

There are two options in RulesConfig which can have an impact on performance for large projects:

  1. fullModuleGraph (called checkForImportCycles in baseline)
  2. enableTemplateHaskell

Both of these options are enabled by default.

In baseline HLS, checkForImportCycles and enableTemplateHaskell were largely independent: checkForImportCycles controlled whether, on each call to ghcSessionDepsDefinition, we recomputed the entire module graph below the file (a quadratic, if not exponential, process because of all the duplicated work).

With the patch, checkForImportCycles results in calls to GetModuleGraph instead, avoiding duplicated work.

enableTemplateHaskell always entails calls to GetModuleGraph, because we need to decide whether anything above a file in the project-wide graph requires this file to be compiled.
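
As a minimal sketch of that decision (invented helper names, not ghcide's actual NeedsCompilation rule):

```haskell
-- Given the transitive reverse dependencies from the shared module graph, a
-- module needs object code exactly when something above it uses TH.
needsObjectCode
  :: (mod -> [mod])  -- transitive reverse dependencies, from the module graph
  -> (mod -> Bool)   -- does this module enable TemplateHaskell?
  -> mod
  -> Bool
needsObjectCode revDepsOf usesTH m = any usesTH (revDepsOf m)
```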

With the patch, either of these options, when enabled, results in calls to GetModuleGraph.

I will be testing two configurations, NOGRAPH={fullModuleGraph=False, enableTemplateHaskell=False} and GRAPH={fullModuleGraph=True, enableTemplateHaskell=True}, as these are the two canonical configurations that decide if GetModuleGraph is called.
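
For reference, a trimmed-down model of how these options gate the GetModuleGraph dependency (a sketch only; the real RulesConfig in ghcide may carry more fields than these two):

```haskell
-- Hypothetical, simplified version of the options discussed above.
data RulesConfig = RulesConfig
  { fullModuleGraph       :: Bool  -- check the whole graph for import cycles
  , enableTemplateHaskell :: Bool  -- track reverse dependencies that need object code for TH
  }

-- Either option, when enabled, pulls in GetModuleGraph.
needsModuleGraph :: RulesConfig -> Bool
needsModuleGraph cfg = fullModuleGraph cfg || enableTemplateHaskell cfg

-- The two configurations benchmarked below.
graphConfig, noGraphConfig :: RulesConfig
graphConfig   = RulesConfig { fullModuleGraph = True,  enableTemplateHaskell = True }
noGraphConfig = RulesConfig { fullModuleGraph = False, enableTemplateHaskell = False }
```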

I tested this patch with a project generated via the following script:

```bash
#!/usr/bin/env bash
# Generate $DEPTH layers of modules with $WIDTH modules on each layer
# Every module on layer N imports all the modules on layer N-1
# MultiLayerModules.hs imports all the modules from the last layer
DEPTH=15
WIDTH=40
for i in $(seq -w 1 $WIDTH); do
  echo "module DummyLevel0M$i where" > DummyLevel0M$i.hs;
done
for l in $(seq 1 $DEPTH); do
  for i in $(seq -w 1 $WIDTH); do
    echo "module DummyLevel${l}M$i where" > DummyLevel${l}M$i.hs;
    for j in $(seq -w 1 $WIDTH); do
      echo "import DummyLevel$((l-1))M$j" >> DummyLevel${l}M$i.hs;
    done
  done
done
echo "module MultiLayerModules where" > MultiLayerModules.hs
for j in $(seq -w 1 $WIDTH); do
  echo "import DummyLevel${DEPTH}M$j" >> MultiLayerModules.hs;
done
```

I focused testing on the two extremes, DummyLevel0M1 (which imports nothing) and MultiLayerModules which imports everything.

Here are the results of running haskell-language-server from the command line on these files:

| commit | config | file | time | bytes allocated | peak memory |
| --- | --- | --- | --- | --- | --- |
| master | GRAPH | MultiLayerModules | 17.288s | 47,771,227,040 bytes | 1035 MiB |
| wip/module-graph-mem | GRAPH | MultiLayerModules | 4.222s | 14,076,683,792 bytes | 712 MiB |
| master | NOGRAPH | MultiLayerModules | 6.318s | 15,325,017,376 bytes | 952 MiB |
| wip/module-graph-mem | NOGRAPH | MultiLayerModules | 6.126s | 15,328,584,272 bytes | 960 MiB |
| master | GRAPH | DummyLevel0M1 | 0.396s | 572,811,704 bytes | 161 MiB |
| wip/module-graph-mem | GRAPH | DummyLevel0M1 | 1.205s | 4,113,919,264 bytes | 302 MiB |
| master | NOGRAPH | DummyLevel0M1 | 0.397s | 572,832,632 bytes | 161 MiB |
| wip/module-graph-mem | NOGRAPH | DummyLevel0M1 | 0.397s | 572,799,488 bytes | 161 MiB |

So this branch is much faster in the (default) GRAPH config for MultiLayerModules, slower for DummyLevel0M1, and equivalent in the NOGRAPH config for both files.

However, this is not the full story for the GRAPH config on DummyLevel0M1. Calling haskell-language-server on a file from the command line only runs the TypeCheck, GetHieAst and GenerateCore rules. None of these rules require us to know if we need to compile a file. It is only GetModIface that forces us to compute the full module graph in HLS master.
A more realistic test will then force at least one GetModIface call.

Testing DummyLevel1M1, one level up, so that GetModuleGraph is also called due to needing GetModIface on DummyLevel0M*, we get the following results:

| commit | config | file | time | bytes allocated | peak memory |
| --- | --- | --- | --- | --- | --- |
| master | GRAPH | DummyLevel1M1 | 1.456s | 4,531,463,816 bytes | 416 MiB |
| wip/module-graph-mem | GRAPH | DummyLevel1M1 | 1.461s | 4,502,678,944 bytes | 383 MiB |

Changing the haskell-language-server command-line check command to instead call GetModIface on its arguments also equalises the performance of both branches on DummyLevel0M1.

And we see that this branch is equivalent to or outperforms master for most realistic scenarios and configurations.

@wz1000 force-pushed the wip/module-graph-mem branch 3 times, most recently from 7f06097 to 0220797 on July 12, 2023 09:03
@pepeiborra (Collaborator):

Thanks for pushing ahead with this branch and sharing the revised benchmarks!

My only concern is that the benchmarks specifically test startup scenarios, and I would like to see bench numbers for the edit-module-headers scenario that is expected to regress:

> this could be solved easily by re-introducing a version of GetDependencyInformation with early cutoff that essentially returns the result of GetModuleGraph but includes the hash of only the ModSummarys in the downward dependency closure of the file.

Did you look into this?

@wz1000 force-pushed the wip/module-graph-mem branch from 318d89a to 75469a4 on July 24, 2023 11:29
@wz1000 (Collaborator, Author) commented Jul 24, 2023

@pepeiborra I've added a benchmark to edit the header and also added my script to the benchmarks.

After benchmarking, I made a few more rules use early cutoff; this dramatically improves edit performance for all scenarios other than the DummyLevel0M01 case.

I tested three commits: baseline (master), module-graph (this patch without the early-cutoff improvements) and module-graph-cutoff (this patch). Here are the results of benchmarking:

DummyLevel0M1:

| version | configuration | name | success | samples | startup | setup | userT | delayedT | 1stBuildT | avgPerRespT | totalT | rulesBuilt | rulesChanged | rulesVisited | rulesTotal | ruleEdges | ghcRebuilds | maxResidency | allocatedBytes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| baseline | All | edit-header | True | 50 | 0.32 | 0.00 | 0.50 | 66.47 | 66.46 | 0.01 | 66.97 | 10 | 9 | 13 | 11504 | 572850 | 43 | 912MB | 51438MB |
| module-graph | All | edit-header | True | 50 | 0.13 | 0.00 | 0.45 | 3.05 | 3.05 | 0.01 | 3.51 | 10 | 9 | 14 | 10906 | 164129 | 1 | 389MB | 17112MB |
| module-graph-cutoff | All | edit-header | True | 50 | 0.12 | 0.00 | 0.44 | 3.23 | 3.22 | 0.01 | 3.67 | 10 | 9 | 15 | 11467 | 141453 | 82 | 423MB | 17867MB |
| baseline | All | edit | True | 50 | 0.25 | 0.00 | 0.49 | 67.01 | 67.00 | 0.01 | 67.51 | 10 | 9 | 13 | 11546 | 572808 | 1 | 849MB | 50631MB |
| module-graph | All | edit | True | 50 | 0.12 | 0.00 | 0.46 | 3.15 | 3.14 | 0.01 | 3.61 | 10 | 9 | 14 | 10826 | 164209 | 81 | 399MB | 17982MB |
| module-graph-cutoff | All | edit | True | 50 | 0.09 | 0.00 | 0.40 | 3.02 | 3.01 | 0.01 | 3.42 | 10 | 9 | 15 | 11467 | 141453 | 82 | 423MB | 17754MB |

[chart: DummyLevel0M01-edit]

DummyLevel1M01NoTH:

| version | configuration | name | success | samples | startup | setup | userT | delayedT | 1stBuildT | avgPerRespT | totalT | rulesBuilt | rulesChanged | rulesVisited | rulesTotal | ruleEdges | ghcRebuilds | maxResidency | allocatedBytes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| baseline | All | edit-header | True | 50 | 0.19 | 0.00 | 34.70 | 66.20 | 66.19 | 0.71 | 100.91 | 651 | 9 | 775 | 11467 | 572886 | 81 | 781MB | 71020MB |
| module-graph | All | edit-header | True | 50 | 0.11 | 0.00 | 34.50 | 2.95 | 2.98 | 0.70 | 37.45 | 651 | 9 | 855 | 10906 | 164129 | 1 | 389MB | 37691MB |
| module-graph-cutoff | All | edit-header | True | 50 | 0.14 | 0.00 | 2.26 | 3.06 | 3.05 | 0.05 | 5.33 | 10 | 9 | 856 | 11469 | 141450 | 81 | 428MB | 27791MB |
| baseline | All | edit | True | 50 | 0.20 | 0.00 | 33.97 | 65.32 | 65.31 | 0.69 | 99.30 | 651 | 9 | 775 | 11546 | 572808 | 1 | 852MB | 70324MB |
| module-graph | All | edit | True | 50 | 0.12 | 0.00 | 36.88 | 3.26 | 3.25 | 0.75 | 40.14 | 651 | 9 | 855 | 10826 | 164208 | 82 | 389MB | 38301MB |
| module-graph-cutoff | All | edit | True | 50 | 0.28 | 0.00 | 2.22 | 2.95 | 2.94 | 0.05 | 5.18 | 10 | 9 | 856 | 11548 | 141372 | 1 | 429MB | 27183MB |

[chart: DummyLevel1M01NoTH-edit]

MultiLayerModules:

| version | configuration | name | success | samples | startup | setup | userT | delayedT | 1stBuildT | avgPerRespT | totalT | rulesBuilt | rulesChanged | rulesVisited | rulesTotal | ruleEdges | ghcRebuilds | maxResidency | allocatedBytes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| baseline | All | edit-header | True | 50 | 0.25 | 0.00 | 55.89 | 69.19 | 69.18 | 0.38 | 125.08 | 668 | 22 | 4393 | 11550 | 572758 | 180 | 865MB | 205450MB |
| module-graph | All | edit-header | True | 50 | 0.16 | 0.00 | 59.90 | 2.83 | 2.82 | 0.41 | 62.73 | 665 | 20 | 3867 | 10910 | 164158 | 104 | 628MB | 103989MB |
| module-graph-cutoff | All | edit-header | True | 50 | 0.21 | 0.00 | 16.37 | 3.16 | 3.15 | 0.11 | 19.53 | 19 | 16 | 3861 | 11552 | 141322 | 182 | 578MB | 98625MB |
| baseline | All | edit | True | 50 | 0.26 | 0.00 | 118.19 | 66.16 | 66.15 | 0.80 | 184.35 | 1871 | 628 | 4394 | 11550 | 572238 | 101 | 1221MB | 685464MB |
| module-graph | All | edit | True | 50 | 0.18 | 0.00 | 99.24 | 3.21 | 3.21 | 0.68 | 102.46 | 1871 | 628 | 3871 | 10910 | 163559 | 180 | 1081MB | 517165MB |
| module-graph-cutoff | All | edit | True | 50 | 0.27 | 0.00 | 64.15 | 2.98 | 3.03 | 0.44 | 67.14 | 1221 | 621 | 3861 | 11554 | 140802 | 134 | 1097MB | 530448MB |

[chart: MultiLayerModules-edit]

```haskell
#endif
        (catMaybes mss)
#endif
    pure (fingerprintToBS $ Util.fingerprintFingerprints $ map (maybe fingerprint0 msrFingerprint) msrs, processDependencyInformation rawDepInfo bm mg)
```
Review comment (Collaborator):

Why is the fingerprint based on the module summaries? I can picture changes to module summaries that would not alter the module graph.

Reply (Collaborator, Author):

Yes, this may be too coarse. I can try to implement a finer-grained hash, though I'm not sure how much it would improve things.

Reply (Collaborator, Author):

I don't think we want to do this because now the dependency information contains a GHC.ModuleGraph which is essentially a [ModSummary].

@pepeiborra (Collaborator):

We may still want a per-module early cutoff, right? The new benchmark scenario edit-header doesn't alter the module graph, so it doesn't tell us what happens when a portion of the module graph is altered. Previously it would only dirty the modules in that portion, whereas now (even with the ModuleGraph early cutoff) it will dirty all the modules.

Concretely, I'm suggesting to keep the GetDependencyInformation rule, reimplemented as a lookup on GetModuleGraph with early cutoff. I thought this was your original proposal. I might be wrong and this might be unnecessary; only benchmarks can tell.

@wz1000 (Collaborator, Author):

> We may still want a per-module early cutoff, right? The new benchmark scenario edit-header doesn't alter the module graph, so it doesn't tell us what happens when a portion of the module graph is altered. Previously it would only dirty the modules in that portion, whereas now (even with the ModuleGraph early cutoff) it will dirty all the modules.
>
> Concretely, I'm suggesting to keep the GetDependencyInformation rule, reimplemented as a lookup on GetModuleGraph with early cutoff. I thought this was your original proposal. I might be wrong and this might be unnecessary; only benchmarks can tell.

GetDependencyInformation was only used by ReportImportCycles; I've now made ReportImportCycles use early cutoff (with a perfect hash, since the result is just unit). So ReportImportCycles can never dirty any modules. However, GetModuleGraph can still dirty ReportImportCycles. I'm not sure adding an indirection through GetDependencyInformation in between would really help, since checking whether the module graph contains a cycle involving the given file should not be too slow, certainly not slower than recomputing the hash for a restricted subset of the module graph (as we would need to do to implement GetDependencyInformation as you suggest).
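
For illustration, the general shape of an early-cutoff rule whose value is just () (a generic sketch, not the actual ghcide definition): it can always report the same cutoff hash, so its dependents are never dirtied even when the rule itself re-runs.

```haskell
import qualified Data.ByteString.Char8 as BS8

-- The rule recomputes (and can rethrow cycle diagnostics) whenever the module
-- graph changes, but it always reports the same hash, so nothing downstream
-- is invalidated by it.
unitRuleWithCutoff :: Monad m => m () -> m (Maybe BS8.ByteString, ())
unitRuleWithCutoff check = do
  check                          -- e.g. look for a cycle involving this file
  pure (Just (BS8.pack ""), ())  -- constant "perfect" hash: ideal early cutoff
```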

Do you still think we need GetDependencyInformation?

@wz1000 force-pushed the wip/module-graph-mem branch 5 times, most recently from 17a9330 to 366e23e on July 26, 2023 13:25
```haskell
(do queueForEvaluation st nfp
    setSomethingModified VFSUnmodified st [toKey IsEvaluating nfp] "Eval")
(do unqueueForEvaluation st nfp
    setSomethingModified VFSUnmodified st [toKey IsEvaluating nfp] "Eval")
```
Review comment (Collaborator, Author):

@pepeiborra do you think we need this setSomethingModified call after calling unqueueForEvaluation?

early cutoff for eval plugin
@wz1000 force-pushed the wip/module-graph-mem branch from 366e23e to 9aea2f6 on August 2, 2023 08:56
@wz1000 added and then removed the merge me label on Aug 2, 2023
mergify bot merged commit 9effc56 into master on Aug 4, 2023
@fendor mentioned this pull request on Aug 8, 2023