Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34k
bpo-30102: Improve libssl performance on POWER8 for, e.g sha256#1181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
gut commented Apr 19, 2017 • edited by bedevere-bot
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by bedevere-bot
Uh oh!
There was an error while loading. Please reload this page.
To correctly pick the best algorithm for the current architecture, libssl needs to have OPENSSL_config(NULL) called as described on: https://wiki.openssl.org/index.php/Libcrypto_API This short change lead to a speedup of 50% on POWER8 when using hashlib.sha256 digest functionality as it now uses a SIMD approach that was already existing but not used by cpython.
mention-bot commented Apr 19, 2017
@gut, thanks for your PR! By analyzing the history of the files in this pull request, we identified @tiran, @loewis and @teoliphant to be potential reviewers. |
the-knights-who-say-ni commented Apr 19, 2017
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. Thanks again to your contribution and we look forward to looking at it! |
gut commented Apr 19, 2017
Can anybody support me on this AppVeyor build fail? It looks like on windows build it's not linking because it doesn't find |
pitrou commented Apr 19, 2017
Apparently Perhaps @zware or @pfmoore can chime in on whether our current OpenSSL version on Windows is subject to this issue. Regardless, the aforelinked changeset shows how to replace |
Modules/_hashopenssl.c Outdated
| * | ||
| * Source: https://wiki.openssl.org/index.php/Libcrypto_API | ||
| */ | ||
| OPENSSL_config(NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to see this call in PyInit__hashlib() after ERR_load_crypto_strings().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I tested and I still have the results I expected.
However OPENSSL_config(NULL); is deprecated so I'll send this change as soon as I figure it out which interface should I use instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
gut commented Apr 20, 2017
@pitrou : I found a useful documentation talking about the deprecated interface: However I didn't have the same results when using |
Both solutions lead to a performance improvement on POWER8 when using sha digest
gut commented Apr 20, 2017
BTW, is there a deinitialization hook in order to call |
gut commented Apr 20, 2017
Another question: I thought it would fix windows build but now some other symbols are not being found at link time but I don't see it related to my change. Does anybody have a hint? |
pitrou commented Apr 20, 2017
The requirement to call |
pitrou commented Apr 20, 2017
Those seem to be certificate handling function symbols but the module being built is hashlib, why is hashlib using those functions? |
pitrou commented Apr 20, 2017
As for the Windows issue, perhaps you can try applying the following diff: If it doesn't work, perhaps @zware, @pfmoore or @zooba (our resident Windows experts) can give a hand to debug this. |
gut commented Apr 20, 2017
it worked! Thanks. |
pitrou commented Apr 20, 2017
pitrou commented Apr 21, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
@gut, do you know whether this issue only applies to POWER8, or does it also affect e.g. AES performance on x86? |
gut commented Apr 24, 2017
@pitrou This would be a good question to an OpenSSL expert. Well, I'll try some benchmarking on X86 and come back to you. Hang on... |
gut commented Apr 24, 2017
@pitrou I am more focused on hashing and I didn't notice different instructions or performance related to python or from openssl directly on X86. |
gut commented Apr 25, 2017
pitrou commented Apr 25, 2017
@gut, I am seeing around 760 MB/s here (Core i5-2500K), with and without the patch. |
gut commented Apr 25, 2017
@pitrou Hi. On my POWER8 I also noticed ~750 MB/s with and without patch. Interestingly the python version 3.5.3 found on Ubuntu 17.04 outputs ~860 MB/s. Maybe it's just my build but I didn't expect a performance drop. Thanks for the script |
tiran commented Apr 25, 2017
Did you test your patch with OpenSSL 1.1.0? With 1.1.0, OpenSSL should automatically initialize all subsystems including the engines, https://www.openssl.org/docs/man1.1.0/crypto/OPENSSL_init_crypto.html |
gut commented Apr 25, 2017
Very interesting. I was testing with However besides init and deinit performed automatically as stated on: It also states that some initialization are only performed explicitly. E.g: (emphasis added by me) Which looks like what I'm trying to initialize on this PR. I'd suggest to update these initialization interfaces of OpenSSL 1.1.0 on another PR, what about it? It looks like there are other initialization aspects to be considered. |
tiran commented Apr 26, 2017
I don't think that I don't trust |
gut commented Apr 26, 2017
BTW, I was testing the new openssl v1.1 and I ran a short example without any of this special initialization (engines) and it still used the optimized version just like I wanted! So don't need to init an engine with As for prior OpenSSL version, I'll check how can I avoid loading all engines and still having the optimized code for, e.g. sha256. |
tiran commented Apr 27, 2017
Awesome! Thanks for you hard work! :) I'm sorry if my reviews sound harsh to you. Crypto and security is a mine field. OpenSSL is a pretty nasty and highly explosive mine field. As you have noticed, it is getting better with OpenSSL >= 1.1.0. |
Replace ENGINE_load_builtin_engines to OPENSSL_cpuid_setup. It leads to the performance gain on hashing on POWER8 platform. I don't like the explicit function declaration but I didn't find it on OpenSSL headers.
gut commented Apr 28, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
@tiran Don't worry. I see your reasons for being picky. I analysed OpenSSL closely and I noticed that only the platform detection is important for me. I'll quote the related code below:
One drawback though is that I don't like to declare the function |
gut commented Jul 11, 2017
Hi. I just got my CLA approved (check on http://bugs.python.org/issue30102 , there is an I hope we can continue reviewing this now. I didn't ping before as I wanted to sign the CLA first. Thanks in advance |
Modules/_ssl.c Outdated
| #if defined(WITH_THREAD) | ||
| #defineHAVE_OPENSSL_CRYPTO_LOCK | ||
| /* For some reason, this function is not declared on OpenSSL's headers */ | ||
| voidOPENSSL_cpuid_setup(void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intended to only define this prototype if compiling with threads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it was not. I just noticed it is inside of this definition. Great catch!
It is intended to be defined only when /* OpenSSL < 1.1.0 */.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks
| #endif | ||
| /* For some reason, this function is not declared on OpenSSL's headers */ | ||
| voidOPENSSL_cpuid_setup(void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this function guaranteed to be available on all versions of OpenSSL and LibreSSL that are supported by Python? At the moment it's 0.9.8 to 1.1.x and all versions of LibreSSL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a nice way to verify that? I see that you have a project to compile a set of libressl and openssl versions.
Checking on OpenSSL, I notice that the commit that implemented this function is dated back to 2004 when 0.9.8 version didn't exist.
And on Libressl, it was added on 2008.
Would that be enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, isn't Travis CI already checking these things?
gut commented Aug 7, 2017
just solved the conflict: now the Is there anything else missing? @tiran : please comment my previous reply to your review. Is that enough? |
gut commented Sep 5, 2017
I guess we can close this PR as I just confirmed that the approved #3112 solves the issue I'm interested on. |
To correctly pick the best algorithm for the current architecture,
libssl needs to have OPENSSL_config(NULL) called as described on:
https://wiki.openssl.org/index.php/Libcrypto_API
This short change lead to a speedup of 50% on POWER8 when using
hashlib.sha256 digest functionality as it now uses a SIMD approach that
was already existing but not used by cpython.
https://bugs.python.org/issue30102