Skip to content

Conversation

@Paxxi
Copy link
Contributor

@PaxxiPaxxi commented Jun 24, 2017

Py_atomic* are currently not implemented as atomic operations
when building with MSVC. This patch attempts to implement parts
of the functionality required.

https://bugs.python.org/issue30747

@mention-bot
Copy link

@Paxxi, thanks for your PR! By analyzing the history of the files in this pull request, we identified @jyasskin, @benjaminp and @akheron to be potential reviewers.

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

Thanks again to your contribution and we look forward to looking at it!

@pitrou
Copy link
Member

It seems you broke the build on Unix:
https://travis-ci.org/python/cpython/jobs/246629729#L1145

Parser/myreadline.c:311:33: error: implicit declaration of function '_Py_atomic_load_relaxed' is invalid in C99 [-Werror,-Wimplicit-function-declaration] if (_PyOS_ReadlineTState == PyThreadState_GET()){^ ./Include/pystate.h:252:31: note: expanded from macro 'PyThreadState_GET' ((PyThreadState*)_Py_atomic_load_relaxed(&_PyThreadState_Current)) ^ 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps remove this comment?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of updating it to reflect the current state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you forgot to update it :-)

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you shouldn't have moved the second #endif below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a runtime check whether HLE is available or not, or is it a compile-time check? If compile-time, it means we may mistakingly produce builds that won't run on non-HLE processors. @zooba

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the documentation these will use HLEAcquire if the cpu supports it or use a regular _InterlockedExchange64 docs here https://msdn.microsoft.com/en-us/library/1s26w950.aspx

specifically this bit

On Intel platforms that support Hardware Lock Elision (HLE) instructions, the intrinsics with _HLEAcquire and _HLERelease suffixes include a hint to the processor that can accelerate performance by eliminating a lock write step in hardware. If these intrinsics are called on platforms that do not support HLE, the hint is ignored.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that. My question is whether the CPU detection happens at compile time or at run time. I can't find any information online...

Copy link
ContributorAuthor

@PaxxiPaxxiJun 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no detection happening. Intel seem to have found a way to make it backwards compatible so older cpus will ignore the extra hint.

I tested this quickly and here's the generated assembly for the HLE and non HLE versions

; 12 : auto a = _InterlockedExchange_HLEAcquire(&test, 1); mov eax, 1 lea ecx, DWORD PTR _test$[ebp] xacquire xchg DWORD PTR [ecx], eax mov DWORD PTR _a$[ebp], eax ; 13 : auto b = _InterlockedExchange(&test, 2); mov eax, 2 lea ecx, DWORD PTR _test$[ebp] xchg DWORD PTR [ecx], eax mov DWORD PTR _b$[ebp], eax 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great, thank you!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need to copy what the GCC version does for these two functions?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure yet, I'm thinking that it's not required but I'm looking into it further to be sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for the fence functions.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working on it. Will be updating this PR with the load implementation and the bits for MSVC/ARM as well. Had to spend this week debugging another issue so I'm planning on having everything done around Monday/Tuesday next week.

@Paxxi
Copy link
ContributorAuthor

Paxxi commented Jul 5, 2017

This should be ready now. Test suite passes for x86/x64, completely untested for ARM as I don't have any suitable environment.

I don't really know how to validate that they're correctly atomic. The operations aren't really suitable to implement a ref count or a mutex as a test case so I'm open to ideas of any way to test this.

@pitrou
Copy link
Member

I don't really know how to validate that they're correctly atomic.

I don't think we need to. The potential issues here are extreme edge case race conditions that may happen once a month on heavily-loaded production systems. There's little we can check for in the test suite, IMHO.

Longer term, one possibility would be to build Python with thread sanitizer. That's not compatible with MSVC AFAIK, though.

@pitrou
Copy link
Member

@Paxxi, partially related, but have you seen PR #2417 and do you have any opinion about it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On gcc/x86, _Py_atomic_thread_fence issues a mfence assembler instruction. Why doesn't it do the same here? Since it's the same CPU architecture, it should probably issue the same instruction...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And apparently MSVC has a MemoryBarrier macro.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These macros aren't used anywhere in the code base except in pyatomic.h for the gcc implementation so I skipped them.

Can implement them using the MemoryBarrier macro if required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Then it's simpler to remove them, I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of pseudo-runtime checks, you may want to use a true compile-time check using e.g. SIZEOF_VOID_P.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps that can help you get rid of some dummy definitions, by the way (for example the 64-bit load/store macros on 32-bit builds).

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't do it at compile time since the macro is used for both _Py_atomic_address and _Py_atomic_int which will be different sizes on 64-bit Windows. int will still be 4 bytes so the runtime check is required to handle it correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think the load and store macros are ok as defined, but that's because MSDN says "Most of the interlocked functions provide full memory barriers on all Windows platforms". Perhaps you can add a comment to that effect?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

@pitrou
Copy link
Member

@zooba, do you think you can get some expert at Microsoft to take a look at this?
Otherwise, while I can't vouch for the correctness, this can't be worse than the statu quo, as long as it compiles fine :-)

@Paxxi
Copy link
ContributorAuthor

Added comment about interlocked functions and removed the fence macros as they're not used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure about this? According to MSDN, those barrier intrinsics are only defined on x86 and x86-64.

Also, the macro doesn't look like it would expand correctly...

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No that's a mistake, no idea how that snuck in there

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@pitrou
Copy link
Member

@zooba, do you want to review this pre-merge or should it go ahead?

@pitrou
Copy link
Member

I think this is ready to merge. Would you like to add a NEWS entry using the "blurb" utility?

@Paxxi
Copy link
ContributorAuthor

Paxxi commented Aug 7, 2017

I'm not much of a writer so I'd rather skip writing anything about it. Feel free to do it on my behalf if it's considered newsworthy enough.

@pitrou
Copy link
Member

Well, I'm not sure how to edit your branch, perhaps you can try applying the following patch to it:

$ git diff --cached diff --git a/Misc/NEWS.d/next/Core and Builtins/2017-08-08-12-00-29.bpo-30747.g2kZRT.rst b/Misc/NEWS.d/next/Core and Builtins/2017-08-08-12-00-29.bpo-30747.g2kZRT.rst new file mode 100644 index 0000000000..04a726a7e6 --- /dev/null +++ b/Misc/NEWS.d/next/Core and Builtins/2017-08-08-12-00-29.bpo-30747.g2kZRT.rst @@ -0,0 +1,2 @@ +Add a non-dummy implementation of _Py_atomic_store and _Py_atomic_load on +MSVC. 

_Py_atomic_* are currently not implemented as atomic operations when building with MSVC. This patch attempts to implement parts of the functionality required.
@Paxxi
Copy link
ContributorAuthor

Paxxi commented Aug 8, 2017

Thanks! I added it to the commit now.

Default settings on Github afaik is that repo owners can push directly to a contributors branch for a PR but I didn't know it needed to be part of the commit.

Should be all set now.

@pitrou
Copy link
Member

Ok, thank you!

@pitroupitrou merged commit e664d7f into python:masterAug 12, 2017
segevfiner added a commit to segevfiner/cpython that referenced this pull request Aug 18, 2017
Apparently MSVC is too stupid to understand that the alternate branch is not taken and emits a warning for it. Warnings added in python#2383
pitrou pushed a commit that referenced this pull request Aug 20, 2017
* bpo-9566: Silence warnings from pyatomic.h macros Apparently MSVC is too stupid to understand that the alternate branch is not taken and emits a warning for it. Warnings added in #2383 * bpo-9566: A better fix for the pyatomic.h warning * bpo-9566: Remove a slash
GadgetSteve pushed a commit to GadgetSteve/cpython that referenced this pull request Sep 10, 2017
…#3140) * bpo-9566: Silence warnings from pyatomic.h macros Apparently MSVC is too stupid to understand that the alternate branch is not taken and emits a warning for it. Warnings added in python#2383 * bpo-9566: A better fix for the pyatomic.h warning * bpo-9566: Remove a slash
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

@Paxxi@mention-bot@the-knights-who-say-ni@pitrou@Mariatta