As an ARM Mac user, I wouldn’t trade all this new battery life for an x86 processor.
Second this. Not to mention INSTANT resume from hibernation! It’s fucking crazy. I can use this thing ALL DAY doing WebGL CAD work and Orca Slicer and barely get the battery down to 50%.
With a modern system, I honestly don’t think there’s a noticeable difference between suspend to RAM and suspend to disk. They’ve gotten the boot times down so much that it’s lightning-fast. My work laptop’s default is suspend to disk, and I don’t notice a difference except when it prompts for the BitLocker password.
S0 standby is borderline unusable on many PCs. On Apple silicon Macs it’s damn near flawless.
My current laptop is probably the last machine to support S3 standby and I do not look forward to replacing it and being forced back into a laptop that overheats and crashes in my backpack in less than 15 minutes. On my basic T14 it works OK for the most part, but my full fat ThinkPad P1 with an i9 never stays in S0 standby for longer than a few minutes, and sometimes uses more power than when it was fully on. Maybe Meteor Lake with its LP E cores will fix this but I doubt it.
tbh it has been nearly flawless on win11 for me with an amd cpu
(just make sure to disable automatic windows/defender updates unless you want to get woken up by jet turbine sounds in the middle of the night)
There’s nothing stopping x86-64 processors from being power efficient. This article is pretty technical but gives a really good explanation of why that’s the case: https://chipsandcheese.com/2024/03/27/why-x86-doesnt-need-to-die/
It’s just that traditionally Intel and AMD earn most of their money from the server and enterprise sectors where high performance is more important than super low power usage. And even with that, AMD’s Z1 Extreme also gets within striking distance of the M3 at a similar power draw. It also helps that Apple is generally one node ahead.
If there’s ‘nothing stopping’ it then why has nobody done it? Apple moved from x86 to ARM. Mobile is all ARM. All the big cloud providers are doing their own ARM chips. Intel killed off much of the architectural competition with Itanic in the early 2000s. Why stop?
Their primary money makers are what’s stopping them, I reckon. Apple moved to ARM because it already had a ton of experience building its own in-house processors for its mobile devices. ARM also licenses stock chip designs, which makes it easier for other companies to come up with their own custom chips, whereas there really isn’t any equivalent for x86-64. There were some disagreements between Intel and AMD over patents on the x86 instruction set too.
This article fails to mention the single biggest differentiator between x86 and ARM: their memory models. Considering the sheer amount of everyday software that is going multithreaded, this is a huge issue, and the reason why ARM drastically outperforms x86 running software like modern web browsers.
Do you mind elaborating on what it is about their memory models that makes a difference?
Here is a great article on the topic. Basically, x86 spends a comparatively enormous amount of energy ensuring that its strong memory guarantees are not violated, even in cases where such violations would not affect program behavior. As it turns out, the majority of modern multithreaded programs only occasionally rely on these guarantees, and including special (expensive) instructions to provide these guarantees when necessary is still beneficial for performance/efficiency in the long run.
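To make the "guarantees only when necessary" part a bit more concrete, here is a rough sketch in Rust. It is my own illustration, not something from the linked article, and the function name `publish_and_read` is made up. One thread writes a payload and then sets a flag; the other waits on the flag and reads the payload. Only the flag uses release/acquire ordering. As far as I understand it, on a weakly ordered architecture like AArch64 those two accesses are the only ones that need the special ordering instructions, while on x86 ordinary loads and stores already carry that ordering whether the program needs it or not.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: publishes `value` from one thread and reads it from another.
pub fn publish_and_read(value: u64) -> u64 {
    let data = Arc::new(AtomicU64::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let producer = {
        let (data, ready) = (Arc::clone(&data), Arc::clone(&ready));
        thread::spawn(move || {
            // Relaxed: the payload write itself asks for no ordering.
            data.store(value, Ordering::Relaxed);
            // Release: writes above become visible to any thread that
            // observes this store with an Acquire load.
            ready.store(true, Ordering::Release);
        })
    };

    let consumer = {
        let (data, ready) = (Arc::clone(&data), Arc::clone(&ready));
        thread::spawn(move || {
            // Acquire: pairs with the Release store in the producer.
            while !ready.load(Ordering::Acquire) {
                std::hint::spin_loop();
            }
            // Safe to read with Relaxed now; the flag established the ordering.
            data.load(Ordering::Relaxed)
        })
    };

    producer.join().unwrap();
    consumer.join().unwrap()
}
```

Calling `publish_and_read(42)` always returns 42, because the acquire load on the flag guarantees the consumer sees the payload write. If the flag used `Ordering::Relaxed` on both sides, the language would no longer promise that, even though typical x86 hardware would likely still happen to behave.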
For additional context, the special sauce behind Apple’s Rosetta 2 is that the M family of SoCs actually implement an x86 memory model mode that is selectively enabled when executing dynamically translated multithreaded x86 programs.
Thanks for the links, they’re really informative. That said, it doesn’t seem to be entirely certain that the extra work done by the x86 arch would incur a comparatively huge difference in energy consumption. Granted, that isn’t really the point of the article. I would love to hear from someone who’s more well versed in CPU design on the impact of its memory model. The paper is more interesting with regard to performance, but I don’t find it very conclusive since it’s comparing ARM vs TSO on an ARM processor. It does link this paper, which seems more relevant to our discussion, but it’s a shame that it’s paywalled.
On the x86 architecture, RAM is used by the CPU and the GPU has a huge penalty when accessing main RAM. It therefore has onboard graphics memory.
On ARM this is unified so GPU and CPU can both access the same memory, at the same penalty. This means a huge class of embarrassingly parallel problems can be solved quicker on this architecture.
Do x86 CPUs with iGPUs not already use unified memory? I’m not exactly sure what you mean but are you referring to the overhead of having to do data copying over from CPU to GPU memory on discrete graphics cards when performing GPU calculations?
Yes, unified, and extremely slow compared to an ARM architecture’s unified memory, as the GPU sort of acts as if it were discrete.
Do you have any sources for this? Can’t seem to find anything specific describing the behaviour. It’s quite surprising to me since the Xbox and PS5 use unified memory on x86-64, and it would be strange if it were extremely slow for such a use case.
It’s been a while since I’ve coded on the Xbox, but at least on the 360, the memory wasn’t really unified as such. You had 10 MB of EDRAM that formed your render target, and then there were specialised functions to copy the EDRAM output to DRAM. So it was still separated, and while you could create buffers in main memory and access them in the shaders, that came at some penalty.
It’s not that unified memory can’t be created, but it’s not the architecture of a PC, where peripheral cards communicate over the PCI bus, with great penalties to touch RAM.
That’s actually not what I was referring to, although the unified memory architecture is certainly more power efficient for mixed-intensive workloads. The cost of transferring to/from dedicated GPU memory is (unsurprisingly) quite large.
So there is something stopping them. The manufacturers.
As a potential iPad buyer, I would trade a millimeter of slimness for a vastly improved battery.
Idk, the battery of my 12.9" iPad Pro is great.