We tested Intel's flagship Lunar Lake Ultra 9 and it's slower than an Ultra 7 in multi-threaded work in the only laptop where it's currently for sale [Updated]

Asus Zenbook S14
(Image credit: Tom's Hardware)

Edit 10/4/2024 3:15am PT: Intel provided further clarification of performance targets, which we added below. 

We tested the new Asus Zenbook S14 with the Lunar Lake Core Ultra 9 288V processor, and surprisingly, we found that the Core Ultra 9 is slower than the Ultra 7 model in multi-threaded workloads, a condition that Intel says Asus will soon address with a BIOS fix. Our findings are interesting because Intel originally shipped the Asus Zenbook S14 to reviewers for its Lunar Lake launch, but the US press received the model with the lower-tier Core Ultra 7 258V processor, not the flagship Core Ultra 9 288V processor.

We also noticed a new type of behavior with the Ultra 200V series — the Skymont E-cores run at higher frequencies than the Lion Cove P-cores during heavily threaded work, indicating an exceptional amount of performance and efficiency from the new E-cores.

Intel gave us the chance to test the Core Ultra 9-equipped Zenbook S14, which was originally only sent to a few reviewers in Europe because the design is not currently available for purchase in the US. In fact, according to our checks, this is the sole Core Ultra 9-powered notebook available on the market. However, our findings indicate that Intel and Asus still have work to do on the firmware side to deliver the full performance of the Core Ultra 9. 

After a pending BIOS update, Intel says we can expect the Ultra 9 and Ultra 7 chips to offer similar performance in multi-threaded workloads if used with the same power settings in the Asus laptop, with the primary benefit of the Ultra 9 being performance advantages in gaming and single-threaded workloads.

Mobile processor core specifications

|  | Intel Lunar Lake | Intel Lunar Lake | Intel Meteor Lake |
| --- | --- | --- | --- |
| Processor | Core Ultra 9 288V | Core Ultra 7 258V | Core Ultra 7 155H |
| Laptop | Asus Zenbook S14 | Asus Zenbook S14 | Asus Zenbook 14 OLED |
| CPU cores | 4 P-core / 4 E-core | 4 P-core / 4 E-core | 6 P-core / 8 E-core |
| Cores / Threads | 8 / 8 | 8 / 8 | 14 / 20 |
| CPU P-core boost / base | 5.1 / 3.3 GHz | 4.8 / 2.2 GHz | 4.8 / 1.4 GHz |
| CPU E-core boost / base | 3.7 / 3.3 GHz | 3.7 / 2.2 GHz | 3.8 GHz / 900 MHz (LP E-core 2.5 GHz / 700 MHz) |
| Processor Base Power / Max | 30W / 37W | 17W / 37W | 28W |
| GPU model | Arc Graphics 140V | Arc Graphics 140V | Arc Graphics |
| GPU cores (Xe / CU) | 8 | 8 | 8 |
| GPU shaders | 1024 | 1024 | 1024 |
| GPU boost clocks | 2050 MHz | 1950 MHz | 2250 MHz |
| Memory | 32GB LPDDR5X-8533 | 32GB LPDDR5X-8533 | 32GB DDR5-7467 |
| NPU TOPS (INT8) | 48 | 47 | 11 |

The table above shows how similar the two Lunar Lake processors are on paper. With the same number of cores and the same cache capacities, the only real differences between the two chips boil down to clock speeds and power limits. The Core Ultra 9 has a 300 MHz higher P-core boost clock and a 1.1 GHz higher base clock (P- and E-cores), with an ever-so-slight 100 MHz bump in GPU frequency. Both chips have a peak turbo power of 37W, but the Ultra 9's base power (PL1) weighs in at 30W with a 17W minimum, whereas the Core Ultra 7 has a 17W base with an 8W minimum.

The Asus Zenbook S14 can be run at different power levels, as listed in the table below, by cycling through different fan settings in the MyAsus application. These settings not only adjust the fan profile but also adjust the configurable TDP. The same power settings apply to both the Ultra 9 and Ultra 7, thus removing TDP limits as a differentiator between the chips in our testing. That means it's down to the differences in clock speeds to deliver tangible advantages for the Ultra 9, and despite the relatively tame deltas, by no means should the Ultra 9 be slower than the Ultra 7.

| Asus Power Profiles | Whisper | Standard | Performance | Full-Speed |
| --- | --- | --- | --- | --- |
| Asus PL1_Max | 17W | 22W | 28W | 33W |
| Asus PL1_Min | 12W | 17W | 24W | 28W |
| Asus PL2 Power Limit (Max) | 28W | 37W | 37W | 37W |
| Noise (dBA) | 25 | 32 | 44 | 47 |
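To make the relationship between the fan profiles and each chip's rated base power easier to see, here is a minimal Python sketch that encodes the profile table alongside the base power figures from the specifications table. The dictionary layout and the function name are our own illustration and not part of any Asus or Intel tool.

```python
# Power limits per MyAsus fan profile, copied from the table above (watts).
# The layout and names here are illustrative, not an Asus or Intel API.
ASUS_PROFILES = {
    "Whisper":     {"pl1_min": 12, "pl1_max": 17, "pl2": 28},
    "Standard":    {"pl1_min": 17, "pl1_max": 22, "pl2": 37},
    "Performance": {"pl1_min": 24, "pl1_max": 28, "pl2": 37},
    "Full-Speed":  {"pl1_min": 28, "pl1_max": 33, "pl2": 37},
}

# Processor base power (PL1) for each chip, from the specifications table.
CHIP_BASE_POWER_W = {"Core Ultra 9 288V": 30, "Core Ultra 7 258V": 17}


def profiles_at_or_above_base(chip: str) -> list[str]:
    """Return the profiles whose PL1_Max meets or exceeds the chip's base power."""
    base = CHIP_BASE_POWER_W[chip]
    return [name for name, limits in ASUS_PROFILES.items() if limits["pl1_max"] >= base]


for chip in CHIP_BASE_POWER_W:
    print(chip, "->", profiles_at_or_above_base(chip))
```

Because the lookup is keyed by profile rather than by chip, both processors receive identical limits in every mode, which is why clock speeds are the only remaining differentiator in this laptop.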

You can see our Zenbook S14 review with the Core Ultra 7 258V here, and aside from the different chips, our Zenbook sample with the Core Ultra 9 has the same specifications, right down to the cooling subsystem. Naturally, that led us to suspect that cooling could be the culprit behind the unexpectedly low performance, but as we'll show shortly, that isn't the case.

As you can see in the album above, we tested both the Core Ultra 7 and Core Ultra 9 in all four modes to measure performance differences across the full TDP range. Performance is generally within our expectations for single-threaded work: the Core Ultra 9 beats or matches the Core Ultra 7 in most of the single-threaded Cinebench and Geekbench 6 tests, though it does trail slightly in the 17W Whisper mode.

For the Whisper mode, HWiNFO lists the Core Ultra 9's cTDP setting for PL1 power at 13W, which isn't in line with the official specs of 17W. However, the Ultra 7 is dialed in at the correct 17W PL1. This would seem to be a minor fix in the BIOS, but larger differences reside in the threaded workloads at all other power levels, and those other power modes are tuned correctly for Ultra 9.

Performance in the threaded workloads is where the results defy easy explanation. The Ultra 7 took the lead across the full range of Cinebench, Geekbench, and HandBrake benchmarks at all power levels. Those deltas persisted despite multiple retests under varying conditions, so we contacted Intel, which confirmed our findings.

The Ultra 7 is only about 2% to 5.3% faster than the Ultra 9 in Cinebench, 1.4% faster in Geekbench, and 3.5% faster in HandBrake. Those aren't huge differences, but logic dictates that the Ultra 7 should be slower, not faster. Additionally, if thermal limitations were hamstringing the Ultra 9 (i.e., the cooler wasn't robust enough), we would expect it to deliver, at a minimum, the same performance as the Ultra 7 because both chips would hit a similar heat limit. However, that wasn't the problem.

We charted the average effective clock speeds, chip temperature, and power draw during two runs of the Cinebench multi-threaded tests. We plot the P-core clock speeds in blue and the E-core clocks in black, with those values listed on the left-hand chart axis. We also have the power consumption in green and temperature in red, with those values listed on the right-hand chart axis. We recorded these with the processors in 'Performance' mode.

We charted identical runs for both the Ultra 9 and Ultra 7. We can see that the Ultra 9 peaks at around 80°C during the multi-threaded tests, so thermals are not the limiting factor (the chip has a 100°C limit).
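For readers who want to run a similar check on their own sensor logs, the following is a rough Python sketch of how per-run averages and a thermal-limit test could be derived from a CSV export. The column names are hypothetical (a real HWiNFO log uses its own sensor labels), and the sample rows are synthetic placeholders, not measurements from our testing.

```python
import csv
import io
from statistics import mean

# Hypothetical column names; a real sensor export will label these differently.
COLUMNS = ["p_core_mhz", "e_core_mhz", "package_w", "temp_c"]

# The 100C thermal limit cited in the text above.
THERMAL_LIMIT_C = 100


def summarize_log(csv_text: str) -> dict[str, float]:
    """Average each sensor column and record the peak temperature."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {f"avg_{col}": mean(float(row[col]) for row in rows) for col in COLUMNS}
    summary["peak_temp_c"] = max(float(row["temp_c"]) for row in rows)
    return summary


# Synthetic placeholder samples for illustration only.
SAMPLE = """p_core_mhz,e_core_mhz,package_w,temp_c
3000,3400,22,78
3100,3500,23,80
2900,3300,21,79
"""

summary = summarize_log(SAMPLE)
thermally_limited = summary["peak_temp_c"] >= THERMAL_LIMIT_C
print(summary, "thermally limited:", thermally_limited)
```

A peak temperature well under the limit, as in this placeholder data, points away from thermal throttling as the explanation for lower-than-expected clocks.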

For both chips, the E-cores run at higher clock speeds than the P-cores during the multi-threaded Cinebench tests. This is not the case for any of Intel's previous hybrid x86 processors with both P- and E-cores, whether desktop or mobile: the P-cores have always run at higher clock speeds during heavily threaded workloads (we'll demonstrate that with Meteor Lake next).

Intel told us this is a new, intended behavior to extract the best mixture of power efficiency and performance in threaded workloads. That’s obviously now provided by the Skymont E-core architecture (affectionately nicknamed ‘Chad’mont by enthusiasts due to its big generational gains) instead of the Lion Cove P-cores — at least at these power levels. Our laptop reviewer, Andrew Freedman, initially noticed this new behavior, and I intended to create a voltage/frequency curve to highlight the difference in power efficiency. However, since the chip isn’t working correctly, we’ll have to wait for a new BIOS.

Mobile processor core performance metrics

| Workload Type | Core Ultra 9 288V | Core Ultra 7 258V |
| --- | --- | --- |
| Multi-threaded: P-core average | 3 GHz | 3 GHz |
| Multi-threaded: E-core average | 3.4 GHz | 3.5 GHz |
| Single-threaded: P-core peak | 5.16 GHz | 4.6 GHz |
| Single-threaded: E-core peak | 3.75 GHz | 3.55 GHz |
| Multi-threaded: Average Power | 22W | 19W |

Given the chips' specifications, we would expect both the P-cores and E-cores to run faster on the Ultra 9 288V, yet the Ultra 9's P-cores run at an average of 3 GHz during the test, the same as the Ultra 7's. Additionally, the Ultra 7's E-cores run at an effective 3.5 GHz, 100 MHz faster than the Ultra 9's, yet the Ultra 9 pulls more power (22W) on average than the Ultra 7 (19W) during this portion of the test.
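The size of that inversion is easier to appreciate as a quick calculation. This short Python sketch uses only the averages from the table above; the variable and function names are illustrative.

```python
# Multi-threaded Cinebench averages in 'Performance' mode, from the table above.
ULTRA_9 = {"p_core_ghz": 3.0, "e_core_ghz": 3.4, "power_w": 22}
ULTRA_7 = {"p_core_ghz": 3.0, "e_core_ghz": 3.5, "power_w": 19}


def percent_more(a: float, b: float) -> float:
    """How much larger a is than b, expressed as a percentage of b."""
    return (a - b) / b * 100


# Round to whole MHz to avoid floating-point noise in the subtraction.
e_core_gap_mhz = round((ULTRA_7["e_core_ghz"] - ULTRA_9["e_core_ghz"]) * 1000)
extra_power_pct = percent_more(ULTRA_9["power_w"], ULTRA_7["power_w"])

print(f"Ultra 7 E-cores average {e_core_gap_mhz} MHz higher")
print(f"Ultra 9 draws {extra_power_pct:.1f}% more power on average")
```

In other words, on the current BIOS the Ultra 9 draws roughly 16% more power on average while matching the Ultra 7's P-core clocks and trailing its E-core clocks.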

For now, Asus customers will have to wait for a fix. Intel provided us with the following statement: “Your results look right for the current BIOS version. We expect Asus to soon release a new BIOS to address small performance inversions seen in some multi-threaded workloads. You may see additional improvements in single-threaded workloads.”

Intel later provided further clarification, saying, "At the equivalent power profiles ASUS has chosen for both Core Ultra 7 and 9 ZenBook configurations, we expect single-threaded and graphics workloads to realize the most benefits, with multi-threaded workloads largely performing within run-to-run variations with the upcoming BIOS update."

Our interpretation of the statement is that, after a BIOS update, we should expect the Ultra 9 and Ultra 7 to provide equivalent performance in threaded workloads in the Asus laptop, with the primary benefits of the Ultra 9 model being snappier single-threaded and gaming performance. 

Asus says it is working on a new BIOS, but hasn't given a specific release date.   

Mobile processor core performance metrics

| Workload Type | Core Ultra 7 155H |
| --- | --- |
| HP P-core average | 2.69 GHz |
| P-core average | 2.21 GHz |
| E-core average | 2.02 GHz |
| LP E-core average | 1.11 GHz |
| HP P-core peak | 4.79 GHz |
| P-core peak | 4.49 GHz |
| E-core peak | 3.79 GHz |
| LP E-core peak | 2.49 GHz |
| Multi-threaded: Average Power | 31W |

We ran the same tests on a Meteor Lake Core Ultra 7 155H, also using an Asus Zenbook 14-inch chassis. Here’s an interesting tidbit that isn’t well-known: Meteor Lake has two types of P-cores (more below) and two types of E-cores. This means we had a lot of cores to plot. We plotted the same series of tests but put power and temps on a separate chart to improve readability.

The big takeaway here is that both types of P-cores run at higher frequencies than the E-cores during the multi-threaded Cinebench test, showing that Lunar Lake does, in fact, take a much different approach to delivering power-efficient performance in threaded workloads, an approach enabled by the strength of the Skymont E-core microarchitecture. Meteor Lake, using the same 'Performance' fan profile in the MyAsus app, also exhibited much higher average power use in our multi-threaded testing — though it was also faster and has far more total cores.
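The contrast between the two generations can be expressed as a simple comparison of the averages in the two performance tables. The sketch below is only an illustration: the data structure and function names are ours, and it assumes the Core Ultra 7 155H averages come from the same multi-threaded Cinebench run as the Lunar Lake figures.

```python
# Average clocks in GHz during multi-threaded testing, from the two tables above.
# "p" and "e" list every P-core and E-core class reported for that chip.
MT_AVG_GHZ = {
    "Core Ultra 9 288V": {"p": [3.0], "e": [3.4]},
    "Core Ultra 7 258V": {"p": [3.0], "e": [3.5]},
    "Core Ultra 7 155H": {"p": [2.69, 2.21], "e": [2.02, 1.11]},
}


def e_cores_outclock_p_cores(chip: str) -> bool:
    """True when even the slowest E-core class averages above the fastest P-core class."""
    clocks = MT_AVG_GHZ[chip]
    return min(clocks["e"]) > max(clocks["p"])


def p_cores_outclock_e_cores(chip: str) -> bool:
    """True when even the slowest P-core class averages above the fastest E-core class."""
    clocks = MT_AVG_GHZ[chip]
    return min(clocks["p"]) > max(clocks["e"])


for chip in MT_AVG_GHZ:
    print(chip, "| E > P:", e_cores_outclock_p_cores(chip),
          "| P > E:", p_cores_outclock_e_cores(chip))
```

Both Lunar Lake chips land on the "E > P" side, while Meteor Lake lands on the "P > E" side, matching the behavior described above.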

A bit of background about Meteor Lake’s two types of P-cores: It isn’t uncommon for a few cores to run faster on any given chip, but those cores are typically faster simply by virtue of the silicon lottery — some cores are naturally capable of faster speeds due to the variability of the semiconductor fabrication process, and the faster cores are identified in the binning process and then designated as the ‘favored’ cores.

With Meteor Lake, Intel intentionally designed two cores with faster, leakier transistors and assigned them 8 Vt points (threshold voltage points). The remainder of the cores used slower transistors with less leakage (meaning they are more power efficient) and only have 6 Vt points (the 'Intel 4' process node enabled the extra two Vt points). The additional two voltage thresholds on the faster P-cores enable them to hit higher clock rates, thus creating two cores that are specifically designed to be faster than the other cores despite the use of the same microarchitecture. Intel didn’t publicly announce this design decision, but the company confirmed it with me earlier this year. We've listed the cores as High Performance (HP) and Low Performance (LP) P-cores in the above chart.

It is well known that Meteor Lake also has two types of E-cores — the standard E-cores that reside on the CPU tile and two slower Low Performance (LP) E-cores that reside on the SoC tile. Intel intends to use the LP E-cores as much as possible to save power — even to run the entire OS during light workloads like video playback, which enables the chip to shut off the entire CPU tile. However, the lack of an L3 cache forced these cores to frequently access main memory, and lighting up the memory controllers and accessing memory is expensive from a power standpoint, thus offsetting much of the power efficiency gained from using the LP E-cores. 

To fix this, Intel added a Memory Side Cache to Lunar Lake, but this is really just analogous to the System Level Cache (SLC) found on Arm chips. In either case, the ‘Memory Side Cache’ approach is much more effective.

Conclusion

Intel's Lunar Lake represents a dramatic step forward on several fronts, especially in power efficiency and battery life, but we aren’t able to provide the deep-dive details on many of the architectural aspects of the Core Ultra 9 yet because we’re testing with a problematic BIOS. We'll post more benchmarks when we get a new working BIOS. Naturally, we’ll also retest the Core Ultra 7 chips with the new BIOS to make sure there aren’t any regressions with those chips.

Neither Intel nor Asus has given us a timeline for the new BIOS update. Given the apparent lack of Ultra 9 models at retail in the US, it's logical to think we'll see more Ultra 9 laptops become available in the same timeframe as the updated BIOS.

Paul Alcorn
Managing Editor: News and Emerging Tech

Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.

  • RosElks
    Kind of obvious what's happening; Intel did it on purpose so Cristiano can't have the so much craved 'battle of the benchmarks' :ROFLMAO:
  • JRStern
    Appreciate you guys' work on all this, but it comes out so confusing. It's apparent that Intel themselves are massively uncertain what the E-core/P-core thing is all about. I'd *thought* it was about unloading IO onto E-cores so the P-cores didn't have to get interrupted, upsetting the pipeline and clearing the cache, and all, E-cores being much cheaper to build and run, and yes, I suppose they could run at different, even faster, clock speeds being simpler and all. Yet it never occurred to me that the chip could run different cores at different speeds at the same time - though maybe phones have been doing this for twenty years, so I'm a little behind on the details of some of this stuff, LOL, so was Intel.

    I also wondered just how heavy it would be for Windows to factor out what they wanted to run on an E-core versus a P-core, given different core counts of each, and base versus turbo, and yada yada. So much software development, even more than other areas of engineering, seem to consist of people, even the biggest companies, just bumbling around until they trip over something that works really well, but then six other people or projects or departments just won't accept it and keep running in other directions ...
  • bit_user
    @PaulAlcorn , with release dates slipping and now this, perhaps 2024 is shaping up to be the year of the flubbed launch? Depends on how Arrow Lake's launch goes, but if that's also rough, then I'd say it's a bad sign for the industry.

    At least with AMD, I can sort of understand. They've broadened out their product portfolio a lot, in the past couple years. I think they might be getting stretched thin. In Intel's case, I have more trouble understanding why Lunar Lake (not unlike Meteor Lake before it) seems to be having a rough launch.
  • Kamen Rider Blade
    bit_user said:
    At least with AMD, I can sort of understand. They've broadened out their product portfolio a lot, in the past couple years. I think they might be getting stretched thin. In Intel's case, I have more trouble understanding why Lunar Lake (not unlike Meteor Lake before it) seems to be having a rough launch.
    I think in AMD's case, it's Left-Hand not knowing what the Right-Hand is doing, and suddenly the Middle-Hand shows up and wants to do something.

    There are serious internal communications issue and standardization on BenchMark testing that needs to happen within AMD.

    Also, messaging to the public & reviewers needs to be adjusted to be concise, uniform, and properly tested before any information goes out.

    No more gas-lighting of reviewers over 1-2%, seriously, that's shameful.

    Also, no more releasing crap w/o having tested it themselves.

    They need to go Slower, Dot their I's, Cross their T's.

    This way they can end up going faster over-all, and not trip up on minor issues that get blown out of proportion and cause unnecessary head-aches.
  • thestryker
    Greatly appreciate this update and look into the performance. I noticed that most of the laptops seemed to be trying to maximize performance putting lower chips into higher power modes so I'd just chalked it up to overall clock limits at the power levels. I'm curious if this is why there haven't been any releases with the 288V or if it was always to be a lower volume product/Asus got some sort of short term exclusivity.
  • HideOut
    So the new 2xx core ultras have both less P cores AND less E cores? That makes sense. And while we are at it lets take out HT too...
  • TheSecondPower
    JRStern said:
    Appreciate you guys' work on all this, but it comes out so confusing. It's apparent that Intel themselves are massively uncertain what the E-core/P-core thing is all about. I'd *thought* it was about unloading IO onto E-cores so the P-cores didn't have to get interrupted, upsetting the pipeline and clearing the cache, and all, E-cores being much cheaper to build and run, and yes, I suppose they could run at different, even faster, clock speeds being simpler and all. Yet it never occurred to me that the chip could run different cores at different speeds at the same time - though maybe phones have been doing this for twenty years, so I'm a little behind on the details of some of this stuff, LOL, so was Intel.

    I also wondered just how heavy it would be for Windows to factor out what they wanted to run on an E-core versus a P-core, given different core counts of each, and base versus turbo, and yada yada. So much software development, even more than other areas of engineering, seem to consist of people, even the biggest companies, just bumbling around until they trip over something that works really well, but then six other people or projects or departments just won't accept it and keep running in other directions ...
    I had originally heard of phones toggling between the little cores and the big cores and never using them both at the same time. But I don't think that's how phones work today and in hindsight, I don't think that's ever how the phone processors ever worked.

    I think that today little cores are primarily about having a lower cost core which can be used to supplement the big cores during heavily threaded tasks when they wouldn't have been able to clock all that high anyway. In many cases they're also more power efficient than the big cores if the clock speed is low enough (big cores are more efficient if the clock speed is high enough). So in Lunar Lake, as the number of threads increases, the clock speed of all the cores goes down and eventually reaches the point where the little cores are operating more efficiently than the big cores, so the big cores clock down more than the little cores after this point.

    Intel has previously described Meteor Lake as having three classes of cores: LPE cores, E cores, and P cores. In all Intel products, P cores and E cores live on the same ring bus and share the same L3 cache. But LPE cores have no L3 cache and aren't on the ring bus. As far as I know, they only exist because Meteor Lake has multiple dies and the LPE cores are on the I/O and memory controller die so that CPU die can be shut down entirely during light work.

    It's also important to understand that in all Intel products a main thread is sent to the LPE cores, then the P cores, then the E cores (skipping any that don't exist). LPE cores are easy on the battery so start there, if the workload is too much send it to the P cores which are responsive, and if it doesn't need to be responsive move it to the E cores.

    Lunar Lake follows the same model, the only differences from Meteor Lake being that all the cores are on the same die, and there are no E cores. But I guess for convenience, the LPE cores are often called E cores on Lunar Lake. (There's no separate CPU die to power down when the LPE cores alone are used, but all the P cores, L3 cache, and ring bus can still be turned off.)
  • usertests
    bit_user said:
    @PaulAlcorn , with release dates slipping and now this, perhaps 2024 is shaping up to be the year of the flubbed launch? Depends on how Arrow Lake's launch goes, but if that's also rough, then I'd say it's a bad sign for the industry.
    We also have port strikes and an impending quartz shortage. Maybe 2025 will be worse.
  • JRStern
    TheSecondPower said:
    ...

    Lunar Lake follows the same model, the only differences from Meteor Lake being that all the cores are on the same die, and there are no E cores. But I guess for convenience, the LPE cores are often called E cores on Lunar Lake. (There's no separate CPU die to power down when the LPE cores alone are used, but all the P cores, L3 cache, and ring bus can still be turned off.)
    Interesting. Scary. Moving a thread from processor to processor ... well, but my guess is it probably doesn't, really, maybe an LPE core is dedicated to OS queue management, so the task and the LPE sees it exists and is queued but not even executing, but the first attempt to execute it is on a P-core. Now, that does require you can assume it needs a P-core as opposed to letting it demonstrate it first on an E-core, but I'd think the general class of a job is going to be known in advance, or maybe you can set some flags to have it treated one way or another.

    As I said before, making proper use of these new toys/tools, is itself a challenge, just as this article says.
  • rluker5
    I don't like how Best Buy is obfuscating the CPU model number by calling them Core Ultra series 1 or Core Ultra series 2 and not listing the included CPU by name. There is a pretty big difference.