Scheduling

For the basics of how DAGZ runs tests in CI, see CI runs.

This page covers hardware-side reasons actual durations drift from the scheduler's plan.

Variance in test durations

A few common reasons actual durations drift from the plan:

  • Hyperthreaded CPUs, where each physical core hosts two logical threads. When both threads on a core are busy, contention typically slows each one by 20-30%, depending on the workload.
  • CPU power limits, which cap total power across all cores.
  • Thermal throttling, especially on laptops running many workers at once.
  • Hybrid CPUs (Intel 12th gen+, Apple Silicon), where workers scheduled onto E-cores can run 40-50% slower than ones on P-cores.
  • Test-side variability: randomness, network jitter, external services.
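
For the hyperthreading item, a quick check on Linux is to read the threads-per-core count that lscpu reports. The sketch below assumes util-linux's lscpu is installed; the helper name threads_per_core is made up for illustration and is not part of DAGZ.

```shell
# Read lscpu-style text on stdin and print the "Thread(s) per core" value.
# A value above 1 means two or more logical CPUs share each physical core.
threads_per_core() {
    awk -F: '/Thread\(s\) per core/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Only query the real machine if lscpu is available.
if command -v lscpu >/dev/null 2>&1; then
    lscpu | threads_per_core
fi
```

If this prints 2, running one worker per logical CPU puts two workers on each physical core, which is the contention case described above.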

Throttling, hybrid cores, and power limits are detailed below.

CPU Throttling and Parallel Workers

Modern CPUs adjust clock speeds based on power and thermal headroom. When several workers run CPU-intensive tests at once, the CPU may throttle, making each test slower than it would be in isolation. Actual execution then runs longer than the scheduler planned.

Signs that throttling is the bottleneck:

  • Actual test time significantly exceeds planned time.
  • Per-worker lag grows steadily through the run.
  • CPU usage sits at 100% but parallel speedup is poor (e.g. 2× with 4 workers).
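
The last sign can be put into numbers by comparing a serial run with a parallel one. The sketch below is plain arithmetic, not a DAGZ feature: the function name and the timings (120 s serial, 60 s with 4 workers) are made up for illustration.

```shell
# parallel_efficiency SERIAL_SECONDS PARALLEL_SECONDS WORKERS
# Prints the speedup (serial / parallel) and the efficiency
# (speedup / workers) as a percentage.
parallel_efficiency() {
    awk -v s="$1" -v p="$2" -v w="$3" 'BEGIN {
        speedup = s / p
        printf "speedup=%.2fx efficiency=%.0f%%\n", speedup, 100 * speedup / w
    }'
}

# The "2x with 4 workers" case: prints speedup=2.00x efficiency=50%
parallel_efficiency 120 60 4
```

An efficiency well below 100% while the CPU is fully busy suggests the cores are running slower under load rather than sitting idle.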

Hybrid CPUs (P-cores and E-cores)

Both Intel (12th gen+) and Apple Silicon CPUs mix fast P-cores (performance) with slower E-cores (efficiency). The OS may schedule workers onto E-cores, which can be 40-50% slower. Because DAGZ spreads work evenly, a worker stuck on an E-core becomes the bottleneck.
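
One possible workaround on Linux with Intel hybrid CPUs is to pin the test run to P-cores. The sketch below assumes a kernel recent enough to expose the P-core list at /sys/devices/cpu_core/cpus and that taskset (util-linux) is installed; ./run-tests.sh is a placeholder for however you launch your run, not a DAGZ command. It does not cover macOS.

```shell
# Print the logical CPU list for Intel P-cores, or nothing if the machine
# does not expose a hybrid topology. The optional argument overrides the
# sysfs root, which is only useful for testing.
p_core_cpus() {
    root="${1:-/sys/devices}"
    if [ -r "$root/cpu_core/cpus" ]; then
        cat "$root/cpu_core/cpus"
    fi
}

cpus=$(p_core_cpus)
if [ -n "$cpus" ]; then
    echo "P-core CPUs: $cpus"
    # Placeholder command; child processes inherit the CPU affinity.
    # taskset -c "$cpus" ./run-tests.sh
else
    echo "No hybrid topology detected"
fi
```

Pinning limits the run to the P-core count, so the worker count may need to come down to match.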

Intel: Power Limits (PL1)

Intel CPUs enforce a sustained power limit (PL1). One busy core can boost to high frequencies within that budget, but multiple busy cores must share it, reducing each core's clock speed. A laptop chip with PL1=28W may run one core at 4 GHz but throttle to 2 GHz with four cores loaded.

To check and raise PL1 on Linux:

# Check current limit (in microwatts)
cat /sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw

# Raise to 65W (resets on reboot)
echo 65000000 | sudo tee /sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw

# Some systems have a firmware override; check and raise this too
cat /sys/class/powercap/intel-rapl-mmio:0/constraint_0_power_limit_uw
echo 65000000 | sudo tee /sys/class/powercap/intel-rapl-mmio:0/constraint_0_power_limit_uw

This is usually safe; the CPU still thermally throttles itself before overheating. You can also set PL1 in BIOS for a persistent change. Use turbostat to monitor actual clock speeds during runs.
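
turbostat needs root. As a lighter alternative, the sketch below averages the current clock speed across cores from the cpufreq sysfs files, assuming the machine exposes scaling_cur_freq (reported in kHz). The helper name avg_cpu_mhz is made up for illustration.

```shell
# Print the average current frequency in MHz across all CPUs that expose
# cpufreq, or nothing if none do. The optional argument overrides the
# sysfs root, which is only useful for testing.
avg_cpu_mhz() {
    root="${1:-/sys/devices/system/cpu}"
    cat "$root"/cpu[0-9]*/cpufreq/scaling_cur_freq 2>/dev/null |
        awk '{ sum += $1; n++ } END { if (n) printf "%.0f\n", sum / n / 1000 }'
}

# One sample; call it in a loop with sleep to watch a whole run.
avg_cpu_mhz

# With root access, turbostat gives per-core detail, for example:
#   sudo turbostat --quiet --show Busy%,Bzy_MHz,PkgWatt --interval 5
```

Sampling once with a single worker and again with all workers busy shows how far the clocks drop under load.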

Cloud environments

Cloud instances (AWS, GCP, Azure) use server CPUs (Xeon, EPYC, Graviton) with uniform cores and high power limits (150W+). The throttling and hybrid-core issues above are laptop- and desktop-specific and generally don't apply in the cloud.