

Superscalar Pentium processor



A superscalar architecture is a uniprocessor that can execute two or more scalar operations in parallel. Some definitions include superpipelined and VLIW architectures; others do not. Superscalar architectures (apart from superpipelined architectures) require multiple functional units, which may or may not be identical to each other. In some superscalar processors the order of instruction execution is determined statically (purely at compile-time), in others it is determined dynamically (partly at run time).

Pentium Pro:

(Known as “P6” during development.) Intel’s successor to the Pentium processor, still in development in January 1995 and generally available from 1995-11-01. The P6 has an internal RISC architecture with a CISC-to-RISC translator, 3-way superscalar execution, and out-of-order, speculative execution (which Intel calls “Dynamic Execution”). It also features branch prediction and register renaming, and is superpipelined (14 stages).
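Register renaming is easiest to see on a tiny example. The Python sketch below is a minimal illustration, not Intel’s P6 logic: it assumes a hypothetical three-operand micro-op format (`Instr`), an unbounded pool of physical registers named `p0`, `p1`, and so on, and a helper `rename` invented for this post. The x86 register names are borrowed only for flavor. Each write gets a fresh physical register, so a later instruction that reuses an architectural register name no longer has to wait for earlier readers or writers of that name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-operand micro-op: dest <- op(srcs...)."""
    op: str
    dest: str
    srcs: tuple


def rename(program, arch_regs):
    """Map architectural register names onto fresh physical registers.

    Assumes an unbounded supply of physical registers (a real core recycles
    a finite pool as instructions retire).
    """
    # Initially each architectural register lives in its own physical register.
    table = {reg: f"p{i}" for i, reg in enumerate(arch_regs)}
    next_free = len(arch_regs)
    renamed = []
    for ins in program:
        # Sources read the current mapping, so true dependences are preserved.
        srcs = tuple(table[s] for s in ins.srcs)
        # Every write gets a new physical register, removing WAR/WAW conflicts.
        dest = f"p{next_free}"
        next_free += 1
        table[ins.dest] = dest
        renamed.append(Instr(ins.op, dest, srcs))
    return renamed


arch_regs = ["eax", "ebx", "ecx"]
program = [
    Instr("add", "eax", ("ebx", "ecx")),  # eax <- ebx + ecx
    Instr("mov", "ebx", ("eax",)),        # ebx <- eax (true dependence on the add)
    Instr("add", "eax", ("ecx", "ecx")),  # eax <- ecx + ecx (reuses the name eax)
]
for ins in rename(program, arch_regs):
    print(ins)
```

In this example the second `add` writes `p5` instead of overwriting `p3`, so it can execute before or alongside the `mov` that still reads `p3`; only the true dependence from the first `add` to the `mov` survives renaming.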
The P6 is made as a two-chip assembly: the first chip holds the CPU and a 16-kilobyte first-level cache (5.5 million transistors), and the other is a 256 (or 512) kilobyte second-level cache (15 million transistors). The first version has a clock rate of 133 MHz, consumes about 20 W of power, and is about twice as fast as the 100 MHz Pentium. The original 0.35-micron versions of the Pentium Pro released on 1995-11-01 run at 150 and 166 MHz for desktop machines and up to 200 MHz for servers, with heat dissipation of about 20 W.
The Pentium Pro is optimised for 32-bit software and runs 16-bit software slower than the original Pentium. The successor was the Pentium II.
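The 133 MHz figure above is consistent with the roughly 50% per-clock advantage over the Pentium quoted later in this post, if one assumes that the run time of a fixed program is instruction count / (instructions per clock × clock rate). The sketch below makes that back-of-the-envelope check; `relative_performance` is a hypothetical helper, and the model deliberately ignores memory, cache and 16-bit-code effects.

```python
def relative_performance(ipc_ratio: float, clock_a_mhz: float, clock_b_mhz: float) -> float:
    """Speed of chip A relative to chip B for the same program.

    Assumes run time = instruction count / (IPC * clock rate), so speed scales
    with IPC * clock rate and the instruction count cancels out.
    """
    return ipc_ratio * (clock_a_mhz / clock_b_mhz)


# 133 MHz Pentium Pro vs 100 MHz Pentium, assuming ~1.5x the work per clock.
speedup = relative_performance(1.5, 133, 100)
print(f"estimated speedup: {speedup:.3f}")
```

The result is 1.5 × 1.33 ≈ 2.0, in line with the “about twice as fast” claim, but it is only a consistency check under the stated assumption, not a benchmark.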

Simple superscalar pipeline. By fetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be completed.

Processor board of a Cray T3E supercomputer with four superscalar Alpha 21164 processors.


Theory:


A superscalar CPU architecture implements a form of parallelism called instruction-level parallelism within a single processor. It thereby allows faster CPU throughput than would otherwise be possible at the same clock rate. A superscalar processor executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to redundant functional units on the processor. Each functional unit is not a separate CPU core but an execution resource within a single CPU such as an arithmetic logic unit, a bit shifter, or a multiplier.
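To make “dispatching multiple instructions to redundant functional units” concrete, here is a minimal Python sketch of in-order issue limited only by functional-unit availability (structural hazards). It is an assumed toy model, not any real CPU’s dispatcher: the unit classes (`alu`, `shift`, `mul`), their counts, the issue width and the `issue_schedule` helper are all hypothetical, and every instruction is assumed to be independent of the others.

```python
from collections import Counter


def issue_schedule(unit_needs, units, width):
    """Group an in-order instruction stream into per-cycle issue groups.

    unit_needs: functional-unit class each instruction needs, in program order
    units: number of copies of each unit class (hypothetical core)
    width: maximum number of instructions issued per cycle
    All instructions are assumed independent; only unit availability and the
    issue width limit how many start together.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    cycles = []
    i = 0
    while i < len(unit_needs):
        busy = Counter()  # units of each class already claimed this cycle
        group = []
        while (i < len(unit_needs) and len(group) < width
               and busy[unit_needs[i]] < units.get(unit_needs[i], 0)):
            busy[unit_needs[i]] += 1
            group.append(i)
            i += 1
        if not group:
            raise ValueError(f"core has no {unit_needs[i]!r} unit")
        cycles.append(group)
    return cycles


units = {"alu": 2, "shift": 1, "mul": 1}
stream = ["alu", "alu", "mul", "shift", "shift", "alu"]
for cycle, group in enumerate(issue_schedule(stream, units, width=3), start=1):
    print(cycle, [stream[i] for i in group])
```

With two ALUs but only one shifter, the two back-to-back shifts cannot issue together, so the six instructions take three cycles instead of the two that a 3-wide issue width alone would allow.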

Although superscalar CPUs are typically also pipelined, superscalar execution and pipelining are two different performance enhancement techniques. It is theoretically possible to have a non-pipelined superscalar CPU or a pipelined non-superscalar CPU.
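The independence of the two techniques can be shown with an idealised cycle count. The sketch assumes `n` independent instructions, a `stages`-deep pipeline with one cycle per stage, no stalls, and an issue width `width`; `ideal_cycles` is a hypothetical helper, and real CPUs fall well short of these numbers.

```python
import math


def ideal_cycles(n: int, stages: int, width: int = 1, pipelined: bool = True) -> int:
    """Cycles to finish n independent instructions under an idealised model."""
    if n == 0:
        return 0
    groups = math.ceil(n / width)  # instructions that can start in the same cycle
    if pipelined:
        # The first group takes `stages` cycles; each later group finishes one cycle after it.
        return stages + groups - 1
    # Without pipelining, each group occupies the whole datapath for `stages` cycles.
    return groups * stages


for width in (1, 2):
    for pipelined in (False, True):
        cycles = ideal_cycles(12, 5, width, pipelined)
        print(f"width={width} pipelined={pipelined}: {cycles} cycles")
```

All four combinations are valid designs: pipelining overlaps the stages of successive instructions, while superscalar issue starts several instructions in the same cycle, and under this model the combination of both is the fastest.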

The superscalar technique is traditionally associated with several identifying characteristics. Note that these apply within a given CPU core.

Intel Pentium Pro (“P6”):-

The Pentium Pro was introduced in 1995 as the successor to the Pentium. It introduced several unique architectural features that had never been seen in a PC processor before. The Pentium Pro was the first mainstream CPU to radically change how it executes instructions, by translating them into RISC-like microinstructions and executing these on a highly advanced internal core. (The NexGen Nx586 processor was actually the first x86 CPU to use this design, but this chip was used in very few systems.)

The Pentium Pro achieves performance approximately 50% higher than a Pentium of the same clock speed. In addition to its new way of processing instructions, the Pentium Pro incorporates several other technical advances that contribute to this increased performance:

For a few reasons, the Pentium Pro is still, despite its age, an ideal choice for servers. First, it is a fast chip in general. Second, its integrated level 2 cache makes it ideal for multiprocessing: instead of all the processors sharing a single motherboard-based level 2 cache, each has its own. Third, more chipsets designed for high-end server use are available for the Pentium Pro than for the Pentium.

The most widely publicized advanced feature of the Pentium Pro is, of course, the integrated level 2 cache. The Pentium Pro ships in a special dual-cavity SPGA package that contains both the processor die and the cache. It plugs into Socket 8, an interface unique to the Pentium Pro. One disadvantage of this arrangement is that the cache cannot be upgraded without also replacing the processor.

The integrated-cache design has been both a blessing and a curse for Intel. The blessing is that it greatly improves the performance of the chip. The curse is that it has been very difficult for Intel to manufacture the Pentium Pro at the volumes and cost levels necessary for it to become a mainstream processor. There are two main reasons for this. First, the cache itself is highly miniaturized and therefore much more expensive to produce than the typical SRAM chips used on a Pentium motherboard for level 2 cache. Second, some problems with the cache are not found until after it has been mated with the processor and installed in their shared package; when this happens the whole package (including the processor) must be thrown away, reducing yields and increasing costs. Due to the problems with its design, Intel has abandoned the integrated-cache concept and it is unlikely that any future PC processors will use it in the same way that the Pentium Pro does.

The Pentium Pro is usually found in 180 MHz or 200 MHz versions. Older Pentium Pros ran at 150 and 166 MHz; these are far less common, and the 166 MHz chip in particular is rarely seen. The 150 and 180 MHz chips ship only with 256 KB of level 2 cache, while the 200 MHz chip is available with 256 KB, 512 KB or 1 MB. The 200 MHz chip with 512 KB or 1 MB of cache is very expensive because of its production costs and demand. The 166 MHz chip is unusual in that it was available only with 512 KB of cache.

Despite being almost two years old, the Pentium Pro processor is still commonly used in high-end systems, although the Pentium II is now starting to take some of this market. Until Intel comes out with a proper Pentium II chipset for servers, demand for the 200 MHz version (especially with 512 KB or 1 MB of cache) will continue to be high. In addition, multiple-Pentium-Pro servers are quite common and provide good performance at a reasonable price. The Pentium Pro often competes against non-Intel server processors such as DEC’s Alpha.

Intel Pentium Pro Family:-



Pentium® Pro Processor with 1 MB L2 Cache at 200 MHz

The Pentium® Pro processor with 1 MB L2 cache is a multichip module targeted at high-end, 4-way multiprocessor-capable server systems. The component package contains an Intel Pentium Pro processor core and 1 MB of L2 cache. The 1 MB cache is built from two of the 512 KB SRAM dies used in the 512 KB version of the Pentium Pro processor. While the 512 KB version uses a conventional ceramic package, the Pentium Pro processor with 1 MB L2 cache integrates the three dies in a plastic package with an aluminum heat spreader. This 387-pin package is compatible with the existing Pentium Pro processor footprint. The Pentium Pro processor with 1 MB L2 cache routes the processor’s entire high-speed cache interface bus through balanced nets on a thin-film interconnect substrate to the two L2 SRAMs, which allows the processor core and the L2 cache dies to communicate at 200 MHz inside the component.


Seymour Cray’s CDC 6600 from 1965 is often mentioned as the first superscalar design. The Intel i960CA (1988) and the AMD 29000-series 29050 (1990) microprocessors were the first commercial single-chip superscalar microprocessors. RISC CPUs like these brought the superscalar concept to microcomputers because the RISC design results in a simple core, allowing straightforward instruction dispatch and the inclusion of multiple functional units (such as ALUs) on a single CPU within the constrained design rules of the time. This was the reason that RISC designs were faster than CISC designs through the 1980s and into the 1990s.

The Pentium was the first superscalar x86 processor; the Nx586, Pentium Pro and AMD K5 were among the first designs to decode x86 instructions asynchronously into dynamic, microcode-like micro-op sequences before executing them on a superscalar microarchitecture.

From scalar to superscalar:-

The simplest processors are scalar processors. Each instruction executed by a scalar processor typically manipulates one or two data items at a time. By contrast, each instruction executed by a vector processor operates simultaneously on many data items. An analogy is the difference between scalar and vector arithmetic. A superscalar processor is a mixture of the two: each instruction processes one data item, but because each CPU contains multiple redundant functional units, multiple instructions can process separate data items concurrently.
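A rough count for adding two `n`-element arrays shows the difference. This is a simplified model under assumed parameters (a hypothetical vector length of 8 elements and a 2-wide superscalar core; `add_arrays_cost` is an illustrative helper): a vector processor needs fewer instructions, while a superscalar processor still executes one scalar add per element but issues several per cycle. Loads, stores and loop overhead are ignored.

```python
import math


def add_arrays_cost(n, vector_length=8, width=2):
    """Rough instruction and issue-cycle counts for c[i] = a[i] + b[i], i < n.

    Only the add instructions are counted; loads, stores and loop overhead are
    ignored. vector_length and width are hypothetical machine parameters.
    """
    return {
        # Scalar: one add per element, one instruction issued per cycle.
        "scalar": {"instructions": n, "issue_cycles": n},
        # Vector: each instruction covers up to vector_length elements.
        "vector": {"instructions": math.ceil(n / vector_length)},
        # Superscalar: still one add per element, but `width` adds issue per cycle.
        "superscalar": {"instructions": n, "issue_cycles": math.ceil(n / width)},
    }


for kind, cost in add_arrays_cost(20).items():
    print(kind, cost)
```

The vector entry has no cycle count because how long one vector instruction takes depends on how many lanes the hardware has, which this model does not try to capture.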

Superscalar CPU design emphasizes improving the accuracy of the instruction dispatcher and allowing it to keep the multiple functional units in use at all times. A superscalar processor usually sustains an execution rate in excess of one instruction per machine cycle. In a superscalar CPU the dispatcher reads instructions from memory and decides which ones can be run in parallel, dispatching them to redundant functional units contained inside a single CPU. Therefore a superscalar processor can be envisioned as having multiple parallel pipelines, each of which processes instructions simultaneously from a single instruction thread.
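The dispatcher’s job of deciding which instructions can run in parallel can be sketched as a simple in-order grouping rule. In this minimal Python model (an assumption, not any specific CPU’s logic), an instruction joins the current issue group unless the group is full or the instruction reads or writes a register already written earlier in that group (read-after-write or write-after-write). Write-after-read is ignored because every instruction in a group is assumed to read its operands in the same cycle, and results are assumed to be available by the next cycle. `Op` and `dispatch_groups` are hypothetical names.

```python
from typing import NamedTuple


class Op(NamedTuple):
    """Hypothetical register-to-register instruction: dest <- f(srcs)."""
    dest: str
    srcs: tuple


def dispatch_groups(program, width=2):
    """Split a program, in order, into groups that may issue in the same cycle."""
    if width < 1:
        raise ValueError("width must be at least 1")
    groups, group, written = [], [], set()
    for idx, op in enumerate(program):
        # RAW: reads a value produced earlier in this group.
        # WAW: writes a register already written earlier in this group.
        conflict = op.dest in written or any(s in written for s in op.srcs)
        if group and (len(group) == width or conflict):
            groups.append(group)  # close the current cycle's group
            group, written = [], set()
        group.append(idx)
        written.add(op.dest)
    if group:
        groups.append(group)
    return groups


program = [
    Op("r1", ("r2", "r3")),  # 0
    Op("r4", ("r5", "r6")),  # 1: independent of 0
    Op("r7", ("r1", "r4")),  # 2: needs 0 and 1
    Op("r8", ("r7", "r1")),  # 3: needs 2
    Op("r9", ("r2", "r3")),  # 4: independent
]
print(dispatch_groups(program, width=2))
print(dispatch_groups(program, width=4))
```

Widening the dispatcher from 2 to 4 does not help this program: the chain `r1 -> r7 -> r8` forces the same three groups, which is exactly the instruction-level-parallelism limit described next.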


Available performance improvement from superscalar techniques is limited by two key areas:

1. The degree of intrinsic parallelism in the instruction stream, i.e. the limited amount of instruction-level parallelism, and

2. The complexity and time cost of the dispatcher and associated dependency checking logic.
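Both limits can be estimated with a short sketch. `ideal_ilp` measures the parallelism available in a straight-line program, assuming unlimited functional units, one-cycle latency and perfect register renaming so that only true dependences count. `dependency_comparators` gives a rough count of the register-name comparisons needed when each instruction’s sources are compared with the destinations of all earlier instructions in the same issue group. Both helpers, and the assumption of two source operands per instruction, are hypothetical simplifications.

```python
def ideal_ilp(program):
    """Average instructions per cycle on an idealised, infinitely wide machine.

    program: list of (dest, srcs) pairs in program order. Assumes unlimited
    functional units, one-cycle latency and perfect register renaming, so only
    true (read-after-write) dependences constrain the schedule.
    Returns (ilp, critical_path_length).
    """
    if not program:
        return 0.0, 0
    ready_at = {}  # register -> cycle in which its latest value is produced
    critical = 0
    for dest, srcs in program:
        # An instruction finishes one cycle after the latest of its inputs is ready.
        cycle = 1 + max((ready_at.get(s, 0) for s in srcs), default=0)
        ready_at[dest] = cycle
        critical = max(critical, cycle)
    return len(program) / critical, critical


def dependency_comparators(width, sources_per_instr=2):
    """Register-name comparisons to check one issue group for RAW hazards.

    Instruction i in the group compares each of its sources with the
    destinations of the i instructions before it, giving width * (width - 1)
    comparisons for two sources per instruction.
    """
    return sum(sources_per_instr * i for i in range(width))


program = [
    ("r1", ("r2", "r3")),
    ("r4", ("r5", "r6")),
    ("r7", ("r1", "r4")),
    ("r8", ("r7", "r1")),
    ("r9", ("r2", "r3")),
]
print(ideal_ilp(program))
for w in (2, 4, 8):
    print(w, dependency_comparators(w))
```

The example program has a critical path of three instructions, so even an infinitely wide machine averages only 5/3 instructions per cycle on it, while the comparator count grows as W(W - 1): 2 for a 2-wide group, 12 for 4-wide and 56 for 8-wide.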


Collectively, these two limits drive investigation into alternative architectural performance increases such as Very Long Instruction Word (VLIW), Explicitly Parallel Instruction Computing (EPIC), simultaneous multithreading (SMT), and multi-core processors.

Superscalar processors differ from multi-core processors in that the redundant functional units are not entire processors. A single processor is composed of finer-grained functional units such as the ALU, integer multiplier, integer shifter, floating-point unit, etc.
