What is an FPU? What does the floating-point unit give us, and what does it mean for computers?

A fast processor is a wonderful thing, and there is no shortage of people willing to pay for that speed. Most buyers still measure speed in gigahertz: the more, the better. Those in the know, however, judge a processor's performance either by special benchmark tests or by how it copes with real tasks that demand serious computation (3D graphics, video encoding, and the like). Since most everyday applications and games generate a huge number of calculations on real numbers (floating-point numbers), the overall performance of a processor largely depends on how quickly it executes them. For this purpose the processor contains a dedicated module, the Floating-Point Unit (FPU). The performance of this module depends not only on the processor's clock frequency but also on its design features.

At the beginning of the evolution of IBM-compatible computers, calculations on real numbers were handled by a math coprocessor, a chip physically separate from the central processor. With the 486, Intel built the floating-point unit into the processor itself, which significantly increased the speed of work with real numbers. Over time, the other manufacturers of PC processors also switched to integrated FPUs.

It is worth noting that work with real numbers has the same nuance as integer operations: an instruction cannot be executed in a single clock cycle of the processor core (see the article on processor pipelines, "KV" No. /2003). And while the 486 already used a five-stage pipeline for processing integer instructions, its FPU remained non-pipelined: each floating-point instruction had to wait for the previous one to finish. This hurt the processor's performance in multimedia programs, which at that time were rapidly gaining popularity. Quite naturally, starting with the Pentium, Intel began to pipeline not only integer but also floating-point operations. AMD, for its part, took a different route: instead of pipelining the FPU, it introduced 3DNow! technology into its products, likewise aimed at raising performance in operations on real numbers. The technology, however, did not solve the problem completely. Many probably remember that the AMD K6-2 matched the Pentium II in integer operations while trailing it by about thirty percent in floating-point ones.

Apparently that is why, starting with the Athlon, AMD switched to a pipelined FPU as well. Moreover, in the newer AMD processors the floating-point module received not only superpipelining but also superscalarity: three FPU units work in the processor, each taking part in floating-point calculations — a design that debuted with the release of the Athlon.

FPU (Floating-Point Unit) — the unit that performs operations on floating-point numbers (in our literature the point is often called a comma); also known as the math coprocessor.

The FPU helps the main processor perform mathematical operations on real numbers.

Initially it was installed optionally, as a separate coprocessor.

The FPU was first integrated onto the processor die in 1989, in the Intel 80486 processor.
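To make the division of labor concrete, here is a tiny C sketch of my own (it is not from the original article): arithmetic on double values like this is compiled into floating-point instructions (x87 or SSE, depending on the compiler), so the loop below runs essentially entirely on the FPU.

```c
#include <stdio.h>
#include <math.h>

/* Illustration only: the compiler turns this double-precision arithmetic
   into FPU instructions (x87 or SSE), so the loop exercises the FPU.
   Build with, e.g., "gcc fpu.c -lm" and inspect the assembly with -S. */
int main(void) {
    double sum = 0.0;
    for (int i = 1; i <= 1000000; i++)
        sum += sin((double)i) / i;   /* sin() and division: classic FPU work */
    printf("sum = %f\n", sum);
    return 0;
}
```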


Better still, why wait for the first problems to appear? Check the computer's stability right away and find out whether anything is overheating.

Launch the program and select "Tools" → "System Stability Test" from the menu.

Here, in fact, is what you will see:

The top graph shows the temperatures of the computer's components. By setting or clearing a checkbox you choose whose temperature curve is displayed on the graph. Obviously, if you know that some component never overheats, simply uncheck it so as not to clutter the chart with unneeded information. Above the first graph you can switch between tabs that display other readings (fan speeds, voltages, and so on). The most valuable tab there is Statistics: its summary table records the minimum and maximum of every parameter (temperatures, voltages, etc.) collected during testing.

If questions come up, then, as always, ask them in the comments and I will help.

AIDA64 is a richly functional program for examining a computer's characteristics and for running various tests that show how stable the system is and whether the processor will tolerate overclocking. It is also an excellent solution for testing the stability of low-powered systems.

The system stability test applies a load to the system's components (CPU, RAM, disks, etc.). It helps you detect a malfunctioning component in time and deal with it.

If you have a weak computer, then before running the test make sure the processor does not overheat even under ordinary load. A normal temperature for the processor cores at idle is 40-45 degrees. If yours is higher, it is recommended either to postpone the test or to carry it out with great care.

The reason is that during the test the processor is put under load, so (especially if the CPU already overheats in normal operation) temperatures can reach critical values of 90 degrees or more, which is no longer safe for the processor itself, the motherboard, or the neighboring components.

System testing

To launch the stability test in AIDA64, find the "Tools" item in the top menu (on the left-hand side). Click it and choose "System Stability Test" from the list.

A window will open containing two graphs, a set of checkboxes, and buttons on the bottom panel. Pay attention to the checkboxes at the top; let's look at each of them in order:


You can tick them all, but in that case there is a risk of overloading the system if it is on the weak side. An overload can cause an emergency restart of the PC, and that is in the best case. Besides, when several items are selected, the graphs plot many parameters at once, which makes the information harder to read.

It is better first to select the first three items and test them, and then the remaining two. That way the load on the system is lower and the graphs are easier to read. However, if a full-scale system test is required, all the items must be ticked.

Below are the two graphs. The first displays the processor temperature: with the checkboxes you can show the reading of each individual core, the average for the whole processor, or put all the data on one graph. The second graph shows the processor load, CPU Usage. It also carries an item called CPU Throttling. In normal operation this indicator must not exceed 0%. If it rises, the test should be stopped and the processor problem investigated. If the value reaches 100%, the program will terminate the run itself, though most likely the computer will restart before that happens.

Above the graphs there is a menu where you can open other graphs, for example voltages and processor frequency. In the Statistics section you can see a summary for every component.

For the test itself, tick the elements to be tested at the top of the window, then click "Start" in the bottom-left part of the window. Count on the testing taking about 30 minutes.

During the test, the number of errors found is shown in the window opposite the option checkboxes. While the test runs, keep an eye on the graphs: at elevated temperatures and/or a growing CPU Throttling value, stop the test immediately.

To finish, press the "Stop" button. The results can be saved with the "Save" button. If more than 5 errors were found, the computer's problems must be dealt with. Next to each detected error is the test during which it was found, for example Stress CPU.

08.08.2012

Before the advent of Intel Core processors, hardly anyone gave a thought to "core efficiency"; frequencies and cache sizes seemed far more important. But how do you express core efficiency in numbers? We present one of the possible approaches, which lets you look at performance from a different angle.

I want to note right away that the results of today's test are not the ultimate truth. I make no claim to one-hundred-percent accuracy: choose different testing principles and you may get different results. Nevertheless, this particular method yields adequate results that are confirmed by history.

Why do processors at the same frequency show different performance? This question has tormented many amateurs and professionals the world over. For a long time clock frequency served as the main yardstick of performance. Later attention shifted to the front-side bus frequency, then to caches, then to the number of cores. Yet the thing with a truly huge impact on computational efficiency was usually overlooked.

What matters above all is the per-clock performance of the two most important blocks of any modern x86 processor: the Arithmetic Logic Unit (ALU) and the Floating-Point Unit (FPU). It is the combination of their characteristics that largely defines the notion of architecture — a notion that has no direct relation to frequency or cache size, yet directly determines the processor's underlying performance.

So, before diving into the research itself, let's figure out what these blocks are, what they do, and what they are made of. As I already said, this material will not deal with memory, caches, and other subsystems; we will speak only of the ALU and FPU and, of course, of their two important companions: the pipeline and the branch prediction unit. We will also say a little about Intel's Hyper-Threading technology, because it has a huge influence on core performance in the simplest operations.

The integer unit (ALU)

This is the first and main unit of the processor. Strictly speaking, it would be more correct to say not unit but units, since modern processors contain several of them. Roughly speaking, at the dawn of development there was practically nothing in the processor besides this block. The basic purpose of the ALU, from the first models to today's monsters, has not changed: it works with simple (integer) numbers, performing addition, subtraction, comparison, and conversion of numbers; it also executes the simplest logical operations and bitwise manipulations.

Note that the ALU handles neither multiplication nor division. Since such calculations are needed comparatively rarely, they were given a separate block — the integer multiplier — which raised ALU performance by relieving it of many non-trivial operations. Division, in turn, is performed through multiplication and addition using a special table of constants. Such is this simple block, whose performance directly affects processor performance in a great many applications: office software, many specialized development programs, and so on.
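The "division through multiplication by a constant" idea can be illustrated with the reciprocal-multiplication trick that compilers still apply today. The sketch below is my own illustration of the principle, not the actual microcode or constant table of any processor; the magic constant here is the one for unsigned division by 10.

```c
#include <stdio.h>
#include <stdint.h>

/* Division by a constant via multiplication: for 32-bit unsigned x,
   x / 10 == (x * ceil(2^35 / 10)) >> 35, and 0xCCCCCCCD == ceil(2^35 / 10).
   An illustration of the principle, not any CPU's actual implementation. */
int main(void) {
    uint32_t x = 1234;
    uint32_t q = (uint32_t)(((uint64_t)x * 0xCCCCCCCDull) >> 35);
    printf("%u / 10 = %u\n", x, q);   /* prints: 1234 / 10 = 123 */
    return 0;
}
```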

The floating-point unit (FPU)

This block appeared in processors much later than the ALU and at first existed as a separate coprocessor. Later, however, it migrated into the main processor core and has since been an integral and very important part of it (as with the ALU, there is more than one such block in a processor). As the name implies, the FPU's main job is operations on floating-point numbers.

From the moment this block joined the CPU, the demands placed on it kept growing, and in the end the load on the FPU came to outweigh the load on the ALU more often than not. Moreover, thanks to the block's great versatility, additional functions were "hung" on it: today it also executes the streaming SIMD extensions and processes vector data, of which current workloads contain plenty. The performance of this block determines processor performance in the vast majority of programs, above all multimedia, games, 3D, photo processing, and so on.
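As a small taste of that vector work, here is a minimal C sketch of my own, using the standard SSE intrinsics from <xmmintrin.h>: a single SSE instruction adds four single-precision numbers at once.

```c
#include <stdio.h>
#include <xmmintrin.h>   /* SSE intrinsics */

/* One vector instruction performs four float additions at once —
   the kind of work that was "hung" on the FPU/SIMD units. */
int main(void) {
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);    /* packs in reverse order */
    __m128 b = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);
    __m128 sum = _mm_add_ps(a, b);                    /* four adds in one go */

    float out[4];
    _mm_storeu_ps(out, sum);
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);  /* 11 22 33 44 */
    return 0;
}
```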

The pipeline

The point is that every operation in the processor is performed on data, and those data, without exaggeration, pass through an enormous amount of processing. To optimize that work, put it in order, and raise its speed, the pipeline was invented.

Its principle resembles an assembly line at a factory: the part moves step by step past a series of fixed workstations, each of which performs exactly one operation on it. In the processor, data take the place of parts and likewise pass through a series of stages. This approach dramatically cuts the idle time of every processor block and thereby greatly raises performance, since the blocks process data continuously.
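To make the overlap visible, here is a toy C sketch of my own (it uses the classic five-stage RISC stage names — fetch, decode, execute, memory, writeback — not the stages of any particular x86 core): instruction i occupies stage s at clock i + s, so once the pipeline fills, one instruction completes every clock.

```c
#include <stdio.h>

/* Toy model: 7 instructions flowing through a hypothetical 5-stage pipeline.
   Instruction i sits in stage (clock - i); one instruction retires per clock
   once the pipeline is full. */
int main(void) {
    const char *stage[] = {"FE", "DE", "EX", "ME", "WB"};
    const int n_stages = 5, n_instr = 7;

    for (int clock = 0; clock < n_instr + n_stages - 1; clock++) {
        printf("clock %2d:", clock + 1);
        for (int i = 0; i < n_instr; i++) {
            int s = clock - i;                  /* which stage holds instr i */
            if (s >= 0 && s < n_stages)
                printf("  I%d:%s", i + 1, stage[s]);
        }
        printf("\n");
    }
    return 0;
}
```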

The pipeline, however, has a drawback, and a serious one. The main problem: the entire pipeline has to be flushed whenever the program takes an unforeseen turn. Most often this happens at conditional operators in the code, where the direction of further processing depends on data that are not yet known.

Another important point: pipelines in different processors have a different number of stages. A short pipeline lets you get more per-clock performance at the same frequency, while a long pipeline lets you reach higher clock frequencies. A simple example from life: the AMD Athlon XP and Athlon 64 processors of the K7 and K8 architectures, which competed with Intel Pentium 4 processors built on the NetBurst architecture. As you remember, many processors in those lines were very close to each other in performance yet differed radically in specifications. In particular, the Athlon 64 3200+ with a clock frequency of 2200 MHz more often than not outran the Pentium 4 at 3200 MHz. The reason lies precisely in the pipeline length: where AMD traditionally used a short 12-stage pipeline, the Pentium 4 had a long 20-stage one, and later even a 31-stage one! Hence the striking difference in per-clock performance.

The branch prediction unit

The appearance of this block became inevitable once the pipeline appeared. The problem of conditional operators has already been mentioned: they inevitably reset the pipeline, which hit performance hard, and the share of such cases in data processing was simply off the scale.

So what does this block do? It is simple: the unit works as the processor's resident fortune-teller, trying to determine in advance — before the data are actually computed — which way a conditional branch will go. Of course, no magic is involved. Today the main and preferred method is dynamic branch prediction, in which the unit not only analyzes the instructions queued for processing but also keeps the history of similar branches, accumulating its own statistics. By constantly comparing its guesses with the actual outcomes, it keeps improving the accuracy of its predictions for similar situations in the future. Thanks to this tactic, correct predictions far outnumber incorrect ones: current Intel and AMD processors guess the direction of a conditional branch correctly in 95-97% of cases, so pipeline flushes happen quite rarely.
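For the curious, here is a toy C model of one widespread dynamic scheme, the 2-bit saturating counter. This is my own sketch; real predictors keep whole tables of such counters plus branch history and are far more elaborate.

```c
#include <stdio.h>

/* 2-bit saturating counter: states 0-1 predict "not taken", 2-3 "taken".
   The counter moves one step toward each actual outcome, so an occasional
   deviation (the 0 in the made-up pattern below) does not flip the prediction. */
int main(void) {
    int counter = 2;                                  /* start at "weakly taken" */
    int outcome[] = {1, 1, 1, 0, 1, 1, 1, 0, 1, 1};   /* invented branch history */
    int n = sizeof outcome / sizeof outcome[0], correct = 0;

    for (int i = 0; i < n; i++) {
        int predict_taken = (counter >= 2);
        if (predict_taken == outcome[i]) correct++;
        if (outcome[i]  && counter < 3) counter++;    /* saturate upward */
        if (!outcome[i] && counter > 0) counter--;    /* saturate downward */
    }
    printf("guessed %d of %d branches\n", correct, n);   /* 8 of 10 here */
    return 0;
}
```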

Well, we have taken a quick tour of the processor's insides; now let's see how it all works in reality — how efficient this or that architecture is, and how effective its ALU and FPU units (together, of course, with their auxiliary blocks) are. To cover as wide a range of processor cores as possible while minimizing the influence on the results of such important parts of a modern CPU as the cache, the processor bus, and the bandwidth of the memory subsystem, we turned to the AIDA64 test package — more precisely, to two synthetic tests from it: CPU Queen and FPU SinJulia. Why these two? The answer lies in the very principle of their operation and their perfect fit for this test. To understand how particular architectural features tell on the results of each test, let's look at the official descriptions:

CPU Queen

A simple integer test. Its result depends, first of all, on the performance of the integer unit; it is also sensitive to the efficiency of the branch prediction unit, since the code is saturated with conditional branches.

At equal clock frequencies, the model with the shorter pipeline gains the advantage. Thus, with HyperThreading disabled, a Pentium 4 on the Northwood core produces better results than the equivalent model on the Prescott core, because the former uses a short 20-stage pipeline versus the 31-stage pipeline of the latter.

With HyperThreading enabled, the balance of power changes, allowing Prescott to pull ahead. In addition, the higher results of AMD's K8-family processors compared with the K7 family are explained by the presence of an improved branch prediction unit.

The CPU Queen test uses the MMX and SSE streaming extensions, up to SSSE3. It occupies less than 1 MB of RAM and supports HyperThreading, multi-processor systems (SMP), and multi-core processors.

We chose precisely this test in order to exclude the influence of the memory subsystem and processor caches on the results and to obtain the result of the ALU itself, assisted by the prediction unit. The other ALU tests in the package, although slightly, still react to frequency and cache size, as well as to processor-bus and memory-bus bandwidth. And in our case, where dozens of processors of different generations are compared, the difference in the performance of those subsystems can reach orders of magnitude. One example from the table: the Pentium III works with SDR-133 memory over a 64-bit memory bus, while the Core i7 has a 192-bit memory bus with DDR3-1333.

As for HT support, nothing can be done about it in this situation: many processors on the list do not support it — just like many real programs, by the way. This factor, however, is easy to keep in mind when directly comparing two processors with and without HT support.

FPU SinJulia

A floating-point test with extended precision (80-bit). The test computes a single frame of a modified Julia fractal. Its code is written in assembly language and optimized for both Intel and AMD processors, making heavy use of trigonometric and exponential x87 instructions.
The FPU SinJulia test occupies less than 1 MB of RAM and supports HyperThreading, multi-processor systems (SMP), and multi-core processors.

As you can see, FPU SinJulia, just like CPU Queen, is practically independent of the performance of the memory subsystem and of the frequency and size of the processor caches. Moreover, SinJulia's result remains objective even when comparing the old K6-III with the current Phenom II, because the test does not use streaming extensions such as MMX and SSE. And the high precision of the calculations allows estimates adequate to everyday tasks that load the FPU.

The choice of tests is settled, but I can already hear voices protesting against the adequacy of comparing old and new processors. One argument concerns comparing processors with different numbers of cores and different frequencies. So, precisely for the sake of objectivity, we derived a performance coefficient for each test, calculated by a simple formula:

test result / number of cores / frequency (MHz)

Dividing the raw data by these values for each processor, we obtained the result of one core per clock cycle. Two corrections must be kept in mind when reading the results. First: a processor with HyperThreading support will always show a somewhat higher result. Second: processors without SSE support show lower results in the ALU test, CPU Queen. Fortunately, the list of such processors is short — essentially just the AMD K6-III.
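As a quick sanity check of the arithmetic, here is the normalization in C. The numbers are the Athlon XP 1800+ figures quoted later in the text: one core, a real frequency of 1533 MHz, a CPU Queen score of 2422.

```c
#include <stdio.h>

/* The article's coefficient: raw score / number of cores / frequency in MHz.
   With the Athlon XP 1800+ figures this recovers the K7 per-clock value. */
int main(void) {
    double score = 2422.0;   /* raw CPU Queen result */
    int    cores = 1;
    double mhz   = 1533.0;   /* real clock frequency */

    printf("efficiency: %.2f units per core per clock\n",
           score / cores / mhz);   /* ~1.58 */
    return 0;
}
```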

It should also be remembered that each tested processor sat in its own motherboard, and each board, as is well known, has its own clock generator, which may slightly skew the processor's reference frequency. This means that results for the same processor will differ between motherboards. We had no practical way to eliminate this factor, so results with large deviations were discarded; the rest justified themselves, allowing the processors to be compared within their groups.

All this was needed for an adequate demonstration of the efficiency of particular architectures and, in some cases, of particular cores. Running a little ahead, I will say that the method proved itself, demonstrating high linearity and consistency of results.

Now a word about what exactly we are comparing. If you have already glanced at the table, you noticed that it contains 61 processors of different generations. Of course, not all of them passed through our test laboratory — only about a third did. A considerable share of the results was taken from the database of AIDA64 2.50, the current version of the package at the time. Naturally, we did not trust those results blindly: we double-checked the database by running our own tests on several identical processors. The outcome — allowing for reference-frequency drift and minor differences — was encouraging, showing very close agreement. Only then did we merge the database results with our own into a single table.

It should also be noted that results may differ between AIDA versions and are therefore not directly comparable. All our results were obtained with version 2.50.

Well, now it is time to move on to the analysis of the test results, which turned out to be revealing and even curious. Time to look at our main table, which lists the processor characteristics most important for this test and — most importantly — the results of both tests together with the derived per-clock core performance.

Note that the efficiency of the FPU and ALU units can differ greatly: do not be surprised by cases where one and the same processor shows amazing per-clock performance on integer data yet works far more slowly with floating-point data, or the other way round. Before moving to the analysis itself, I will note that my account follows the timeline, while the table of results is ordered by the absolute result of the ALU test.

The first and oldest processors on this list are the AMD K6-III on the Sharptooth core and the Pentium III on the Katmai core. By today's standards these processors have very short pipelines: only 12 stages for Intel and a minimal 6 for AMD. Thanks to the latter, the K6-III could practically do without a branch prediction unit: the consequences of a wrong guess hurt the result far less than in the Pentium. And indeed the AMD processor has no such unit, while the Intel one does; its efficiency is low by current standards, but its analysis mechanisms are the same as in current processors. As a result, the best ALU score belongs to the shorter-pipelined AMD K6-III: 2.03 units per clock versus 1.93 for the competitor. Never mind that AMD processors of this generation do not support the SSE streaming extensions! In the FPU test, though, the lead goes — in no small part thanks to the branch prediction unit — to the Pentium III, with 0.164 units per clock versus 0.128 for the representative of the K6 architecture.


The Pentium III led in per-clock efficiency; for all its successes, the Athlon could barely compete with it on this parameter.

In the following years, the Coppermine and Tualatin cores of the Pentium III preserved the Katmai architecture unchanged, so the results of two more processors — the Celeron 700 and the Pentium III 1333 — match those we have already examined. AMD, meanwhile, had by then been forced to abandon the K6 architecture: its very short pipeline made it impossible to climb above 550 MHz. The new K7 architecture received a 10-stage pipeline and many additional functions and changes that brought a substantial gain in performance. The main innovation, and the most important one for this material, was the appearance of a branch prediction unit. Even so, the Athlon that succeeded the K6 never managed to overtake the Pentium III in ALU efficiency. FPU efficiency, on the other hand, grew markedly: here the AMD K7 Athlon far outstripped the K6 and drew level with its competitor, posting 0.163 units per clock. The lengthened pipeline, though, noticeably lowered the efficiency of the ALU — to 1.58 units per clock, roughly a quarter below the K6. But this was justified: in most applications of the time the FPU mattered more, and the higher frequency achieved as a result covered the losses with interest.

The Athlon's move to the Thunderbird core brought no change in per-clock efficiency, since the core is built on the same architecture. Soon after it, the first Pentium 4 processors appeared on the market, built on the completely new NetBurst architecture. In marketing and sales terms these processors were perhaps an insane success, but in engineering and efficiency terms history has hardly known a worse architecture.


Pentium 4 on the Willamette core: one of the first processors built on the short-lived — or rather, long-suffering — NetBurst architecture.

The reason is this: chasing the big megahertz numbers that buyers wanted so much, Intel's engineers took an unusual step — to reach higher frequencies they greatly lengthened the pipeline, to 20 stages. Naturally, this immediately made them leaders in the megahertz race, but per-clock performance dropped quite noticeably. The average result of Pentium 4 processors on the Willamette and Northwood cores is a mere 1.02 in the ALU test and 0.108 in the FPU test. Compare this with the Pentium III results: the difference is colossal! To outperform the previous generation, the Pentium 4 needed a far higher frequency. Specifically, to match the ALU efficiency of the top processor of the Pentium III family, running at 1400 MHz, a Pentium 4 core would have to run at 2536 MHz! The same result in the FPU test would require 2111 MHz — somewhat less, but still a lot. So, averaging the results, the Pentium III 1400 is comparable to a Pentium 4 2.4.
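The equivalent-frequency estimate follows directly from the per-clock figures: f_equiv = f_ref x eff_ref / eff_new. Here is a sketch with the FPU numbers quoted in the text; the article's 2111 MHz was evidently computed from unrounded efficiencies, hence the small discrepancy.

```c
#include <stdio.h>

/* Equivalent frequency: how fast a Pentium 4 must run to match the FPU
   throughput of a 1400 MHz Pentium III, given per-clock efficiencies. */
int main(void) {
    double f_ref   = 1400.0;   /* MHz, top Pentium III */
    double eff_ref = 0.164;    /* Pentium III FPU units per clock */
    double eff_new = 0.108;    /* Willamette/Northwood FPU units per clock */

    printf("equivalent Pentium 4 frequency: %.0f MHz\n",
           f_ref * eff_ref / eff_new);   /* ~2126 MHz */
    return 0;
}
```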

AMD, meanwhile, did not chase Intel on frequency. Keeping the K7 architecture practically unchanged, it released the Athlon XP line, whose processors were marked not by frequency but by a rating with a "plus" sign, meant to demonstrate their efficiency relative to Pentium 4 processors. In the intent of AMD's marketers, the Athlon XP 1800+ was to compete with a Pentium 4 running at 1800 MHz.

We can check how adequate this approach was, knowing that the efficiency of the Athlon XP cores stands at 1.58 units per clock in ALU and 0.163 units per clock in FPU. At the 1800+ model's real frequency of 1533 MHz this works out to 2422 units in CPU Queen and 250 in FPU SinJulia. A Pentium 4 at 1.8 GHz, meanwhile, scores 1908 and 195 units respectively. The rating, it turns out, was even understated. Keep in mind, though, that performance in real applications may differ, since other processor characteristics come into play: caches, buses, and so on.

Bitter as the truth is, Intel's engineers drew no lessons from this, and when the frequency ceiling was reached once again, they once again resorted to lengthening the pipeline — and not by a couple of stages but far more drastically: where the Northwood core had 20 stages, Prescott got 31. That is no longer merely a long pipeline but an extremely long one. True, after these changes the limit on the new cores' maximum clock frequency was set mainly by heat dissipation.


The Prescott core: a further development of the NetBurst architecture in the race for big megahertz, and arguably the least efficient Intel core in history.

Yet the most important change, which not everyone appreciated at the time, was a substantial drop in per-clock efficiency. And while the arrival of HyperThreading technology saved the situation to some degree, the processors without it showed simply appalling efficiency. Find the Pentium D 820 and 925 and the Celeron D 326 in the table and you will see what I mean. Their per-clock result in the CPU Queen test is a very modest 0.75 units, and FPU SinJulia rated the efficiency of the updated NetBurst architecture at 0.081 units. The drop relative to the Willamette/Northwood cores comes to roughly 30 percent in ALU and up to 40 percent in FPU.

Comparing Prescott-256 and Smithfield with AMD's K8 processors makes for an altogether joyless picture for Intel. The new AMD architecture received a pipeline twice as long as the K7's, and with it a substantially reworked, efficient branch prediction unit. As a result, cores built on the new architecture demonstrate even slightly higher ALU and FPU efficiency: the average CPU Queen score rose to 1.74 units, while FPU SinJulia remained at the predecessor's level. As you can see, it is not for nothing that gamers prized the Athlon 64 and Sempron: their efficiency is higher — more than twice that of the popular Pentium 4 on the Prescott and Smithfield cores — and the latter were saved neither by frequency nor by their large second-level cache.


AMD's Athlon 64 proved the more successful solution: unlike the Pentium 4, these processors combined low power consumption with excellent efficiency.

Here, however, it is worth recalling that it was in these processors that HyperThreading technology made its appearance. It arrived, clearly, not from a good life: it was an attempt to mask the shortcomings of the long pipeline. And indeed, even though the technology was still far from perfect at the time, the engineers managed to offset the lengthening of the pipeline. For example, the Pentium 4 2800E, built on the Prescott core and supporting HT, demonstrates the same efficiency as cores with a 20-stage pipeline but without HT. The Willamette/Northwood cores, by contrast, gained no efficiency from HyperThreading support, judging by the result of the rare Pentium 4 3.46 GHz Extreme Edition, built on the Gallatin core (an analogue of Northwood, but with 2 MB of L3 cache).

A little later, in the final representatives of the NetBurst series, Intel's engineers managed to improve HyperThreading considerably and obtain a noticeable gain in the efficiency of the floating-point unit. Note the last of the line: the single-core Pentium 4 3.73 GHz Extreme Edition and the dual-core Pentium 955 Extreme Edition. Their FPU efficiency already reaches 0.138 units, with ALU performance remaining at the same level. Even so, they never managed to approach the performance of their main competitor, the AMD Athlon 64 X2 — despite the latter running at lower clock frequencies and lacking HT support.

Look at the table: the NetBurst processors are no match for the Athlon 64 X2 5200+, to say nothing of AMD's top processor of the time, the 6400+. Intel, however, had long since realized that the pursuit of "big gigahertz" led nowhere, and was preparing a new architecture — one destined to become no less successful in marketing terms than the Pentium 4, but this time genuinely effective.


The Athlon 64 X2: perhaps the only AMD processor that managed to outrun top Intel processors. Then again, beating the inefficient and hot Pentium D was not difficult.

We are talking, of course, about Core. In designing this architecture, Intel's engineers went back to a short pipeline of only 14 stages — short, at least, compared with the last representatives of NetBurst. Naturally, under such conditions there was no more talk of reaching many gigahertz, but the first representatives of the new family, their low frequencies notwithstanding, demonstrated excellent performance. Both processors of this generation in our table — the Pentium M 730 on the Dothan core and the Core Duo T2500 on the Yonah core — showed per-clock results that exceed the Pentium III and fall only slightly short of the competing AMD K8 family.

Having been proven on mobile solutions, the architecture arrived on the desktop market in slightly modified form as the Core 2 Duo and Pentium Dual-Core processors. At release they could not boast high frequencies, yet they demonstrated superb efficiency — and that, note, without any HyperThreading support! A substantially reworked branch prediction unit naturally played its part here as well. Look at the results: in the CPU Queen test the average efficiency of the Conroe core and its relatives climbed above two units per clock, reaching 2.13. The FPU SinJulia result is also very good: 0.175. That is above the first-generation Core architecture processors — if not by much — and well above the AMD K8, against which the Pentium 4 had fought so long and so unsuccessfully.


The Core 2 architecture, which replaced NetBurst, showed that Intel could make fast, cool-running processors with high efficiency.

The high efficiency of the cores was confirmed once again by the single-core Celerons on the Conroe-L core that appeared a little later: at a modest frequency they showed performance on par with predecessors running at twice the frequency. And that, mind you, on a single core. All in all, the architecture proved as efficient as one could wish, and it was now AMD's turn to catch up with its competitor's super-successful products.

And here AMD's problems began. What was needed was a serious rise in core efficiency, but instead of solving precisely this problem, the engineers, having created the K10 generation and the Phenom and Athlon processors based on it, set about increasing the number of cores and the caches. The overall performance of these solutions grew noticeably, but the effect of the changes on efficiency was negligible. ALU performance rose slightly, probably thanks to further refinement of the branch prediction unit, while FPU efficiency remained absolutely unchanged. With such characteristics, Core 2 could be challenged only by core count or by frequency — and with frequencies, as you may remember, the K10 generation had serious problems.


Phenom clearly fell short: its efficiency never reached Core 2, and on top of that there were serious problems with frequencies.

As a result, Phenom never became a real competitor to the Core 2 Duo and Core 2 Quad. The frequency problem was overcome in time, and the new Phenom II and Athlon II processors of the K10.5 architecture were ready to compete with Intel's solutions on that front. But the new generation's efficiency stayed at the same level, so at equal frequencies AMD's solutions still could not challenge the competition. Worse, with the move to the 45-nanometer process Intel once again gave the architecture a polish and achieved a noticeable rise in FPU efficiency — up to 0.185 units per clock.

Not resting on its laurels, Intel meanwhile was already hard at work in its workshops and laboratories on a development of the Core architecture, which reached the market in the Core i3, i5, and i7 processors under the code name Nehalem. The reworking of the blocks and the tuning of all parameters produced excellent results. Look at the showing of the Core i5-750: ALU efficiency remained almost on par with Core 2, while the performance of the all-important floating-point unit rose substantially — up to 0.225 units per clock!

Beyond the architectural improvements, Intel prepared another surprise: a perfected HyperThreading technology. This move achieved simply fantastic results. With proper optimization the technology delivered a huge effect, raising efficiency by half and more: 3.05 for the ALU and 0.36 for the FPU is simply a marvelous showing. And even without this technology, processors on the Nehalem architecture proved more efficient than both their predecessors and their competitors.


Nehalem became Intel's first architecture with maximum emphasis on core efficiency, and the result was remarkable. Sandy Bridge and Ivy Bridge showed that potential still remained.

The two latest generations from Intel — processors on the Sandy Bridge and Ivy Bridge cores — again demonstrated higher performance, and not only through increased frequency. Small changes in the cores steadily raised the performance of the integer unit, by about 0.25 units per clock per generation, both with HyperThreading and without it. FPU efficiency, on the other hand, has not changed — but, without any embellishment, that figure is already very good. Judging by the trend, we have every right to expect a further rise in efficiency when the next generation of Intel processors appears.

AMD could only dream of such efficiency. Still, it did not sit idle in trying to improve its processors' showings. The Llano processors, built on cores of the K10.5 architecture, demonstrated slightly higher ALU efficiency than the last Phenom and Athlon models — mainly thanks to a reworked branch prediction unit — while FPU efficiency remained at the level demonstrated by every AMD processor since the very first Athlon of the K7 family.


The last representative of the AMD line that began back in the K7 generation: the Llano APU. A pity its efficiency is no match for recent Intel processors.

Llano, however, can be regarded as an obsolescent solution, since AMD's immediate future was tied to processors of the new Bulldozer architecture, introduced in the AMD FX processors and their derivatives. These processors, which proved far from flawless, put us in a difficult position when it came to calculating core efficiency. The trouble lies in the more complex organization of their cores: the FX-8150 processor consists of four dual-core modules and is declared by the company to be eight-core. Taking the company at its word, we could have computed its efficiency per eight cores, but that would be technically incorrect — and the result would land on par with Intel's NetBurst processors. We therefore decided to count efficiency not per core but per module, which is entirely justified at least for the FPU: each module contains only one floating-point unit.


AMD FX on the Bulldozer architecture showed a marked rise in efficiency, but the intricate architecture has yet to reveal its potential — and perhaps never will.

With the ALU everything is more complicated: each module has two such blocks, but they cannot work in parallel with full efficiency because of the peculiarities of the task scheduler in Windows 7 and earlier Microsoft operating systems. So it was decided to divide ALU efficiency by the number of modules as well. The decision is debatable, and I will not insist on the objectivity of this result. The results, by the way, turned out far from disgraceful — at any rate, much better than the predecessors'. In ALU efficiency, Bulldozer-architecture processors scored 2.2 units per clock: noticeably above K10.5 and Llano and a little above Core 2, though Sandy Bridge, even without HyperThreading support, remains far ahead. FPU efficiency (a result that can be trusted completely) also clearly surpassed all previous AMD solutions, landing squarely between the early and late Core 2 architectures.
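To show what is at stake in that choice, here is a small C sketch. The raw score is hypothetical, back-computed so that the per-module figure lands at the 2.2 units per clock quoted above (FX-8150: 4 modules, 8 declared cores, a nominal 3600 MHz); dividing by cores instead of modules would halve the figure.

```c
#include <stdio.h>

/* Per-module versus per-core normalization for a Bulldozer-style CPU.
   The raw ALU score below is hypothetical, chosen only to reproduce the
   2.2 units/clock per module cited in the text. */
int main(void) {
    double mhz     = 3600.0;
    int    modules = 4, cores = 8;
    double score   = 2.2 * modules * mhz;   /* hypothetical raw result */

    printf("per module: %.2f units/clock\n", score / modules / mhz);  /* 2.20 */
    printf("per core:   %.2f units/clock\n", score / cores   / mhz);  /* 1.10 */
    return 0;
}
```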

From these results we may conclude that Bulldozer-architecture processors are no competitors to Intel's processors from Nehalem onward, but they can fight Core 2 quite effectively and even win at equal frequencies. Not the most cheerful outcome for the "greens".

For your convenience, we have gathered all the results into a table with average per-clock indicators for the different cores.

At this point we could put a period to our investigation. Or rather an ellipsis — for this material does not claim to be all-encompassing and, as I said at the beginning, does not touch the efficiency of many important processor blocks. Still, without productive ALUs and FPUs there are no efficient processors, and this material fully confirms that postulate. History has put everything in its place, and from the height of the years gone by it is easy and safe to hand out labels and point out mistakes. Yet those same mistakes are an inseparable companion of progress, which, despite all the dead ends, inexorably leads us toward a happy digital future.
