Introduction

Major computing advancements over the last 70 years can primarily be attributed to two things:

- Compilers and high-level languages that allow the same source code to be compiled for heterogeneous platforms

- Vendor-independent operating systems such as Unix and Linux that lower the cost of bringing out a new architecture.

SPEC benchmark performance improved by roughly 50% per year during the 1990s. These gains were enabled largely by organizational and architectural improvements, and owed much to the emergence of reduced instruction set computer (RISC) architectures.

Robert H. Dennard observed that the power requirements of a given silicon area stayed constant even as the number of transistors increased, because the transistors themselves were smaller. This scaling (Dennard scaling) ended around 2003, forcing manufacturers to move to multiple cores to keep improving performance.

In 2015, Moore’s law ended, making it more difficult to scale performance with technological advancements alone. Consequently, recent performance improvements are only about 3.5% per year.

Moore originally predicted that the number of transistors that could fit in a given area would double every year; in 1975, he amended the prediction to doubling every two years.

Classes of Computers:

| Class | System Price | Microprocessor Price | Critical System Design Issues |
| --- | --- | --- | --- |
| Personal Mobile Device | $100 - $1,000 | $10 - $100 | Cost, energy, media performance, responsiveness |
| Desktop | $300 - $2,500 | $50 - $500 | Price-performance, energy, graphics performance |
| Server | $5,000 - $10,000,000 | $200 - $2,000 | Throughput, availability, scalability, energy |
| Cluster/Warehouse-Scale Computer | $100,000 - $200,000,000 | $50 - $250 | Price-performance, throughput, energy proportionality |
| Internet of Things (IoT)/Embedded | $10 - $100,000 | $0.01 - $100 | Price, energy, application-specific performance |

Parallelism is the driving force of design across all computer classes: computing resources are used to perform multiple tasks simultaneously. Applications exhibit two basic kinds of parallelism: data-level parallelism (DLP), which arises because many data items can be operated on at the same time, and task-level parallelism (TLP), which arises because separate tasks of work can operate independently.
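
A minimal sketch (hypothetical example, not from these notes) of the two kinds in C: the loop inside `scale` exhibits data-level parallelism, since every iteration is independent, while running `scale` on two unrelated arrays from two threads is task-level parallelism.

```c
/* A minimal sketch (hypothetical, not from these notes) contrasting
 * data-level and task-level parallelism.
 * Compile with: gcc -O2 -pthread parallel_sketch.c */
#include <pthread.h>
#include <stdio.h>

#define N 1024
static float a[N], b[N];

/* Data-level parallelism: the same operation applied to many
 * independent data items; a vector unit can do several per cycle. */
static void scale(float *v, float s) {
    for (int i = 0; i < N; i++)
        v[i] *= s;               /* every iteration is independent */
}

/* Task-level parallelism: two unrelated tasks run concurrently. */
static void *scale_b(void *arg) {
    (void)arg;
    scale(b, 3.0f);              /* task 2: operate on array b */
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 1.0f; }

    pthread_t t;
    pthread_create(&t, NULL, scale_b, NULL);  /* task 2 on another thread */
    scale(a, 2.0f);                           /* task 1 on this thread    */
    pthread_join(t, NULL);

    printf("a[0] = %.1f, b[0] = %.1f\n", a[0], b[0]);  /* 2.0, 3.0 */
    return 0;
}
```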

Hardware can exploit these two kinds of application parallelism in four major ways:
    - Instruction-level parallelism exploits data-level parallelism using ideas like pipelining and out-of-order execution (see the loop-unrolling sketch after this list).
    - Vector architectures and devices like GPUs exploit data-level parallelism by applying a single instruction to a collection of data items in parallel (SIMD).
    - Thread-level parallelism exploits data-level or task-level parallelism by executing parallel threads in a tightly coupled hardware model that allows the threads to interact.
    - Request-level parallelism exploits data-level and task-level parallelism using largely decoupled tasks specified by the application or operating system.
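
As referenced above, here is a minimal sketch (hypothetical, not from these notes) of how instruction-level parallelism can be exposed to a pipelined, out-of-order core: unrolling a reduction into four independent partial sums removes the single serial dependence chain, so the adds can overlap in the pipeline.

```c
/* A minimal sketch (hypothetical) of exposing instruction-level
 * parallelism by loop unrolling. */
#include <stdio.h>

#define N 1024
static float a[N];

int main(void) {
    for (int i = 0; i < N; i++) a[i] = 1.0f;

    /* Serial chain: each add depends on the previous result. */
    float serial = 0.0f;
    for (int i = 0; i < N; i++)
        serial += a[i];

    /* Unrolled: four independent chains the hardware can overlap. */
    float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int i = 0; i < N; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    float unrolled = s0 + s1 + s2 + s3;

    printf("serial=%.0f unrolled=%.0f\n", serial, unrolled);  /* 1024 1024 */
    return 0;
}
```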

Flynn's Taxonomy
    - SISD (Single Instruction, Single Data) - Standard sequential computer with ILP (Instruction Level parallelism)

    - SIMD (Single Instruction, Multiple Data) - the same instruction is executed by multiple processors using different data streams. SIMD exploits data-level parallelism (DLP) by applying the same operation to multiple data items in parallel. It is found in vector architectures and GPUs (see the SSE sketch after this list).

    - MISD (Multiple Instruction, Single Data) - not implemented by any commercial processor, but it rounds out the hierarchy.

    - MIMD (Multiple Instruction, Multiple Data) - each processor fetches its own instructions and operates on its own data. MIMD can target task-level parallelism in both tightly coupled and loosely coupled architectures. An example of a tightly coupled MIMD architecture might be a multicore processor, while a loosely coupled architecture would be a cluster machine.
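
The SSE sketch referenced above (hypothetical example; assumes an x86 CPU with SSE and a compiler providing `immintrin.h`) contrasts SISD and SIMD execution of the same element-wise addition: the scalar loop handles one float per instruction, while the intrinsic loop handles four per instruction.

```c
/* A minimal x86 SSE sketch contrasting SISD and SIMD execution.
 * Compile with: gcc -O2 -msse simd_sketch.c */
#include <immintrin.h>
#include <stdio.h>

#define N 8
static float x[N]   = {1, 2, 3, 4, 5, 6, 7, 8};
static float y[N]   = {10, 20, 30, 40, 50, 60, 70, 80};
static float out[N];

int main(void) {
    /* SISD: one instruction operates on one data item at a time. */
    for (int i = 0; i < N; i++)
        out[i] = x[i] + y[i];

    /* SIMD: one instruction (addps) operates on four floats at once. */
    for (int i = 0; i < N; i += 4) {
        __m128 vx = _mm_loadu_ps(&x[i]);
        __m128 vy = _mm_loadu_ps(&y[i]);
        _mm_storeu_ps(&out[i], _mm_add_ps(vx, vy));
    }

    for (int i = 0; i < N; i++) printf("%.0f ", out[i]);  /* 11 22 ... 88 */
    printf("\n");
    return 0;
}
```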

Computer Architecture

- ISA (Instruction Set Architecture): the boundary between compiler software and hardware. Each instruction has its own opcode and a task (or set of tasks/microinstructions) that it performs.
  - Several ISAs are in use today: x86, Advanced/Acorn RISC Machine (ARM), and RISC-V.
  - Most ISAs in use today are RISC; although x86 began as a CISC architecture, its implementation in modern machines is effectively RISC.
  - ISAs typically differ in a few domains: the number of registers, addressing modes, instruction widths, etc.
- Organization: memory, bus structure, CPU, etc. Sometimes referred to as the microarchitecture.
- Hardware: our definition of architecture also includes the hardware, which is the computer's detailed logic design and packaging technology.

Many machines differ in hardware and organization but not in ISA; this allows the same binaries to run on different versions of processors that support the same ISA. Early on, the term architecture referred only to instruction set design, but the definition has been extended as people recognized that these other design aspects are also critical to meeting computational goals.

Bandwidth (how much work is done in a given amount of time): processor throughput has improved by roughly 32,000 to 40,000 times since the 1980s. Latency (the time between the start and completion of an event) is much harder to improve: gains of only about 50 to 90 times for processors and networks, and 8 to 9 times for memory and disks, over the same period.

  • Memory has made the least improvement of all architecture components in both latency and bandwidth

Power & Energy - Energy consumption is often the biggest design challenge facing modern computers. Power has to be brought into the system and distributed to the different parts of the chip; it is then dissipated from the chip as heat, and that heat has to be removed.

  • Static Power

    • You can think of static power as the constant power needed to keep the system on.

    • proportional to the number of devices

      • as transistor count increases, so does the static power requirement

    • It can be as much as 25-50% of total power consumption

    • The only way to reduce static power is to turn off the power supply to a device, known as 'power gating.' The tradeoff includes the overhead required when powering the device back on.

    • Power[static] = Current[static] * Voltage
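
A tiny sketch with assumed (hypothetical) leakage and supply-voltage numbers, showing how the formula above makes static power grow with device count and what power gating buys back:

```c
/* Hypothetical numbers (not from these notes): static power scales with
 * total leakage current, which grows with transistor count. */
#include <stdio.h>

int main(void) {
    double leakage_per_device = 20e-9;   /* 20 nA per device (assumed) */
    double supply_voltage     = 0.9;     /* 0.9 V supply (assumed)     */
    double devices            = 1e9;     /* one billion transistors    */

    double i_static = leakage_per_device * devices;   /* total leakage, A */
    double p_static = i_static * supply_voltage;      /* watts            */

    printf("Static power: %.1f W\n", p_static);       /* 18.0 W */

    /* Power gating: cutting the supply to an idle block removes its
     * leakage entirely, at the cost of wake-up overhead. */
    double gated_fraction = 0.5;                       /* half gated off */
    printf("With 50%% gated off: %.1f W\n", p_static * (1.0 - gated_fraction));
    return 0;
}
```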

** Refer to the figure that shows the energy comparison for various operations **

- An 8-bit add requires ~0.03 pJ
- A 32-bit DRAM read requires ~640 pJ (~21,000x the add)
- A 32-bit SRAM read requires ~5 pJ (~170x the add)
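
A quick arithmetic check of the ratios quoted above (values taken to be in picojoules):

```c
/* Verify the relative energy costs quoted in the list above. */
#include <stdio.h>

int main(void) {
    double add8_pj   = 0.03;   /* 8-bit integer add */
    double sram32_pj = 5.0;    /* 32-bit SRAM read  */
    double dram32_pj = 640.0;  /* 32-bit DRAM read  */

    printf("SRAM read vs 8b add:    %.0fx\n", sram32_pj / add8_pj);   /* ~167x    */
    printf("DRAM read vs 8b add:    %.0fx\n", dram32_pj / add8_pj);   /* ~21,333x */
    printf("DRAM read vs SRAM read: %.0fx\n", dram32_pj / sram32_pj); /* 128x     */
    return 0;
}
```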

Performance - Typically measured in Response Time (Latency) or Throughput
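
A minimal sketch (hypothetical workload) reporting both metrics for the same loop: elapsed CPU time as the response time, and operations per second as the throughput.

```c
/* Measure response time (latency) and throughput for one workload. */
#include <stdio.h>
#include <time.h>

#define OPS 100000000UL

int main(void) {
    volatile unsigned long acc = 0;
    clock_t start = clock();
    for (unsigned long i = 0; i < OPS; i++)
        acc += i;                              /* the "work" being measured */
    clock_t end = clock();

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    printf("Latency:    %.3f s for %lu operations\n", seconds, OPS);
    printf("Throughput: %.0f operations/second\n", OPS / seconds);
    return 0;
}
```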

There are two main types of locality in computer programs: temporal locality, meaning recently accessed items are likely to be accessed again soon, and spatial locality, meaning items whose addresses are near a recently accessed item are likely to be accessed soon.
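
A minimal sketch (hypothetical) showing both kinds of locality in one loop, which is what caches and registers are built to exploit:

```c
/* Temporal locality: the same data reused soon.
 * Spatial locality: neighboring data accessed next. */
#include <stdio.h>

#define N 1024
static int table[N];

int main(void) {
    long sum = 0;

    for (int i = 0; i < N; i++) {
        /* Spatial locality: table[i] sits next to table[i-1] in memory,
         * so it is often already in the cache line fetched previously. */
        sum += table[i];

        /* Temporal locality: 'sum' and 'i' are reused every iteration,
         * so they stay in registers or the L1 cache. */
    }

    printf("sum = %ld\n", sum);
    return 0;
}
```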
