What is a processor?

A processor is an integrated electronic circuit that performs the calculations that run a computer. A processor performs arithmetical, logical, input/output (I/O) and other basic instructions that are passed from an operating system (OS). Most other processes are dependent on the operations of a processor.

HOW A PROCESSOR WORKS

The stages of a processor:

Fetch the instruction from memory.
Decode the instruction to find out which operation is to be performed, i.e. addition, subtraction, etc.
Perform that operation in the execute stage.
Access data memory if the instruction reads or writes data.
Write the result back to the destination register (flip-flops) so that any later instruction using that register sees the updated value.

INSTRUCTION FETCH (IF):

Fetching loads an instruction from memory into a CPU register; every instruction must be fetched before it can be executed. The time it takes to fetch an instruction is known as the fetch time or fetch cycle and is measured in clock ticks.
The program counter (PC) holds the address of the instruction to fetch. That address is fed to the instruction memory, and the instruction read out is placed in the instruction register. The PC is then incremented by 1 to point to the next instruction.
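The fetch step described above can be sketched in a few lines of Python. This is only an illustration, not a hardware model: the names INSTRUCTION_MEMORY and fetch are made up for this example, and the memory is assumed to hold one instruction per location so that PC + 1 is the next instruction.

```python
# Hypothetical instruction memory: one instruction per location (assumption),
# so the next instruction always lives at PC + 1.
INSTRUCTION_MEMORY = [0x002081B3, 0x00000013, 0x00000013]


def fetch(pc, instruction_memory):
    """Read the instruction at address pc and return it with the incremented PC."""
    instruction_register = instruction_memory[pc]
    next_pc = pc + 1
    return instruction_register, next_pc


instruction_register, pc = fetch(0, INSTRUCTION_MEMORY)
print(hex(instruction_register), pc)
```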

INSTRUCTION DECODE (ID):

Decoding lets the CPU determine which instruction is to be performed and how many operands it needs to fetch in order to perform it. The opcode fetched from memory is decoded for the next steps and moved to the appropriate registers. The instruction is then divided into different fields depending on the ISA, which give the registers to read (source registers) and the register to write (destination register).
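As a rough illustration of splitting an instruction into fields, here is a sketch that assumes the RISC-V R-type layout (opcode, rd, funct3, rs1, rs2, funct7). The function name decode_r_type is hypothetical, and other ISAs lay out their fields differently.

```python
def decode_r_type(instruction):
    """Split a 32-bit RISC-V R-type instruction into its fields (assumed layout)."""
    return {
        "opcode": instruction & 0x7F,          # bits 6:0
        "rd": (instruction >> 7) & 0x1F,       # bits 11:7, destination register
        "funct3": (instruction >> 12) & 0x7,   # bits 14:12
        "rs1": (instruction >> 15) & 0x1F,     # bits 19:15, first source register
        "rs2": (instruction >> 20) & 0x1F,     # bits 24:20, second source register
        "funct7": (instruction >> 25) & 0x7F,  # bits 31:25
    }


# 0x002081B3 encodes "add x3, x1, x2" in RISC-V.
fields = decode_r_type(0x002081B3)
print(fields)
```

Here the decoder reports destination register 3 and source registers 1 and 2, which is the R3 = R1 + R2 case used later in this post.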

EXECUTE (EXE):

The execute stage does all the computing: the ALU performs every computation. It also produces a flag that decides whether a branch is taken (for example, an if-else statement means we may need to jump to a particular address). In the next stage this flag is fed back to the PC so that it computes PC + imm (the immediate value) instead of PC + 1.
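A minimal sketch of the execute stage and the PC + imm versus PC + 1 choice might look like this. The names execute and next_pc, the two supported operations, and the 32-bit result width are assumptions made for the example.

```python
def execute(op, a, b):
    """Tiny ALU: return (result, zero_flag) for an add or a subtract."""
    if op == "add":
        result = a + b
    elif op == "sub":
        result = a - b
    else:
        raise ValueError(f"unsupported operation: {op}")
    result &= 0xFFFFFFFF  # keep the result within 32 bits (assumed width)
    return result, result == 0


def next_pc(pc, imm, is_branch, branch_taken):
    """Select PC + imm for a taken branch, otherwise PC + 1."""
    if is_branch and branch_taken:
        return pc + imm
    return pc + 1


# A branch-if-equal style check: the operands are equal when a - b is zero.
_, zero_flag = execute("sub", 7, 7)
print(next_pc(pc=4, imm=3, is_branch=True, branch_taken=zero_flag))
```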

MEMORY (MEM):

This stage stores the data computed in the execute stage into data memory at a particular address. E.g. SW R3, 0(10) -> store the data of register 3 at location 10 of data memory.

WRITEBACK (WB):

This stage updates the destination register in which we want to store the computed data. E.g. R3 = R1 + R2 -> the data in registers R1 and R2 is added and the result is stored in R3.
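The two examples above (R3 = R1 + R2, then SW R3, 0(10)) can be sketched as follows. The register file size, the memory size and the helper names write_back and store_word are assumptions for illustration only.

```python
registers = [0] * 32    # hypothetical register file
data_memory = [0] * 64  # hypothetical data memory, one word per location


def write_back(registers, rd, value):
    """WB: update the destination register with the computed value."""
    registers[rd] = value & 0xFFFFFFFF


def store_word(registers, data_memory, rs, offset, base):
    """MEM: store the data of register rs at location offset + base."""
    data_memory[offset + base] = registers[rs]


registers[1], registers[2] = 5, 7
write_back(registers, 3, registers[1] + registers[2])          # R3 = R1 + R2
store_word(registers, data_memory, rs=3, offset=0, base=10)    # SW R3, 0(10)
print(registers[3], data_memory[10])
```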

How I learnt processor design

To describe all these stages we have hardware description languages (HDLs). During the lockdown, I discovered my interest in processor design, in current technology, and in how we are able to work on a laptop at all. What interested me most is how processors have become 100x faster compared to the 1980s.

Then I learned processor design, the current technology, and the current fallacies and pitfalls.

Then I learned Verilog as my starting language for designing hardware. We can even model a processor in Python or C++, but the tedious part is converting numbers between decimal, binary and hexadecimal. These languages also have no built-in notion of a clock, which is the basic element of hardware. We are just hardcoding our work in these languages and cannot see the IPC (instructions per cycle) count. The alternatives are HDLs such as MyHDL, VHDL, Verilog, PyMTL, SystemVerilog, etc.

I learned Verilog for eight months, and after two to three months I had created a basic processor. Then I moved on to the next stage: a 5-stage pipelined processor that also takes data hazards into account. It took me 1 month to create that processor, and even then I was not able to create an instruction memory and a data memory. At that time I ran into one of the limitations of Verilog: we cannot pass a 2-D array through a module instantiation. While figuring out how to solve this issue, I came across a tool named Xilinx Vivado, which has an IP generator that can create a block memory, either read-only (ROM) or both readable and writable (RAM).
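For reference, the number conversions mentioned above look roughly like this in Python. The value 179 and the 8-bit width are arbitrary choices for the example.

```python
value = 179

binary_text = format(value, "08b")  # decimal -> 8-digit binary string
hex_text = format(value, "02x")     # decimal -> 2-digit hexadecimal string
from_binary = int(binary_text, 2)   # binary string -> decimal
from_hex = int(hex_text, 16)        # hexadecimal string -> decimal

# Negative numbers must be masked to a fixed width to get two's complement.
minus_one_8bit = format(-1 & 0xFF, "08b")

print(binary_text, hex_text, from_binary, from_hex, minus_one_8bit)
```

Each conversion is a single call, but in a software model they have to be written by hand wherever a bit-level value is needed, which is part of what an HDL takes care of.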

One difficulty in writing a pipelined processor is that we need to create 5 different modules for the different stages and then combine them in the top module. It was really tedious to instantiate those modules correctly and to wire up all the control signals correctly. It took me a total of 2 months to create a core properly. Now let's check out the trending TL-Verilog. I came to know about it through VSDOpen 2020, in which I participated and got to know TL-Verilog and its creator, Steve Hoover.

I learned more about TL-Verilog in RISC-V MYTH (Microprocessor for You in Thirty Hours), i.e. 5 days. We can create a core in just 5 days with the help of TL-Verilog and the Makerchip IDE.

ADVANTAGES:

Well-guided tutorials in the Makerchip IDE for building a design.
No need to create different modules for a pipelined processor.
Clear visualization of the core as a diagram.
Easy debugging by viewing the waveform.
No need to create memory, as the Makerchip IDE platform uses the M4 preprocessor, through which we can easily test a program on our processor.
No need to create a testbench to run your program. In Verilog we need a testbench to drive our inputs, but in Makerchip we still get understandable results because it drives the inputs with random values.

Want to know a cool fact? A 13-year-old kid created a core. Sounds cool, doesn't it?
Check here:
Post 1
Post 2

Want to know more about this tech? Click here

Check out this latest talk about TL-verilog and makerchip: Click here

In Verilog it took approximately 1000 lines to write the whole design, whereas in TL-V it took 110 lines. Nearly a 10x better solution... WOAAHHHHHHHHHHH....... Now let's talk a little bit about TL-V.

Enter Transaction-Level Verilog (TL-Verilog). Though it is an evolution of SystemVerilog, the sequential Verilog constructs (as well as many other constructs) become legacy features for backward compatibility. In place of a software abstraction, you get abstractions that match the mental models designers use to reason about their microarchitectures: pipelines, state, validity, hierarchy, and transactions. Leave loops, structs, and objects to software. And lest you think the introduction of abstract context adds bloat to the code, TL-Verilog models are generally half the size of their Verilog counterparts, with all the same detail. High-level context actually simplifies the logic expression. And, what does it buy you besides less typing? How about fewer bugs, better code organization, smoother hand-off, top-down design, easier microarchitectural changes, safe re-pipelining, and easier leverage and reuse. The mechanisms of TL-Verilog would be impractical in the face of sequential semantics.
The point is, using software, with sequential semantics, as a foundation for hardware modeling has led us down a limiting path, and TL-Verilog makes this point evident. TL-Verilog is specifically designed for modeling hardware -- more-so than any other language. Without the baggage of a software foundation, it is able to provide abstract context suited to hardware design with numerous benefits. Want to know more about TL-VERILOG: Check it out

It's really a game changer! Now you can focus more on the logic and the thinking rather than redoing something that is already done.
THINK HARD, WORK SMART!!

PROCESSOR in Verilog vs. TL-Verilog