Advanced Matrix Multiplication Optimization on Modern Multi-Core Processors
A detailed blog post on optimizing multi-threaded matrix multiplication for x86 processors to achieve OpenBLAS/MKL-like performance.
Tags: High-performance GEMM on CPU, Fast GEMM on CPU, High-perfo...
salykova.github.io