Reading notes and references on CGRA

Jackcui NJU Loser

To achieve high compute efficiency, the overhead of adaptability must be reduced while still supporting:

  1. Single instruction steady state, e.g., single-cycle loops.
  2. Data transport reduction, e.g., explicit bypassing, local register files, and small memories close to function units.
  3. Application-tailored exploitation of parallelism, e.g., VLIW with matching SIMD vector lanes.
  4. Programmability.
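To make points 1 and 2 concrete, here is a minimal Python sketch (my own illustration, not from the surveyed papers) of the kind of inner loop a CGRA targets: the loop body is a fixed dataflow pattern that can run as a single-instruction steady state, with the accumulator held in a register local to the adder instead of round-tripping through a central register file.

```python
def dot(a, b):
    """Dot product: the steady-state loop body below is the kind of kernel a
    CGRA maps spatially -- one multiply unit feeding one add unit every cycle."""
    acc = 0  # on a CGRA, 'acc' lives in a local register next to the add FU
    for i in range(len(a)):
        # mul FU -> add FU: the product is bypassed directly between
        # function units rather than written back to a register file
        acc += a[i] * b[i]
    return acc
```

On a CGRA the multiply, add, and loop bookkeeping would each occupy a function unit, so the whole body issues as one "instruction" per cycle in steady state.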

Definition of CGRA

  1. A spatial reconfiguration granularity at fixed functional unit level or above.
  2. A temporal reconfiguration granularity at region/loop-nest level or above.

A CGRA design can be characterized by:

  1. the structure of the CGRA;
  2. how it is controlled;
  3. how it is integrated with a host processor (if any);
  4. the available tool support.

For a CGRA, this mapping problem looks roughly like a subgraph-matching problem (possibly even harder than that), which should make it NP-hard. In other words, for now we can probably only rely on non-deterministic / heuristic algorithms?
This paper appears to propose exactly such a method.
The problem is that, lacking a machine-learning background, I can't quite follow the expressions in the RL and MCTS algorithm sections...
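As a toy illustration of why mapping resembles subgraph matching (this is my own sketch, not the paper's RL/MCTS method), the placement step can be phrased as: assign each dataflow-graph op to a distinct processing element so that every DFG edge lands on physically adjacent PEs. The DFG, the 2x2 grid, and the 4-neighbour interconnect are all assumptions for the example; the exponential backtracking search is exactly what makes the exact problem hard.

```python
from itertools import product

def neighbors(pe, rows, cols):
    """4-neighbour connectivity of a PE on a rows x cols grid."""
    r, c = pe
    cand = {(r + dr, c + dc) for dr, dc in [(0, 1), (0, -1), (1, 0), (-1, 0)]}
    return {(r2, c2) for r2, c2 in cand if 0 <= r2 < rows and 0 <= c2 < cols}

def map_dfg(dfg_edges, ops, rows, cols):
    """Backtracking placement: map each op to a distinct PE so that every
    DFG edge (u, v) falls on adjacent PEs.  Returns op -> PE, or None."""
    placement = {}

    def consistent(op, pe):
        # Every already-placed neighbour of 'op' in the DFG must sit on an
        # adjacent PE in the fabric.
        for u, v in dfg_edges:
            other = v if u == op else u if v == op else None
            if other in placement and placement[other] not in neighbors(pe, rows, cols):
                return False
        return True

    def search(i):
        if i == len(ops):
            return True
        for pe in product(range(rows), range(cols)):
            if pe not in placement.values() and consistent(ops[i], pe):
                placement[ops[i]] = pe
                if search(i + 1):
                    return True
                del placement[ops[i]]  # backtrack
        return False

    return placement if search(0) else None

# Map a tiny a*b + c*d dataflow graph onto a 2x2 CGRA.
edges = [("mul1", "add"), ("mul2", "add")]
print(map_dfg(edges, ["mul1", "mul2", "add"], 2, 2))
```

Real mappers must additionally handle routing through multiple hops, timing (modulo scheduling), and resource heterogeneity, which is where heuristic and learning-based searches such as the paper's RL/MCTS approach come in.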
References:

  • Tsinghua report on CGRA
  • Zhihu article 1
  • Zhihu article 2

  • Post title: Reading notes and references on CGRA
  • Post author: Jackcui
  • Create time: 2023-08-28 08:28:37
  • Post link: https://jackcuii.github.io/2023/08/28/cgra/
  • Copyright Notice: All articles in this blog are licensed under BY-NC-SA unless stated otherwise.