initial commit

2026-04-12 22:20:18 +08:00
commit 190c2edbb2
155 changed files with 36314 additions and 0 deletions

Binary file not shown.

rtl/ip/open-la500/LICENSE Normal file

@@ -0,0 +1,127 @@
木兰宽松许可证, 第2版
木兰宽松许可证, 第2版
2020年1月 http://license.coscl.org.cn/MulanPSL2
您对“软件”的复制、使用、修改及分发受木兰宽松许可证第2版“本许可证”的如下条款的约束
0. 定义
“软件”是指由“贡献”构成的许可在“本许可证”下的程序和相关文档的集合。
“贡献”是指由任一“贡献者”许可在“本许可证”下的受版权法保护的作品。
“贡献者”是指将受版权法保护的作品许可在“本许可证”下的自然人或“法人实体”。
“法人实体”是指提交贡献的机构及其“关联实体”。
“关联实体”是指对“本许可证”下的行为方而言控制、受控制或与其共同受控制的机构此处的控制是指有受控方或共同受控方至少50%直接或间接的投票权、资金或其他有价证券。
1. 授予版权许可
每个“贡献者”根据“本许可证”授予您永久性的、全球性的、免费的、非独占的、不可撤销的版权许可,您可以复制、使用、修改、分发其“贡献”,不论修改与否。
2. 授予专利许可
每个“贡献者”根据“本许可证”授予您永久性的、全球性的、免费的、非独占的、不可撤销的(根据本条规定撤销除外)专利许可,供您制造、委托制造、使用、许诺销售、销售、进口其“贡献”或以其他方式转移其“贡献”。前述专利许可仅限于“贡献者”现在或将来拥有或控制的其“贡献”本身或其“贡献”与许可“贡献”时的“软件”结合而将必然会侵犯的专利权利要求,不包括对“贡献”的修改或包含“贡献”的其他结合。如果您或您的“关联实体”直接或间接地,就“软件”或其中的“贡献”对任何人发起专利侵权诉讼(包括反诉或交叉诉讼)或其他专利维权行动,指控其侵犯专利权,则“本许可证”授予您对“软件”的专利许可自您提起诉讼或发起维权行动之日终止。
3. 无商标许可
“本许可证”不提供对“贡献者”的商品名称、商标、服务标志或产品名称的商标许可但您为满足第4条规定的声明义务而必须使用除外。
4. 分发限制
您可以在任何媒介中将“软件”以源程序形式或可执行形式重新分发,不论修改与否,但您必须向接收者提供“本许可证”的副本,并保留“软件”中的版权、商标、专利及免责声明。
5. 免责声明与责任限制
“软件”及其中的“贡献”在提供时不带任何明示或默示的担保。在任何情况下,“贡献者”或版权所有者不对任何人因使用“软件”或其中的“贡献”而引发的任何直接或间接损失承担责任,不论因何种原因导致或者基于何种法律理论,即使其曾被建议有此种损失的可能性。
6. 语言
“本许可证”以中英文双语表述,中英文版本具有同等法律效力。如果中英文版本存在任何冲突不一致,以中文版为准。
条款结束
如何将木兰宽松许可证第2版应用到您的软件
如果您希望将木兰宽松许可证第2版应用到您的新软件为了方便接收者查阅建议您完成如下三步
1 请您补充如下声明中的空白,包括软件名、软件的首次发表年份以及您作为版权人的名字;
2 请您在软件包的一级目录下创建以“LICENSE”为名的文件将整个许可证文本放入该文件中
3 请将如下声明文本放入每个源文件的头部注释中。
Copyright (c) [Year] [name of copyright holder]
[Software Name] is licensed under Mulan PSL v2.
You can use this software according to the terms and conditions of the Mulan PSL v2.
You may obtain a copy of Mulan PSL v2 at:
http://license.coscl.org.cn/MulanPSL2
THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
See the Mulan PSL v2 for more details.
Mulan Permissive Software License, Version 2
Mulan Permissive Software License, Version 2 (Mulan PSL v2)
January 2020 http://license.coscl.org.cn/MulanPSL2
Your reproduction, use, modification and distribution of the Software shall be subject to Mulan PSL v2 (this License) with the following terms and conditions:
0. Definition
Software means the program and related documents which are licensed under this License and comprise all Contribution(s).
Contribution means the copyrightable work licensed by a particular Contributor under this License.
Contributor means the Individual or Legal Entity who licenses its copyrightable work under this License.
Legal Entity means the entity making a Contribution and all its Affiliates.
Affiliates means entities that control, are controlled by, or are under common control with the acting entity under this License, control means direct or indirect ownership of at least fifty percent (50%) of the voting power, capital or other securities of controlled or commonly controlled entity.
1. Grant of Copyright License
Subject to the terms and conditions of this License, each Contributor hereby grants to you a perpetual, worldwide, royalty-free, non-exclusive, irrevocable copyright license to reproduce, use, modify, or distribute its Contribution, with modification or not.
2. Grant of Patent License
Subject to the terms and conditions of this License, each Contributor hereby grants to you a perpetual, worldwide, royalty-free, non-exclusive, irrevocable (except for revocation under this Section) patent license to make, have made, use, offer for sale, sell, import or otherwise transfer its Contribution, where such patent license is only limited to the patent claims owned or controlled by such Contributor now or in future which will be necessarily infringed by its Contribution alone, or by combination of the Contribution with the Software to which the Contribution was contributed. The patent license shall not apply to any modification of the Contribution, and any other combination which includes the Contribution. If you or your Affiliates directly or indirectly institute patent litigation (including a cross claim or counterclaim in a litigation) or other patent enforcement activities against any individual or entity by alleging that the Software or any Contribution in it infringes patents, then any patent license granted to you under this License for the Software shall terminate as of the date such litigation or activity is filed or taken.
3. No Trademark License
No trademark license is granted to use the trade names, trademarks, service marks, or product names of Contributor, except as required to fulfill notice requirements in Section 4.
4. Distribution Restriction
You may distribute the Software in any medium with or without modification, whether in source or executable forms, provided that you provide recipients with a copy of this License and retain copyright, patent, trademark and disclaimer statements in the Software.
5. Disclaimer of Warranty and Limitation of Liability
THE SOFTWARE AND CONTRIBUTION IN IT ARE PROVIDED WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL ANY CONTRIBUTOR OR COPYRIGHT HOLDER BE LIABLE TO YOU FOR ANY DAMAGES, INCLUDING, BUT NOT LIMITED TO ANY DIRECT, OR INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING FROM YOUR USE OR INABILITY TO USE THE SOFTWARE OR THE CONTRIBUTION IN IT, NO MATTER HOW ITS CAUSED OR BASED ON WHICH LEGAL THEORY, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
6. Language
THIS LICENSE IS WRITTEN IN BOTH CHINESE AND ENGLISH, AND THE CHINESE VERSION AND ENGLISH VERSION SHALL HAVE THE SAME LEGAL EFFECT. IN THE CASE OF DIVERGENCE BETWEEN THE CHINESE AND ENGLISH VERSIONS, THE CHINESE VERSION SHALL PREVAIL.
END OF THE TERMS AND CONDITIONS
How to Apply the Mulan Permissive Software License, Version 2 (Mulan PSL v2) to Your Software
To apply the Mulan PSL v2 to your work, for easy identification by recipients, you are suggested to complete following three steps:
i Fill in the blanks in following statement, including insert your software name, the year of the first publication of your software, and your name identified as the copyright owner;
ii Create a file named “LICENSE” which contains the whole context of this License in the first directory of your software package;
iii Attach the statement to the appropriate annotated syntax at the beginning of each source file.
Copyright (c) [Year] [name of copyright holder]
[Software Name] is licensed under Mulan PSL v2.
You can use this software according to the terms and conditions of the Mulan PSL v2.
You may obtain a copy of Mulan PSL v2 at:
http://license.coscl.org.cn/MulanPSL2
THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
See the Mulan PSL v2 for more details.


@@ -0,0 +1,7 @@
# openLA500
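## Preface
openLA500 is a processor core that implements loongarch32r, the reduced 32-bit version of the LoongArch instruction set. It is a single-issue design with a five-stage pipeline: instruction fetch, decode, execute, memory access, and write-back. It includes two-way set-associative instruction and data caches, a 32-entry TLB, and a simple branch predictor. The core exposes an AXI interface, which makes it easy to integrate.
openLA500 has been verified in silicon through a tape-out. In a .13 (0.13 µm) process it runs at 100 MHz, with Dhrystone and CoreMark scores of 0.78 DMIPS/MHz (the instruction count is somewhat high) and 2.75 CoreMark/MHz, respectively. On the software side, U-Boot, Linux 5.14, ucore, RT-Thread, and other common tools and kernels have been ported to openLA500.
See the doc directory for the detailed design report.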
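
To make the cache organization mentioned above more concrete, here is a minimal sketch of how a 32-bit address could be split into tag, set index, and line offset. It assumes the split that `addr_trans.v` in this commit uses (offset = bits 3:0, index = bits 11:4, tag = bits 31:12) and ignores address translation entirely. The module names `cache_addr_split` and `cache_addr_split_tb` are hypothetical and are not part of the openLA500 sources.

```verilog
// Hypothetical helper, NOT part of the openLA500 sources.
// Splits an address into the fields a cache lookup would use,
// assuming 16-byte lines and 256 sets (the bit ranges used in addr_trans.v).
module cache_addr_split (
    input  [31:0] addr,
    output [19:0] tag,     // addr[31:12], compared against the stored tags of the set
    output [ 7:0] index,   // addr[11:4], selects one of 256 sets
    output [ 3:0] offset   // addr[3:0], byte position inside a 16-byte line
);
    assign tag    = addr[31:12];
    assign index  = addr[11:4];
    assign offset = addr[3:0];
endmodule

// Small self-checking testbench for the sketch above.
module cache_addr_split_tb;
    reg  [31:0] addr;
    wire [19:0] tag;
    wire [ 7:0] index;
    wire [ 3:0] offset;

    cache_addr_split dut (
        .addr   (addr  ),
        .tag    (tag   ),
        .index  (index ),
        .offset (offset)
    );

    initial begin
        addr = 32'h1C00_1234;
        #1; // let the combinational outputs settle
        if (tag !== 20'h1C001 || index !== 8'h23 || offset !== 4'h4)
            $display("FAIL: tag=%h index=%h offset=%h", tag, index, offset);
        else
            $display("PASS");
        $finish;
    end
endmodule
```

In the actual core the tag comes from the translated physical address while index and offset come from the virtual address, so this sketch only matches the untranslated case.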


@@ -0,0 +1,250 @@
`include "csr.h"
module addr_trans
#(
parameter TLBNUM = 32
)
(
input clk ,
input [ 9:0] asid ,
//trans mode
input inst_addr_trans_en ,
input data_addr_trans_en ,
//inst addr trans
input inst_fetch ,
input [31:0] inst_vaddr ,
input inst_dmw0_en ,
input inst_dmw1_en ,
output [ 7:0] inst_index ,
output [19:0] inst_tag ,
output [ 3:0] inst_offset ,
output inst_tlb_found ,
output inst_tlb_v ,
output inst_tlb_d ,
output [ 1:0] inst_tlb_mat ,
output [ 1:0] inst_tlb_plv ,
//data addr trans
input data_fetch ,
input [31:0] data_vaddr ,
input data_dmw0_en ,
input data_dmw1_en ,
input cacop_op_mode_di ,
output [ 7:0] data_index ,
output [19:0] data_tag ,
output [ 3:0] data_offset ,
output data_tlb_found ,
output [ 4:0] data_tlb_index ,
output data_tlb_v ,
output data_tlb_d ,
output [ 1:0] data_tlb_mat ,
output [ 1:0] data_tlb_plv ,
//tlbfill tlbwr tlb write
input tlbfill_en ,
input tlbwr_en ,
input [ 4:0] rand_index ,
input [31:0] tlbehi_in ,
input [31:0] tlbelo0_in ,
input [31:0] tlbelo1_in ,
input [31:0] tlbidx_in ,
input [ 5:0] ecode_in ,
//tlbrd tlb read
output [31:0] tlbehi_out ,
output [31:0] tlbelo0_out ,
output [31:0] tlbelo1_out ,
output [31:0] tlbidx_out ,
output [ 9:0] asid_out ,
//invtlb
input invtlb_en ,
input [ 9:0] invtlb_asid ,
input [18:0] invtlb_vpn ,
input [ 4:0] invtlb_op ,
//from csr
input [31:0] csr_dmw0 ,
input [31:0] csr_dmw1 ,
input csr_da ,
input csr_pg
);
wire [18:0] s0_vppn ;
wire s0_odd_page ;
wire [ 5:0] s0_ps ;
wire [19:0] s0_ppn ;
wire [18:0] s1_vppn ;
wire s1_odd_page ;
wire [ 5:0] s1_ps ;
wire [19:0] s1_ppn ;
wire we ;
wire [ 4:0] w_index ;
wire [18:0] w_vppn ;
wire w_g ;
wire [ 5:0] w_ps ;
wire w_e ;
wire w_v0 ;
wire w_d0 ;
wire [ 1:0] w_mat0 ;
wire [ 1:0] w_plv0 ;
wire [19:0] w_ppn0 ;
wire w_v1 ;
wire w_d1 ;
wire [ 1:0] w_mat1 ;
wire [ 1:0] w_plv1 ;
wire [19:0] w_ppn1 ;
wire [ 4:0] r_index ;
wire [18:0] r_vppn ;
wire [ 9:0] r_asid ;
wire r_g ;
wire [ 5:0] r_ps ;
wire r_e ;
wire r_v0 ;
wire r_d0 ;
wire [ 1:0] r_mat0 ;
wire [ 1:0] r_plv0 ;
wire [19:0] r_ppn0 ;
wire r_v1 ;
wire r_d1 ;
wire [ 1:0] r_mat1 ;
wire [ 1:0] r_plv1 ;
wire [19:0] r_ppn1 ;
reg [31:0] inst_vaddr_buffer ;
reg [31:0] data_vaddr_buffer ;
wire [31:0] inst_paddr;
wire [31:0] data_paddr;
wire pg_mode;
wire da_mode;
always @(posedge clk) begin
if (inst_fetch) begin
inst_vaddr_buffer <= inst_vaddr;
end
if (data_fetch) begin
data_vaddr_buffer <= data_vaddr;
end
end
//trans search port sig
assign s0_vppn = inst_vaddr[31:13];
assign s0_odd_page = inst_vaddr[12];
assign s1_vppn = data_vaddr[31:13];
assign s1_odd_page = data_vaddr[12];
//trans write port sig
assign we = tlbfill_en || tlbwr_en;
assign w_index = ({5{tlbfill_en}} & rand_index) | ({5{tlbwr_en}} & tlbidx_in[`INDEX]);
assign w_vppn = tlbehi_in[`VPPN];
assign w_g = tlbelo0_in[`TLB_G] && tlbelo1_in[`TLB_G];
assign w_ps = tlbidx_in[`PS];
assign w_e = (ecode_in == 6'h3f) ? 1'b1 : !tlbidx_in[`NE];
assign w_v0 = tlbelo0_in[`TLB_V];
assign w_d0 = tlbelo0_in[`TLB_D];
assign w_plv0 = tlbelo0_in[`TLB_PLV];
assign w_mat0 = tlbelo0_in[`TLB_MAT];
assign w_ppn0 = tlbelo0_in[`TLB_PPN_EN];
assign w_v1 = tlbelo1_in[`TLB_V];
assign w_d1 = tlbelo1_in[`TLB_D];
assign w_plv1 = tlbelo1_in[`TLB_PLV];
assign w_mat1 = tlbelo1_in[`TLB_MAT];
assign w_ppn1 = tlbelo1_in[`TLB_PPN_EN];
//trans read port sig
assign r_index = tlbidx_in[`INDEX];
assign tlbehi_out = {r_vppn, 13'b0};
assign tlbelo0_out = {4'b0, r_ppn0, 1'b0, r_g, r_mat0, r_plv0, r_d0, r_v0};
assign tlbelo1_out = {4'b0, r_ppn1, 1'b0, r_g, r_mat1, r_plv1, r_d1, r_v1};
assign tlbidx_out = {!r_e, 1'b0, r_ps, 24'b0}; //note do not write index
assign asid_out = r_asid;
tlb_entry tlb_entry(
.clk (clk ),
// search port 0
.s0_fetch (inst_fetch ),
.s0_vppn (s0_vppn ),
.s0_odd_page (s0_odd_page ),
.s0_asid (asid ),
.s0_found (inst_tlb_found ),
.s0_index (),
.s0_ps (s0_ps ),
.s0_ppn (s0_ppn ),
.s0_v (inst_tlb_v ),
.s0_d (inst_tlb_d ),
.s0_mat (inst_tlb_mat ),
.s0_plv (inst_tlb_plv ),
// search port 1
.s1_fetch (data_fetch ),
.s1_vppn (s1_vppn ),
.s1_odd_page (s1_odd_page ),
.s1_asid (asid ),
.s1_found (data_tlb_found ),
.s1_index (data_tlb_index ),
.s1_ps (s1_ps ),
.s1_ppn (s1_ppn ),
.s1_v (data_tlb_v ),
.s1_d (data_tlb_d ),
.s1_mat (data_tlb_mat ),
.s1_plv (data_tlb_plv ),
// write port
.we (we ),
.w_index (w_index ),
.w_vppn (w_vppn ),
.w_asid (asid ),
.w_g (w_g ),
.w_ps (w_ps ),
.w_e (w_e ),
.w_v0 (w_v0 ),
.w_d0 (w_d0 ),
.w_plv0 (w_plv0 ),
.w_mat0 (w_mat0 ),
.w_ppn0 (w_ppn0 ),
.w_v1 (w_v1 ),
.w_d1 (w_d1 ),
.w_plv1 (w_plv1 ),
.w_mat1 (w_mat1 ),
.w_ppn1 (w_ppn1 ),
//read port
.r_index (r_index ),
.r_vppn (r_vppn ),
.r_asid (r_asid ),
.r_g (r_g ),
.r_ps (r_ps ),
.r_e (r_e ),
.r_v0 (r_v0 ),
.r_d0 (r_d0 ),
.r_mat0 (r_mat0 ),
.r_plv0 (r_plv0 ),
.r_ppn0 (r_ppn0 ),
.r_v1 (r_v1 ),
.r_d1 (r_d1 ),
.r_mat1 (r_mat1 ),
.r_plv1 (r_plv1 ),
.r_ppn1 (r_ppn1 ),
//invalid port
.inv_en (invtlb_en ),
.inv_op (invtlb_op ),
.inv_asid (invtlb_asid ),
.inv_vpn (invtlb_vpn )
);
assign pg_mode = !csr_da && csr_pg;
assign da_mode = csr_da && !csr_pg;
assign inst_paddr = (pg_mode && inst_dmw0_en) ? {csr_dmw0[`PSEG], inst_vaddr_buffer[28:0]} :
(pg_mode && inst_dmw1_en) ? {csr_dmw1[`PSEG], inst_vaddr_buffer[28:0]} : inst_vaddr_buffer;
assign inst_offset = inst_vaddr[3:0];
assign inst_index = inst_vaddr[11:4];
assign inst_tag = inst_addr_trans_en ? ((s0_ps == 6'd12) ? s0_ppn : {s0_ppn[19:10], inst_paddr[21:12]}) : inst_paddr[31:12];
assign data_paddr = (pg_mode && data_dmw0_en && !cacop_op_mode_di) ? {csr_dmw0[`PSEG], data_vaddr_buffer[28:0]} :
(pg_mode && data_dmw1_en && !cacop_op_mode_di) ? {csr_dmw1[`PSEG], data_vaddr_buffer[28:0]} : data_vaddr_buffer;
assign data_offset = data_vaddr[3:0];
assign data_index = data_vaddr[11:4];
assign data_tag = data_addr_trans_en ? ((s1_ps == 6'd12) ? s1_ppn : {s1_ppn[19:10], data_paddr[21:12]}) : data_paddr[31:12];
endmodule

rtl/ip/open-la500/alu.v Normal file

@@ -0,0 +1,109 @@
module alu(
input [13:0] alu_op,
input [31:0] alu_src1,
input [31:0] alu_src2,
output [31:0] alu_result
);
wire op_add;
wire op_sub;
wire op_slt;
wire op_sltu;
wire op_and;
wire op_nor;
wire op_or;
wire op_xor;
wire op_sll;
wire op_srl;
wire op_sra;
wire op_lui;
wire op_andn;
wire op_orn;
// control code decomposition
assign op_add = alu_op[ 0];
assign op_sub = alu_op[ 1];
assign op_slt = alu_op[ 2];
assign op_sltu = alu_op[ 3];
assign op_and = alu_op[ 4];
assign op_nor = alu_op[ 5];
assign op_or = alu_op[ 6];
assign op_xor = alu_op[ 7];
assign op_sll = alu_op[ 8];
assign op_srl = alu_op[ 9];
assign op_sra = alu_op[10];
assign op_lui = alu_op[11];
assign op_andn = alu_op[12];
assign op_orn = alu_op[13];
wire [31:0] add_sub_result;
wire [31:0] slt_result;
wire [31:0] sltu_result;
wire [31:0] and_result;
wire [31:0] nor_result;
wire [31:0] or_result;
wire [31:0] xor_result;
wire [31:0] lui_result;
wire [31:0] sll_result;
wire [63:0] sr64_result;
wire [31:0] sr_result;
wire [31:0] andn_result;
wire [31:0] orn_result;
// 32-bit adder
wire [31:0] adder_a;
wire [31:0] adder_b;
wire adder_cin;
wire [31:0] adder_result;
wire adder_cout;
assign adder_a = alu_src1;
assign adder_b = (op_sub | op_slt | op_sltu) ? ~alu_src2 : alu_src2; //src1 - src2 (rj - rk) is computed as src1 + ~src2 + 1
assign adder_cin = (op_sub | op_slt | op_sltu) ? 1'b1 : 1'b0;
assign {adder_cout, adder_result} = adder_a + adder_b + adder_cin;
// ADD, SUB result
assign add_sub_result = adder_result;
// SLT result
assign slt_result[31:1] = 31'b0; //result is 1 when rj < rk (signed compare)
assign slt_result[0] = (alu_src1[31] & ~alu_src2[31])
| ((alu_src1[31] ~^ alu_src2[31]) & adder_result[31]);
// SLTU result
assign sltu_result[31:1] = 31'b0;
assign sltu_result[0] = ~adder_cout;
// bitwise operation
assign and_result = alu_src1 & alu_src2;
assign andn_result= alu_src1 & ~alu_src2;
assign or_result = alu_src1 | alu_src2; //bug6 cycle
assign orn_result = alu_src1 | ~alu_src2;
assign nor_result = ~or_result;
assign xor_result = alu_src1 ^ alu_src2;
assign lui_result = alu_src2;
// SLL result
assign sll_result = alu_src1 << alu_src2[4:0]; //rj << i5
// SRL, SRA result
assign sr64_result = {{32{op_sra & alu_src1[31]}}, alu_src1[31:0]} >> alu_src2[4:0]; //rj >> i5
assign sr_result = sr64_result[31:0];
// final result mux
assign alu_result = ({32{op_add|op_sub}} & add_sub_result)
| ({32{op_slt }} & slt_result)
| ({32{op_sltu }} & sltu_result)
| ({32{op_and }} & and_result)
| ({32{op_andn }} & andn_result)
| ({32{op_nor }} & nor_result)
| ({32{op_or }} & or_result)
| ({32{op_orn }} & orn_result)
| ({32{op_xor }} & xor_result)
| ({32{op_lui }} & lui_result)
| ({32{op_sll }} & sll_result)
| ({32{op_srl|op_sra}} & sr_result); //bug7 srl and sra sign
endmodule


@@ -0,0 +1,317 @@
module axi_bridge(
input clk,
input reset,
output reg[ 3:0] arid,
output reg[31:0] araddr,
output reg[ 7:0] arlen,
output reg[ 2:0] arsize,
output [ 1:0] arburst,
output [ 1:0] arlock,
output [ 3:0] arcache,
output [ 2:0] arprot,
output reg arvalid,
input arready,
input [ 3:0] rid,
input [31:0] rdata,
input [ 1:0] rresp,
input rlast,
input rvalid,
output reg rready,
output [ 3:0] awid,
output reg[31:0] awaddr,
output reg[ 7:0] awlen,
output reg[ 2:0] awsize,
output [ 1:0] awburst,
output [ 1:0] awlock,
output [ 3:0] awcache,
output [ 2:0] awprot,
output reg awvalid,
input awready,
output [ 3:0] wid,
output reg[31:0] wdata,
output reg[ 3:0] wstrb,
output reg wlast,
output reg wvalid,
input wready,
input [ 3:0] bid,
input [ 1:0] bresp,
input bvalid,
output reg bready,
//cache sign
input inst_rd_req ,
input [ 2:0] inst_rd_type ,
input [31:0] inst_rd_addr ,
output inst_rd_rdy ,
output inst_ret_valid ,
output inst_ret_last ,
output [31:0] inst_ret_data ,
input inst_wr_req ,
input [ 2:0] inst_wr_type ,
input [31:0] inst_wr_addr ,
input [ 3:0] inst_wr_wstrb ,
input [127:0] inst_wr_data ,
output inst_wr_rdy ,
input data_rd_req ,
input [ 2:0] data_rd_type ,
input [31:0] data_rd_addr ,
output data_rd_rdy ,
output data_ret_valid ,
output data_ret_last ,
output [31:0] data_ret_data ,
input data_wr_req ,
input [ 2:0] data_wr_type ,
input [31:0] data_wr_addr ,
input [ 3:0] data_wr_wstrb ,
input [127:0] data_wr_data ,
output data_wr_rdy ,
output write_buffer_empty
);
//fixed signal
assign arburst = 2'b1;
assign arlock = 2'b0;
assign arcache = 4'b0;
assign arprot = 3'b0;
assign awid = 4'b1;
assign awburst = 2'b1;
assign awlock = 2'b0;
assign awcache = 4'b0;
assign awprot = 3'b0;
assign wid = 4'b1;
assign inst_wr_rdy = 1'b1;
localparam read_requst_empty = 1'b0;
localparam read_requst_ready = 1'b1;
localparam read_respond_empty = 1'b0;
localparam read_respond_transfer = 1'b1;
localparam write_request_empty = 3'b000;
localparam write_addr_ready = 3'b001;
localparam write_data_ready = 3'b010;
localparam write_all_ready = 3'b011;
localparam write_data_transform = 3'b100;
localparam write_data_wait = 3'b101;
localparam write_wait_b = 3'b110;
reg read_requst_state;
reg read_respond_state;
reg [2:0] write_requst_state;
wire write_wait_enable;
wire rd_requst_state_is_empty;
wire rd_requst_can_receive;
assign rd_requst_state_is_empty = read_requst_state == read_requst_empty;
wire data_rd_cache_line;
wire inst_rd_cache_line;
wire [ 2:0] data_real_rd_size;
wire [ 7:0] data_real_rd_len ;
wire [ 2:0] inst_real_rd_size;
wire [ 7:0] inst_real_rd_len ;
wire data_wr_cache_line;
wire [ 2:0] data_real_wr_size;
wire [ 7:0] data_real_wr_len ;
reg [127:0] write_buffer_data;
reg [ 2:0] write_buffer_num;
wire write_buffer_last;
assign write_buffer_empty = (write_buffer_num == 3'b0) && !write_wait_enable;
assign rd_requst_can_receive = rd_requst_state_is_empty && !(write_wait_enable && !(bvalid && bready));
assign data_rd_rdy = rd_requst_can_receive;
assign inst_rd_rdy = !data_rd_req && rd_requst_can_receive;
//read type must be cache line
assign data_rd_cache_line = data_rd_type == 3'b100 ;
assign data_real_rd_size = data_rd_cache_line ? 3'b10 : data_rd_type;
assign data_real_rd_len = data_rd_cache_line ? 8'b11 : 8'b0 ;
assign inst_rd_cache_line = inst_rd_type == 3'b100 ;
assign inst_real_rd_size = inst_rd_cache_line ? 3'b10 : inst_rd_type;
assign inst_real_rd_len = inst_rd_cache_line ? 8'b11 : 8'b0 ;
//write size can be special
assign data_wr_cache_line = data_wr_type == 3'b100;
assign data_real_wr_size = data_wr_cache_line ? 3'b10 : data_wr_type;
assign data_real_wr_len = data_wr_cache_line ? 8'b11 : 8'b0 ;
assign inst_ret_valid = !rid[0] && rvalid;
assign inst_ret_last = !rid[0] && rlast;
assign inst_ret_data = rdata; //open question: does this signal need to be buffered?
assign data_ret_valid = rid[0] && rvalid;
assign data_ret_last = rid[0] && rlast;
assign data_ret_data = rdata;
assign data_wr_rdy = (write_requst_state == write_request_empty);
assign write_buffer_last = write_buffer_num == 3'b1;
always @(posedge clk) begin
if (reset) begin
read_requst_state <= read_requst_empty;
arvalid <= 1'b0;
end
else case (read_requst_state)
read_requst_empty: begin
if (data_rd_req) begin
if (write_wait_enable) begin
if (bvalid && bready) begin //while waiting for the write response, hold off read requests (simplest approach)
read_requst_state <= read_requst_ready;
arid <= 4'b1;
araddr <= data_rd_addr;
arsize <= data_real_rd_size;
arlen <= data_real_rd_len;
arvalid <= 1'b1;
end
end
else begin
read_requst_state <= read_requst_ready;
arid <= 4'b1;
araddr <= data_rd_addr;
arsize <= data_real_rd_size;
arlen <= data_real_rd_len;
arvalid <= 1'b1;
end
end
else if (inst_rd_req) begin
if (write_wait_enable) begin
if (bvalid && bready) begin
read_requst_state <= read_requst_ready;
arid <= 4'b0;
araddr <= inst_rd_addr;
arsize <= inst_real_rd_size;
arlen <= inst_real_rd_len;
arvalid <= 1'b1;
end
end
else begin
read_requst_state <= read_requst_ready;
arid <= 4'b0;
araddr <= inst_rd_addr;
arsize <= inst_real_rd_size;
arlen <= inst_real_rd_len;
arvalid <= 1'b1;
end
end
end
read_requst_ready: begin
if (arready) begin //same action for data (arid[0] set) and inst (arid[0] clear) requests
read_requst_state <= read_requst_empty;
arvalid <= 1'b0;
end
end
endcase
end
always @(posedge clk) begin
if (reset) begin
read_respond_state <= read_respond_empty;
rready <= 1'b1;
end
else case (read_respond_state)
read_respond_empty: begin
if (rvalid && rready) begin
read_respond_state <= read_respond_transfer;
end
end
read_respond_transfer: begin
if (rlast && rvalid) begin
read_respond_state <= read_respond_empty;
end
end
endcase
end
always @(posedge clk) begin
if (reset) begin
write_requst_state <= write_request_empty;
awvalid <= 1'b0;
wvalid <= 1'b0;
wlast <= 1'b0;
bready <= 1'b0;
write_buffer_num <= 3'b0;
write_buffer_data <= 128'b0;
end
else case (write_requst_state)
write_request_empty: begin
if (data_wr_req) begin
write_requst_state <= write_data_wait;
//end
awaddr <= data_wr_addr;
awsize <= data_real_wr_size;
awlen <= data_real_wr_len;
awvalid <= 1'b1;
wdata <= data_wr_data[31:0]; //from write 128 bit buffer
wstrb <= data_wr_wstrb;
write_buffer_data <= {32'b0, data_wr_data[127:32]};
if (data_wr_type == 3'b100) begin
write_buffer_num <= 3'b011;
end
else begin
write_buffer_num <= 3'b0;
wlast <= 1'b1;
end
end
end
write_data_wait: begin
if (awready) begin
write_requst_state <= write_data_transform;
awvalid <= 1'b0;
wvalid <= 1'b1;
end
end
write_data_transform: begin
if (wready) begin
if (wlast) begin
write_requst_state <= write_wait_b;
wvalid <= 1'b0;
wlast <= 1'b0;
bready <= 1'b1;
end
else begin
if (write_buffer_last) begin
wlast <= 1'b1;
end
write_requst_state <= write_data_transform;
wdata <= write_buffer_data[31:0];
wvalid <= 1'b1;
write_buffer_data <= {32'b0, write_buffer_data[127:32]};
write_buffer_num <= write_buffer_num - 3'b1;
end
end
end
write_wait_b: begin
if (bvalid && bready) begin
write_requst_state <= write_request_empty;
bready <= 1'b0;
end
end
default: begin
write_requst_state <= write_request_empty;
end
endcase
end
assign write_wait_enable = ~(write_requst_state == write_request_empty);
endmodule

rtl/ip/open-la500/btb.v Normal file

@@ -0,0 +1,238 @@
module btb
#(
parameter BTBNUM = 32,
parameter RASNUM = 16
)
(
input clk ,
input reset ,
//from/to if
input [31:0] fetch_pc ,
input fetch_en ,
output [31:0] ret_pc ,
output taken ,
output ret_en ,
output [ 4:0] ret_index ,
//from id
input operate_en ,
input [31:0] operate_pc ,
input [ 4:0] operate_index ,
input pop_ras ,
input push_ras ,
input add_entry ,
input delete_entry ,
input pre_error ,
input pre_right ,
input target_error ,
input right_orien ,
input [31:0] right_target
);
/*
* btb_pc records the pc of every branch inst except jirl
* ras_pc records only jirl pc
*/
reg [29:0] btb_pc [BTBNUM-1:0];
reg [29:0] btb_target [BTBNUM-1:0];
reg [ 1:0] btb_counter [BTBNUM-1:0];
reg [BTBNUM-1:0] btb_valid;
reg [29:0] ras_pc [RASNUM-1:0];
reg [RASNUM-1:0] ras_valid;
reg [31:0] fetch_pc_r;
reg fetch_en_r;
//reg [BTBNUM-1:0] jirl_flag;
reg [29:0] ras [7:0];
reg [ 3:0] ras_ptr;
wire [29:0] ras_top;
wire ras_full;
wire ras_empty;
wire [31:0] btb_match_rd;
wire [15:0] ras_match_rd;
wire btb_match;
wire ras_match;
wire [29:0] btb_match_target;
wire [ 1:0] btb_match_counter;
wire [ 4:0] btb_match_index;
wire [ 4:0] btb_random_index;
wire [ 3:0] ras_match_index;
wire [ 3:0] ras_random_index;
wire btb_all_entry_valid;
wire [4:0] btb_select_one_invalid_entry;
wire [4:0] btb_add_entry_index;
reg [4:0] btb_add_entry_index_r;
wire [31:0] btb_add_entry_dec;
wire ras_all_entry_valid;
wire [3:0] ras_select_one_invalid_entry;
wire [3:0] ras_add_entry_index;
wire [31:0] btb_untaken_entry;
reg [31:0] btb_untaken_entry_r;
wire [31:0] btb_untaken_entry_t;
reg btb_add_entry_r;
wire [4:0] btb_sel_one_untaken_entry;
wire btb_has_one_untaken_entry;
reg [5:0] fcsr;
always @(posedge clk) begin
if (reset)
fetch_en_r <= 1'b0;
else
fetch_en_r <= fetch_en;
if (fetch_en)
fetch_pc_r <= fetch_pc;
end
always @(posedge clk) begin
btb_untaken_entry_r <= btb_untaken_entry;
btb_add_entry_r <= operate_en && !pop_ras && add_entry;
btb_add_entry_index_r <= btb_add_entry_index;
end
//the untaken-entry list is calculated in the previous clock cycle
assign btb_untaken_entry_t = btb_untaken_entry_r & ({32{!btb_add_entry_r}} | ~btb_add_entry_dec);
assign btb_has_one_untaken_entry = |btb_untaken_entry_t;
decoder_5_32 dec_btb_add_entry (.in(btb_add_entry_index_r), .out(btb_add_entry_dec));
one_valid_32 sel_one_untaken_entry (.in(btb_untaken_entry_t), .out_en(btb_sel_one_untaken_entry));
//assign btb_random_index = (btb_match && (fcsr[4:0] == btb_match_index)) ? (fcsr[4:0]+1'b1) : fcsr[4:0];
//untaken entries can be evicted first; this makes no difference except that their history information is lost.
assign btb_add_entry_index = !btb_all_entry_valid ? btb_select_one_invalid_entry :
btb_has_one_untaken_entry ? btb_sel_one_untaken_entry :
fcsr[4:0] ;
assign btb_all_entry_valid = &btb_valid;
one_valid_32 sel_one_btb_entry (.in(~btb_valid), .out_en(btb_select_one_invalid_entry));
//assign ras_random_index = (ras_match && (fcsr[3:0] == ras_match_index)) ? (fcsr[3:0]+1'b1) : fcsr[3:0];
assign ras_add_entry_index = ras_all_entry_valid ? fcsr[3:0] : ras_select_one_invalid_entry;
assign ras_all_entry_valid = &ras_valid;
one_valid_16 sel_one_ras_entry (.in(~ras_valid), .out_en(ras_select_one_invalid_entry));
always @(posedge clk) begin
if (reset) begin
btb_valid <= 32'b0;
ras_valid <= 16'b0;
end
else if (operate_en && !pop_ras) begin
if (add_entry) begin
btb_valid[btb_add_entry_index] <= 1'b1;
btb_pc[btb_add_entry_index] <= operate_pc[31:2];
btb_target[btb_add_entry_index] <= right_target[31:2];
btb_counter[btb_add_entry_index] <= 2'b10;
end
else if (target_error) begin
btb_target[operate_index] <= right_target[31:2];
btb_counter[operate_index] <= 2'b10;
end
else if (pre_error || pre_right) begin
if (right_orien) begin
if (btb_counter[operate_index] != 2'b11) begin
btb_counter[operate_index] <= btb_counter[operate_index] + 2'b1;
end
end
else begin
if (btb_counter[operate_index] != 2'b00) begin
btb_counter[operate_index] <= btb_counter[operate_index] - 2'b1;
end
end
end
end
else if (operate_en && pop_ras) begin
if (add_entry) begin
ras_valid[ras_add_entry_index] <= 1'b1;
ras_pc[ras_add_entry_index] <= operate_pc[31:2];
end
end
end
genvar i;
generate
for (i = 0; i < BTBNUM; i = i + 1)
begin: btb_match_com
assign btb_match_rd[i] = fetch_en_r && ((fetch_pc_r[31:2] == btb_pc[i]) && btb_valid[i]);
end
endgenerate
generate
for (i = 0; i < RASNUM; i = i + 1)
begin: ras_match_com
assign ras_match_rd[i] = fetch_en_r && ((fetch_pc_r[31:2] == ras_pc[i]) && ras_valid[i]);
end
endgenerate
generate
for (i = 0; i < BTBNUM; i = i + 1)
begin: sel_untaken_entry
assign btb_untaken_entry[i] = btb_valid[i] && ~|btb_counter[i];
end
endgenerate
assign btb_match = |btb_match_rd;
assign ras_match = |ras_match_rd && !ras_empty;
assign ras_top = ras[ras_ptr - 4'b1]; //the ras may be modified before the inst fetch
encoder_32_5 encode_btb_match (.in(btb_match_rd), .out(btb_match_index));
encoder_16_4 encode_ras_match (.in(ras_match_rd), .out(ras_match_index));
assign btb_match_target = btb_target[btb_match_index];
assign btb_match_counter = btb_counter[btb_match_index];
assign ret_pc = {32{ras_match}} & {ras_top, 2'b0} |
{32{btb_match}} & {btb_match_target, 2'b0};
assign ret_en = btb_match || ras_match;
assign taken = btb_match && btb_match_counter[1] || ras_match;
assign ret_index = {5{btb_match}} & {btb_match_index} |
{5{ras_match}} & {1'b0,ras_match_index};
assign ras_full = ras_ptr[3];
assign ras_empty = (ras_ptr == 4'd0);
always @(posedge clk) begin
if (reset) begin
ras_ptr <= 4'b0;
end
else if (operate_en) begin
if (push_ras && !ras_full) begin
ras[ras_ptr] <= operate_pc[31:2] + 30'b1;
ras_ptr <= ras_ptr + 4'b1;
end
else if (pop_ras && !ras_empty) begin
ras_ptr <= ras_ptr - 4'b1;
end
end
end
always @(posedge clk) begin
if (reset) begin
fcsr <= 6'b100010;
end
else begin
fcsr[0] <= fcsr[5];
fcsr[1] <= fcsr[0];
fcsr[2] <= fcsr[1];
fcsr[3] <= fcsr[2] ^ fcsr[5];
fcsr[4] <= fcsr[3] ^ fcsr[5];
fcsr[5] <= fcsr[4];
end
end
endmodule

rtl/ip/open-la500/csr.h Normal file

@@ -0,0 +1,73 @@
//CRMD
`define PLV 1:0
`define IE 2
`define DA 3
`define PG 4
`define DATF 6:5
`define DATM 8:7
//PRMD
`define PPLV 1:0
`define PIE 2
//ECTL
`define LIE 12:0
`define LIE_1 9:0
`define LIE_2 12:11
//ESTAT
`define IS 12:0
`define ECODE 21:16
`define ESUBCODE 30:22
//TLBIDX
`define INDEX 4:0
`define PS 29:24
`define NE 31
//TLBEHI
`define VPPN 31:13
//TLBELO
`define TLB_V 0
`define TLB_D 1
`define TLB_PLV 3:2
`define TLB_MAT 5:4
`define TLB_G 6
`define TLB_PPN 31:8
`define TLB_PPN_EN 27:8 //todo
//ASID
`define TLB_ASID 9:0
//CPUID
`define COREID 8:0
//LLBCTL
`define ROLLB 0
`define WCLLB 1
`define KLO 2
//TCFG
`define EN 0
`define PERIODIC 1
`define INITVAL 31:2
//TICLR
`define CLR 0
//TLBRENTRY
`define TLBRENTRY_PA 31:6
//DMW
`define PLV0 0
`define PLV3 3
`define DMW_MAT 5:4
`define PSEG 27:25
`define VSEG 31:29
//PGDL PGDH PGD
`define BASE 31:12
`define ECODE_INT 6'h0
`define ECODE_PIL 6'h1
`define ECODE_PIS 6'h2
`define ECODE_PIF 6'h3
`define ECODE_PME 6'h4
`define ECODE_PPI 6'h7
`define ECODE_ADEF 6'h8
`define ECODE_ALE 6'h9
`define ECODE_SYS 6'hb
`define ECODE_BRK 6'hc
`define ECODE_INE 6'hd
`define ECODE_IPE 6'he
`define ECODE_FPD 6'hf
`define ECODE_TLBR 6'h3f
`define ESUBCODE_ADEF 9'h0

rtl/ip/open-la500/csr.v Normal file

@@ -0,0 +1,856 @@
`include "mycpu.h"
`include "csr.h"
module csr
#(
parameter TLBNUM = 32
)
(
input clk ,
input reset ,
//from to ds
input [13:0] rd_addr ,
output [31:0] rd_data ,
//timer 64
output [63:0] timer_64_out ,
output [31:0] tid_out ,
//from ws
input csr_wr_en ,
input [13:0] wr_addr ,
input [31:0] wr_data ,
//interrupt
input [ 7:0] interrupt ,
output has_int ,
//from ws
input excp_flush ,
input ertn_flush ,
input [31:0] era_in ,
input [ 8:0] esubcode_in ,
input [ 5:0] ecode_in ,
input va_error_in ,
input [31:0] bad_va_in ,
input tlbsrch_en ,
input tlbsrch_found ,
input [ 4:0] tlbsrch_index ,
input excp_tlbrefill,
input excp_tlb ,
input [18:0] excp_tlb_vppn,
//from ws llbit
input llbit_in ,
input llbit_set_in ,
input [27:0] lladdr_in ,
input lladdr_set_in,
//to es
output llbit_out ,
output [18:0] vppn_out ,
//to ms
output [27:0] lladdr_out ,
//to fs
output [31:0] eentry_out ,
output [31:0] era_out ,
output [31:0] tlbrentry_out,
output disable_cache_out,
//to addr trans
output [ 9:0] asid_out ,
output [ 4:0] rand_index ,
output [31:0] tlbehi_out ,
output [31:0] tlbelo0_out ,
output [31:0] tlbelo1_out ,
output [31:0] tlbidx_out ,
output pg_out ,
output da_out ,
output [31:0] dmw0_out ,
output [31:0] dmw1_out ,
output [ 1:0] datf_out ,
output [ 1:0] datm_out ,
output [ 5:0] ecode_out ,
//from addr trans
input tlbrd_en ,
input [31:0] tlbehi_in ,
input [31:0] tlbelo0_in ,
input [31:0] tlbelo1_in ,
input [31:0] tlbidx_in ,
input [ 9:0] asid_in ,
//general use
output [ 1:0] plv_out
// csr regs for diff
`ifdef DIFFTEST_EN
,
output [31:0] csr_crmd_diff,
output [31:0] csr_prmd_diff,
output [31:0] csr_ectl_diff,
output [31:0] csr_estat_diff,
output [31:0] csr_era_diff,
output [31:0] csr_badv_diff,
output [31:0] csr_eentry_diff,
output [31:0] csr_tlbidx_diff,
output [31:0] csr_tlbehi_diff,
output [31:0] csr_tlbelo0_diff,
output [31:0] csr_tlbelo1_diff,
output [31:0] csr_asid_diff,
output [31:0] csr_save0_diff,
output [31:0] csr_save1_diff,
output [31:0] csr_save2_diff,
output [31:0] csr_save3_diff,
output [31:0] csr_tid_diff,
output [31:0] csr_tcfg_diff,
output [31:0] csr_tval_diff,
output [31:0] csr_ticlr_diff,
output [31:0] csr_llbctl_diff,
output [31:0] csr_tlbrentry_diff,
output [31:0] csr_dmw0_diff,
output [31:0] csr_dmw1_diff,
output [31:0] csr_pgdl_diff,
output [31:0] csr_pgdh_diff
`endif
);
localparam CRMD = 14'h0;
localparam PRMD = 14'h1;
localparam ECTL = 14'h4;
localparam ESTAT = 14'h5;
localparam ERA = 14'h6;
localparam BADV = 14'h7;
localparam EENTRY = 14'hc;
localparam TLBIDX= 14'h10;
localparam TLBEHI= 14'h11;
localparam TLBELO0=14'h12;
localparam TLBELO1=14'h13;
localparam ASID = 14'h18;
localparam PGDL = 14'h19;
localparam PGDH = 14'h1a;
localparam PGD = 14'h1b;
localparam CPUID = 14'h20;
localparam SAVE0 = 14'h30;
localparam SAVE1 = 14'h31;
localparam SAVE2 = 14'h32;
localparam SAVE3 = 14'h33;
localparam TID = 14'h40;
localparam TCFG = 14'h41;
localparam TVAL = 14'h42;
localparam CNTC = 14'h43;
localparam TICLR = 14'h44;
localparam LLBCTL= 14'h60;
localparam TLBRENTRY = 14'h88;
localparam DMW0 = 14'h180;
localparam DMW1 = 14'h181;
localparam BRK = 14'h100;
localparam DISABLE_CACHE = 14'h101;
localparam CPUCFG_1 = 14'hb1;
localparam CPUCFG_2 = 14'hb2;
localparam CPUCFG_10 = 14'hc0;
localparam CPUCFG_11 = 14'hc1;
localparam CPUCFG_12 = 14'hc2;
localparam CPUCFG_13 = 14'hc3;
wire crmd_wen = csr_wr_en & (wr_addr == CRMD);
wire prmd_wen = csr_wr_en & (wr_addr == PRMD);
wire ectl_wen = csr_wr_en & (wr_addr == ECTL);
wire estat_wen = csr_wr_en & (wr_addr == ESTAT);
wire era_wen = csr_wr_en & (wr_addr == ERA);
wire badv_wen = csr_wr_en & (wr_addr == BADV);
wire eentry_wen = csr_wr_en & (wr_addr == EENTRY);
wire tlbidx_wen = csr_wr_en & (wr_addr == TLBIDX);
wire tlbehi_wen = csr_wr_en & (wr_addr == TLBEHI);
wire tlbelo0_wen= csr_wr_en & (wr_addr == TLBELO0);
wire tlbelo1_wen= csr_wr_en & (wr_addr == TLBELO1);
wire asid_wen = csr_wr_en & (wr_addr == ASID);
wire pgdl_wen = csr_wr_en & (wr_addr == PGDL);
wire pgdh_wen = csr_wr_en & (wr_addr == PGDH);
wire pgd_wen = csr_wr_en & (wr_addr == PGD);
wire cpuid_wen = csr_wr_en & (wr_addr == CPUID);
wire save0_wen = csr_wr_en & (wr_addr == SAVE0);
wire save1_wen = csr_wr_en & (wr_addr == SAVE1);
wire save2_wen = csr_wr_en & (wr_addr == SAVE2);
wire save3_wen = csr_wr_en & (wr_addr == SAVE3);
wire tid_wen = csr_wr_en & (wr_addr == TID);
wire tcfg_wen = csr_wr_en & (wr_addr == TCFG);
wire tval_wen = csr_wr_en & (wr_addr == TVAL);
wire cntc_wen = csr_wr_en & (wr_addr == CNTC);
wire ticlr_wen = csr_wr_en & (wr_addr == TICLR);
wire llbctl_wen = csr_wr_en & (wr_addr == LLBCTL);
wire tlbrentry_wen = csr_wr_en & (wr_addr == TLBRENTRY);
wire DMW0_wen = csr_wr_en & (wr_addr == DMW0);
wire DMW1_wen = csr_wr_en & (wr_addr == DMW1);
wire BRK_wen = csr_wr_en & (wr_addr == BRK);
wire disable_cache_wen = csr_wr_en & (wr_addr == DISABLE_CACHE);
reg [31:0] csr_crmd;
reg [31:0] csr_prmd;
reg [31:0] csr_ectl;
reg [31:0] csr_estat;
reg [31:0] csr_era;
reg [31:0] csr_badv;
reg [31:0] csr_eentry;
reg [31:0] csr_tlbidx;
reg [31:0] csr_tlbehi;
reg [31:0] csr_tlbelo0;
reg [31:0] csr_tlbelo1;
reg [31:0] csr_asid;
reg [31:0] csr_cpuid;
reg [31:0] csr_save0;
reg [31:0] csr_save1;
reg [31:0] csr_save2;
reg [31:0] csr_save3;
reg [31:0] csr_tid;
reg [31:0] csr_tcfg;
reg [31:0] csr_tval;
reg [31:0] csr_cntc;
reg [31:0] csr_ticlr;
reg [31:0] csr_llbctl;
reg [31:0] csr_tlbrentry;
reg [31:0] csr_dmw0;
reg [31:0] csr_dmw1;
reg [31:0] csr_pgdl;
reg [31:0] csr_pgdh;
reg [31:0] csr_brk;
reg [31:0] csr_disable_cache;
reg [31:0] csr_cpucfg1;
reg [31:0] csr_cpucfg2;
reg [31:0] csr_cpucfg10;
reg [31:0] csr_cpucfg11;
reg [31:0] csr_cpucfg12;
reg [31:0] csr_cpucfg13;
wire [31:0] csr_pgd;
reg timer_en;
reg [63:0] timer_64;
reg llbit;
reg [27:0] lladdr;
wire tlbrd_valid_wr_en;
wire tlbrd_invalid_wr_en;
wire eret_tlbrefill_excp;
wire no_forward;
assign csr_pgd = csr_badv[31] ? csr_pgdh : csr_pgdl;
assign eret_tlbrefill_excp = csr_estat[`ECODE] == 6'h3f; //6'h3f is the TLB refill exception code (ECODE_TLBR)
assign tlbrd_valid_wr_en = tlbrd_en && !tlbidx_in[`NE];
assign tlbrd_invalid_wr_en = tlbrd_en && tlbidx_in[`NE];
assign has_int = ((csr_ectl[`LIE] & csr_estat[`IS]) != 13'b0) & csr_crmd[`IE];
assign eentry_out = csr_eentry;
assign era_out = csr_era;
assign timer_64_out = timer_64 + {{32{csr_cntc[31]}}, csr_cntc};
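//Worked example (illustrative, derived from the assign above): csr_cntc is
//sign-extended to 64 bits before being added, so it acts as a signed offset.
//  csr_cntc = 32'h0000_0010 -> timer_64_out = timer_64 + 16
//  csr_cntc = 32'hffff_ffff -> timer_64_out = timer_64 - 1 (modulo 2^64)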
assign tid_out = csr_tid;
assign llbit_out = llbit;
assign lladdr_out = lladdr;
assign asid_out = csr_asid[`TLB_ASID];
assign vppn_out = (csr_wr_en && wr_addr == TLBEHI) ? wr_data[`VPPN] : csr_tlbehi[`VPPN];
assign tlbehi_out = csr_tlbehi;
assign tlbelo0_out = csr_tlbelo0;
assign tlbelo1_out = csr_tlbelo1;
assign tlbidx_out = csr_tlbidx;
assign rand_index = timer_64[4:0];
assign disable_cache_out = csr_disable_cache[0];
//forward pending CRMD changes (exception entry, ertn, CSR write) to the IF stage
assign no_forward = !excp_tlbrefill && !(eret_tlbrefill_excp && ertn_flush) && !crmd_wen;
assign pg_out = excp_tlbrefill & 1'b0 |
(eret_tlbrefill_excp && ertn_flush) & 1'b1 |
crmd_wen & wr_data[`PG] |
no_forward & csr_crmd[`PG];
assign da_out = excp_tlbrefill & 1'b1 |
(eret_tlbrefill_excp && ertn_flush) & 1'b0 |
crmd_wen & wr_data[`DA] |
no_forward & csr_crmd[`DA];
assign dmw0_out = DMW0_wen ? wr_data : csr_dmw0;
assign dmw1_out = DMW1_wen ? wr_data : csr_dmw1;
assign plv_out = {2{excp_flush}} & 2'b0 |
{2{ertn_flush}} & csr_prmd[`PPLV] |
{2{crmd_wen }} & wr_data[`PLV] |
{2{!excp_flush && !ertn_flush && !crmd_wen}} & csr_crmd[`PLV];
assign tlbrentry_out= csr_tlbrentry;
assign datf_out = csr_crmd[`DATF];
assign datm_out = csr_crmd[`DATM];
assign ecode_out = csr_estat[`ECODE];
assign rd_data = {32{rd_addr == CRMD }} & csr_crmd |
{32{rd_addr == PRMD }} & csr_prmd |
{32{rd_addr == ECTL }} & csr_ectl |
{32{rd_addr == ESTAT }} & csr_estat |
{32{rd_addr == ERA }} & csr_era |
{32{rd_addr == BADV }} & csr_badv |
{32{rd_addr == EENTRY}} & csr_eentry |
{32{rd_addr == TLBIDX}} & csr_tlbidx |
{32{rd_addr == TLBEHI}} & csr_tlbehi |
{32{rd_addr == TLBELO0}} & csr_tlbelo0 |
{32{rd_addr == TLBELO1}} & csr_tlbelo1 |
{32{rd_addr == ASID }} & csr_asid |
{32{rd_addr == PGDL }} & csr_pgdl |
{32{rd_addr == PGDH }} & csr_pgdh |
{32{rd_addr == PGD }} & csr_pgd |
{32{rd_addr == CPUID }} & csr_cpuid |
{32{rd_addr == SAVE0 }} & csr_save0 |
{32{rd_addr == SAVE1 }} & csr_save1 |
{32{rd_addr == SAVE2 }} & csr_save2 |
{32{rd_addr == SAVE3 }} & csr_save3 |
{32{rd_addr == TID }} & csr_tid |
{32{rd_addr == TCFG }} & csr_tcfg |
{32{rd_addr == CNTC }} & csr_cntc |
{32{rd_addr == TICLR }} & csr_ticlr |
{32{rd_addr == LLBCTL}} & {csr_llbctl[31:1], llbit} |
{32{rd_addr == TVAL }} & csr_tval |
{32{rd_addr == TLBRENTRY}} & csr_tlbrentry |
{32{rd_addr == DMW0}} & csr_dmw0 |
{32{rd_addr == DMW1}} & csr_dmw1 |
{32{rd_addr == CPUCFG_1 }} & csr_cpucfg1 |
{32{rd_addr == CPUCFG_2 }} & csr_cpucfg2 |
{32{rd_addr == CPUCFG_10 }} & csr_cpucfg10 |
{32{rd_addr == CPUCFG_11 }} & csr_cpucfg11 |
{32{rd_addr == CPUCFG_12 }} & csr_cpucfg12 |
{32{rd_addr == CPUCFG_13 }} & csr_cpucfg13 ;
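//Note on the read mux above (a reading of the code, not a statement from the
//authors): each term is masked by its address compare and at most one compare
//is true, so the OR returns the selected register.
//  rd_addr == 14'h6 (ERA)  -> rd_data = csr_era
//  rd_addr == 14'h2        -> rd_data = 32'b0 (no term matches)
//BRK (14'h100) and DISABLE_CACHE (14'h101) have write enables but no term in
//this mux, so reading them also returns 32'b0.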
//crmd
always @(posedge clk) begin
if (reset) begin
csr_crmd[ `PLV] <= 2'b0;
csr_crmd[ `IE] <= 1'b0;
csr_crmd[ `DA] <= 1'b1;
csr_crmd[ `PG] <= 1'b0;
csr_crmd[`DATF] <= 2'b0;
csr_crmd[`DATM] <= 2'b0;
csr_crmd[31: 9] <= 23'b0;
end
else if (excp_flush) begin
csr_crmd[ `PLV] <= 2'b0;
csr_crmd[ `IE] <= 1'b0;
if (excp_tlbrefill) begin
csr_crmd [`DA] <= 1'b1;
csr_crmd [`PG] <= 1'b0;
end
end
else if (ertn_flush) begin
csr_crmd[ `PLV] <= csr_prmd[`PPLV];
csr_crmd[ `IE] <= csr_prmd[`PIE ];
if (eret_tlbrefill_excp) begin
csr_crmd[`DA] <= 1'b0;
csr_crmd[`PG] <= 1'b1;
end
end
else if (crmd_wen) begin
csr_crmd[ `PLV] <= wr_data[ `PLV];
csr_crmd[ `IE] <= wr_data[ `IE];
csr_crmd[ `DA] <= wr_data[ `DA];
csr_crmd[ `PG] <= wr_data[ `PG];
csr_crmd[`DATF] <= wr_data[`DATF];
csr_crmd[`DATM] <= wr_data[`DATM];
end
end
//prmd
always @(posedge clk) begin
if (reset) begin
csr_prmd[31:3] <= 29'b0;
end
else if (excp_flush) begin
csr_prmd[`PPLV] <= csr_crmd[`PLV];
csr_prmd[ `PIE] <= csr_crmd[`IE ];
end
else if (prmd_wen) begin
csr_prmd[`PPLV] <= wr_data[`PPLV];
csr_prmd[ `PIE] <= wr_data[ `PIE];
end
end
//ectl
always @(posedge clk) begin
if (reset) begin
csr_ectl <= 32'b0;
end
else if (ectl_wen) begin
csr_ectl[ `LIE_1] <= wr_data[ `LIE_1];
csr_ectl[ `LIE_2] <= wr_data[ `LIE_2];
end
end
//estat
always @(posedge clk) begin
if (reset) begin
csr_estat[ 1: 0] <= 2'b0;
csr_estat[10] <= 1'b0;
csr_estat[12] <= 1'b0;
csr_estat[15:13] <= 3'b0;
csr_estat[31] <= 1'b0;
csr_estat[21:16] <= 6'b0;
timer_en <= 1'b0;
end
else begin
if (ticlr_wen && wr_data[`CLR]) begin
csr_estat[11] <= 1'b0;
end
else if (tcfg_wen) begin
timer_en <= wr_data[`EN];
end
else if (timer_en && (csr_tval == 32'b0)) begin
csr_estat[11] <= 1'b1;
timer_en <= csr_tcfg[`PERIODIC];
end
csr_estat[9:2] <= interrupt;
if (excp_flush) begin
csr_estat[ `ECODE] <= ecode_in;
csr_estat[`ESUBCODE] <= esubcode_in;
end
else if (estat_wen) begin
csr_estat[ 1:0] <= wr_data[ 1:0];
end
end
end
//era
always @(posedge clk) begin
if (excp_flush) begin
csr_era <= era_in;
end
else if (era_wen) begin
csr_era <= wr_data;
end
end
//badv
always @(posedge clk) begin
if (badv_wen) begin
csr_badv <= wr_data;
end
else if (va_error_in) begin
csr_badv <= bad_va_in;
end
end
//eentry
always @(posedge clk) begin
if (reset) begin
csr_eentry[5:0] <= 6'b0;
end
else if (eentry_wen) begin
csr_eentry[31:6] <= wr_data[31:6];
end
end
//tlbidx
always @(posedge clk) begin
if (reset) begin
csr_tlbidx[23: 5] <= 19'b0;
csr_tlbidx[30] <= 1'b0;
csr_tlbidx[`INDEX]<= 5'b0;
end
else if (tlbidx_wen) begin
csr_tlbidx[$clog2(TLBNUM)-1:0] <= wr_data[$clog2(TLBNUM)-1:0];
csr_tlbidx[`PS] <= wr_data[`PS];
csr_tlbidx[`NE] <= wr_data[`NE];
end
else if (tlbsrch_en) begin
if (tlbsrch_found) begin
csr_tlbidx[`INDEX] <= tlbsrch_index;
csr_tlbidx[`NE] <= 1'b0;
end
else begin
csr_tlbidx[`NE] <= 1'b1;
end
end
else if (tlbrd_valid_wr_en) begin
csr_tlbidx[`PS] <= tlbidx_in[`PS];
csr_tlbidx[`NE] <= tlbidx_in[`NE];
end
else if (tlbrd_invalid_wr_en) begin
csr_tlbidx[`PS] <= 6'b0;
csr_tlbidx[`NE] <= tlbidx_in[`NE];
end
end
//tlbehi
always @(posedge clk) begin
if (reset) begin
csr_tlbehi[12:0] <= 13'b0;
end
else if (tlbehi_wen) begin
csr_tlbehi[`VPPN] <= wr_data[`VPPN];
end
else if (tlbrd_valid_wr_en) begin
csr_tlbehi[`VPPN] <= tlbehi_in[`VPPN];
end
else if (tlbrd_invalid_wr_en) begin
csr_tlbehi[`VPPN] <= 19'b0;
end
else if (excp_tlb) begin
csr_tlbehi[`VPPN] <= excp_tlb_vppn;
end
end
//tlbelo0
always @(posedge clk) begin
if (reset) begin
csr_tlbelo0[7] <= 1'b0;
end
else if (tlbelo0_wen) begin
csr_tlbelo0[`TLB_V] <= wr_data[`TLB_V];
csr_tlbelo0[`TLB_D] <= wr_data[`TLB_D];
csr_tlbelo0[`TLB_PLV] <= wr_data[`TLB_PLV];
csr_tlbelo0[`TLB_MAT] <= wr_data[`TLB_MAT];
csr_tlbelo0[`TLB_G] <= wr_data[`TLB_G];
csr_tlbelo0[`TLB_PPN_EN] <= wr_data[`TLB_PPN_EN];
end
else if (tlbrd_valid_wr_en) begin
csr_tlbelo0[`TLB_V] <= tlbelo0_in[`TLB_V];
csr_tlbelo0[`TLB_D] <= tlbelo0_in[`TLB_D];
csr_tlbelo0[`TLB_PLV] <= tlbelo0_in[`TLB_PLV];
csr_tlbelo0[`TLB_MAT] <= tlbelo0_in[`TLB_MAT];
csr_tlbelo0[`TLB_G] <= tlbelo0_in[`TLB_G];
csr_tlbelo0[`TLB_PPN_EN] <= tlbelo0_in[`TLB_PPN_EN];
end
else if (tlbrd_invalid_wr_en) begin
csr_tlbelo0[`TLB_V] <= 1'b0;
csr_tlbelo0[`TLB_D] <= 1'b0;
csr_tlbelo0[`TLB_PLV] <= 2'b0;
csr_tlbelo0[`TLB_MAT] <= 2'b0;
csr_tlbelo0[`TLB_G] <= 1'b0;
csr_tlbelo0[`TLB_PPN_EN] <= 20'b0;
end
end
//tlbelo1
always @(posedge clk) begin
if (reset) begin
csr_tlbelo1[7] <= 1'b0;
end
else if (tlbelo1_wen) begin
csr_tlbelo1[`TLB_V] <= wr_data[`TLB_V];
csr_tlbelo1[`TLB_D] <= wr_data[`TLB_D];
csr_tlbelo1[`TLB_PLV] <= wr_data[`TLB_PLV];
csr_tlbelo1[`TLB_MAT] <= wr_data[`TLB_MAT];
csr_tlbelo1[`TLB_G] <= wr_data[`TLB_G];
csr_tlbelo1[`TLB_PPN_EN] <= wr_data[`TLB_PPN_EN];
end
else if (tlbrd_valid_wr_en) begin
csr_tlbelo1[`TLB_V] <= tlbelo1_in[`TLB_V];
csr_tlbelo1[`TLB_D] <= tlbelo1_in[`TLB_D];
csr_tlbelo1[`TLB_PLV] <= tlbelo1_in[`TLB_PLV];
csr_tlbelo1[`TLB_MAT] <= tlbelo1_in[`TLB_MAT];
csr_tlbelo1[`TLB_G] <= tlbelo1_in[`TLB_G];
csr_tlbelo1[`TLB_PPN_EN] <= tlbelo1_in[`TLB_PPN_EN];
end
else if (tlbrd_invalid_wr_en) begin
csr_tlbelo1[`TLB_V] <= 1'b0;
csr_tlbelo1[`TLB_D] <= 1'b0;
csr_tlbelo1[`TLB_PLV] <= 2'b0;
csr_tlbelo1[`TLB_MAT] <= 2'b0;
csr_tlbelo1[`TLB_G] <= 1'b0;
csr_tlbelo1[`TLB_PPN_EN] <= 20'b0;
end
end
//asid
always @(posedge clk) begin
if (reset) begin
csr_asid[31:10] <= 22'h280; //ASIDBITS = 10
end
else if (asid_wen) begin
csr_asid[`TLB_ASID] <= wr_data[`TLB_ASID];
end
else if (tlbrd_valid_wr_en) begin
csr_asid[`TLB_ASID] <= asid_in;
end
else if (tlbrd_invalid_wr_en) begin
csr_asid[`TLB_ASID] <= 10'b0;
end
end
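//Worked check of the reset constant in the asid block above (assuming ASIDBITS
//occupies csr_asid[23:16], as the comment in the reset branch suggests):
//  csr_asid[31:10] = 22'h280 = 22'b10_1000_0000
//  bits 7 and 9 of that slice are csr_asid[17] and csr_asid[19]
//  csr_asid[23:16] = 8'b0000_1010 = 10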
//TLBRENTRY
always @(posedge clk) begin
if (reset) begin
csr_tlbrentry[5:0] <= 6'b0;
end
else if (tlbrentry_wen) begin
csr_tlbrentry[`TLBRENTRY_PA] <= wr_data[`TLBRENTRY_PA];
end
end
//dmw0
always @(posedge clk) begin
if (reset) begin
csr_dmw0[ 2:1] <= 2'b0;
csr_dmw0[24:6] <= 19'b0;
csr_dmw0[28] <= 1'b0;
end
else if (DMW0_wen) begin
csr_dmw0[`PLV0] <= wr_data[`PLV0];
csr_dmw0[`PLV3] <= wr_data[`PLV3];
csr_dmw0[`DMW_MAT] <= wr_data[`DMW_MAT];
csr_dmw0[`PSEG] <= wr_data[`PSEG];
csr_dmw0[`VSEG] <= wr_data[`VSEG];
end
end
//dmw1
always @(posedge clk) begin
if (reset) begin
csr_dmw1[ 2:1] <= 2'b0;
csr_dmw1[24:6] <= 19'b0;
csr_dmw1[28] <= 1'b0;
end
else if (DMW1_wen) begin
csr_dmw1[`PLV0] <= wr_data[`PLV0];
csr_dmw1[`PLV3] <= wr_data[`PLV3];
csr_dmw1[`DMW_MAT] <= wr_data[`DMW_MAT];
csr_dmw1[`PSEG] <= wr_data[`PSEG];
csr_dmw1[`VSEG] <= wr_data[`VSEG];
end
end
//cpuid
always @(posedge clk) begin
if (reset) begin
csr_cpuid <= 32'b0;
end
end
//save0
always @(posedge clk) begin
if (save0_wen) begin
csr_save0 <= wr_data;
end
end
//save1
always @(posedge clk) begin
if (save1_wen) begin
csr_save1 <= wr_data;
end
end
//save2
always @(posedge clk) begin
if (save2_wen) begin
csr_save2 <= wr_data;
end
end
//save3
always @(posedge clk) begin
if (save3_wen) begin
csr_save3 <= wr_data;
end
end
//tid
always @(posedge clk) begin
if (reset) begin
csr_tid <= 32'b0;
end
else if (tid_wen) begin
csr_tid <= wr_data;
end
end
//tcfg
always @(posedge clk) begin
if (reset) begin
csr_tcfg[`EN] <= 1'b0;
end
else if (tcfg_wen) begin
csr_tcfg[ `EN] <= wr_data[ `EN];
csr_tcfg[`PERIODIC] <= wr_data[`PERIODIC];
csr_tcfg[ `INITVAL] <= wr_data[ `INITVAL];
end
end
//cntc
always @(posedge clk) begin
if (reset) begin
csr_cntc <= 32'b0;
end
else if (cntc_wen) begin
csr_cntc <= wr_data;
end
end
//tval
always @(posedge clk) begin
if (tcfg_wen) begin
csr_tval <= {wr_data[ `INITVAL], 2'b0};
end
else if (timer_en) begin
if (csr_tval != 32'b0) begin
csr_tval <= csr_tval - 32'b1;
end
else if (csr_tval == 32'b0) begin
csr_tval <= csr_tcfg[`PERIODIC] ? {csr_tcfg[`INITVAL], 2'b0} : 32'hffffffff;
end
end
end
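//Worked example for the timer above (illustrative; assumes the INITVAL macro
//selects the upper 30 bits so that {INITVAL, 2'b0} fills all 32 bits of csr_tval):
//  TCFG write with InitVal = 30'h10 -> csr_tval = 32'h40
//  while timer_en is set, csr_tval counts 32'h40, 32'h3f, ..., 32'h0
//  at 32'h0: periodic     -> csr_tval reloads to {InitVal, 2'b0}
//            not periodic -> csr_tval becomes 32'hffffffff and the estat block
//                            clears timer_en, so counting stops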
//ticlr
always @(posedge clk) begin
if (reset) begin
csr_ticlr <= 32'b0;
end
end
//llbctl
always @(posedge clk) begin
if (reset) begin
csr_llbctl[`KLO] <= 1'b0;
csr_llbctl[31:3] <= 29'b0;
csr_llbctl[`WCLLB] <= 1'b0;
llbit <= 1'b0;
end
else if (ertn_flush) begin
if (csr_llbctl[`KLO]) begin
csr_llbctl[`KLO] <= 1'b0;
end
else begin
llbit <= 1'b0;
end
end
else if (llbctl_wen) begin
csr_llbctl[ `KLO] <= wr_data[ `KLO];
if (wr_data[`WCLLB] == 1'b1) begin
llbit <= 1'b0;
end
end
else if (llbit_set_in) begin
llbit <= llbit_in;
end
end
always @(posedge clk) begin
if (reset) begin
lladdr <= 28'b0;
end
else if (lladdr_set_in) begin
lladdr <= lladdr_in;
end
end
//timer_64
always @(posedge clk) begin
if (reset) begin
timer_64 <= 64'b0;
end
else begin
timer_64 <= timer_64 + 1'b1;
end
end
//pgdl
always @(posedge clk) begin
if (pgdl_wen) begin
csr_pgdl[`BASE] <= wr_data[`BASE];
end
end
//pgdh
always @(posedge clk) begin
if (pgdh_wen) begin
csr_pgdh[`BASE] <= wr_data[`BASE];
end
end
//written by software to trigger a break in ChipScope
always @(posedge clk) begin
if (reset) begin
csr_brk <= 32'b0;
end
else if (BRK_wen) begin
csr_brk <= wr_data;
end
end
//used to disable or enable the cache
always @(posedge clk) begin
if (reset) begin
csr_disable_cache <= 32'b0;
end
else if (disable_cache_wen) begin
csr_disable_cache <= wr_data;
end
end
//cpucfg1
always @(posedge clk) begin
if (reset) begin
csr_cpucfg1 <= 32'h1f1f4;
end
end
//cpucfg2
always @(posedge clk) begin
if (reset) begin
csr_cpucfg2 <= 32'h0;
end
end
//cpucfg10
always @(posedge clk) begin
if (reset) begin
csr_cpucfg10 <= 32'h5;
end
end
//cpucfg11
always @(posedge clk) begin
if (reset) begin
csr_cpucfg11 <= 32'h04080001;
end
end
//cpucfg12
always @(posedge clk) begin
if (reset) begin
csr_cpucfg12 <= 32'h04080001;
end
end
//cpucfg13
always @(posedge clk) begin
if (reset) begin
csr_cpucfg13 <= 32'h0;
end
end
// difftest
`ifdef DIFFTEST_EN
assign csr_crmd_diff = csr_crmd;
assign csr_prmd_diff = csr_prmd;
assign csr_ectl_diff = csr_ectl;
assign csr_estat_diff = csr_estat;
assign csr_era_diff = csr_era;
assign csr_badv_diff = csr_badv;
assign csr_eentry_diff = csr_eentry;
assign csr_tlbidx_diff = csr_tlbidx;
assign csr_tlbehi_diff = csr_tlbehi;
assign csr_tlbelo0_diff = csr_tlbelo0;
assign csr_tlbelo1_diff = csr_tlbelo1;
assign csr_asid_diff = csr_asid;
assign csr_save0_diff = csr_save0;
assign csr_save1_diff = csr_save1;
assign csr_save2_diff = csr_save2;
assign csr_save3_diff = csr_save3;
assign csr_tid_diff = csr_tid;
assign csr_tcfg_diff = csr_tcfg;
assign csr_tval_diff = csr_tval;
assign csr_ticlr_diff = csr_ticlr;
assign csr_llbctl_diff = {csr_llbctl[31:1], llbit};
assign csr_tlbrentry_diff = csr_tlbrentry;
assign csr_dmw0_diff = csr_dmw0;
assign csr_dmw1_diff = csr_dmw1;
assign csr_pgdl_diff = csr_pgdl;
assign csr_pgdh_diff = csr_pgdh;
`endif
endmodule

651
rtl/ip/open-la500/dcache.v Normal file

@@ -0,0 +1,651 @@
module dcache
(
input clk ,
input reset ,
//to from cpu
input valid ,
input op , //cache instructions are treated as loads, so op is zero
input [ 2:0] size ,
input [ 7:0] index ,
input [19:0] tag ,
input [ 3:0] offset ,
input [ 3:0] wstrb ,
input [31:0] wdata ,
output addr_ok ,
output data_ok ,
output [31:0] rdata ,
input uncache_en ,
input dcacop_op_en ,
input [ 1:0] cacop_op_mode,
input [ 4:0] preld_hint ,
input preld_en ,
input tlb_excp_cancel_req,
input sc_cancel_req,
output dcache_empty ,
//to from axi
output rd_req ,
output [ 2:0] rd_type ,
output [31:0] rd_addr ,
input rd_rdy ,
input ret_valid ,
input ret_last ,
input [31:0] ret_data ,
output reg wr_req ,
output [ 2:0] wr_type ,
output [31:0] wr_addr ,
output [ 3:0] wr_wstrb ,
output [127:0] wr_data ,
input wr_rdy ,
//to perf_counter
output cache_miss
);
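//Address layout and geometry implied by the ports and the RAM arrays declared
//below (a sketch; assumes each data_bank_sram holds 256 x 32 bits, matching
//its 8-bit address and 32-bit data ports):
//  address = {tag[19:0], index[7:0], offset[3:0]}   20 + 8 + 4 = 32 bits
//  line    = 4 banks x 4 bytes                      = 16 bytes
//  cache   = 2 ways x 256 sets x 16 bytes           = 8 KiB
//  offset[3:2] selects the bank, offset[1:0] the byte within the word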
reg [1:0] way_d_reg [255:0];
wire request_uncache_en ;
reg request_buffer_op ;
reg request_buffer_preld ;
reg [ 2:0] request_buffer_size ;
reg [ 7:0] request_buffer_index ;
reg [19:0] request_buffer_tag ;
reg [ 3:0] request_buffer_offset ;
reg [ 3:0] request_buffer_wstrb ;
reg [31:0] request_buffer_wdata ;
reg request_buffer_uncache_en ;
reg request_buffer_dcacop ;
reg [ 1:0] request_buffer_cacop_op_mode;
reg [ 1:0] miss_buffer_replace_way ;
reg [ 1:0] miss_buffer_ret_num ;
wire [ 1:0] ret_num_add_one ;
reg [ 7:0] write_buffer_index ;
reg [ 3:0] write_buffer_wstrb ;
reg [31:0] write_buffer_wdata ;
reg [ 1:0] write_buffer_way ;
reg [ 3:0] write_buffer_offset ;
wire [ 7:0] way_bank_addra [1:0][3:0];
wire [31:0] way_bank_dina [1:0][3:0];
wire [31:0] way_bank_douta [1:0][3:0];
wire way_bank_ena [1:0][3:0];
wire [ 3:0] way_bank_wea [1:0][3:0];
wire [ 7:0] way_tagv_addra [1:0];
wire [20:0] way_tagv_dina [1:0];
wire [20:0] way_tagv_douta [1:0];
wire way_tagv_ena [1:0];
wire way_tagv_wea [1:0];
wire wr_match_way_bank[1:0][3:0];
wire [ 1:0] way_d ;
wire [ 1:0] way_hit ;
wire cache_hit ;
wire [31:0] way_load_word [1:0];
wire [127:0] way_data [1:0];
wire [31:0] load_res ;
wire [127:0] replace_data ;
wire replace_d ;
wire replace_v ;
wire [19:0] replace_tag ;
wire [ 1:0] random_val ;
wire [ 3:0] chosen_way ;
wire [ 1:0] replace_way ;
wire [ 1:0] invalid_way ;
wire has_invalid_way ;
wire [ 1:0] rand_repl_way ;
wire [ 3:0] cacop_chose_way ;
wire main_idle2lookup ;
wire main_lookup2lookup;
wire main_state_is_idle ;
wire main_state_is_lookup ;
wire main_state_is_miss ;
wire main_state_is_replace;
wire main_state_is_refill ;
wire write_state_is_idle;
wire write_state_is_full;
wire uncache_wr ;
reg uncache_wr_buffer;
wire [ 2:0] uncache_wr_type;
wire [ 1:0] way_wr_en;
wire [31:0] refill_data;
wire [31:0] write_in;
wire cacop_op_mode0;
wire cacop_op_mode1;
wire cacop_op_mode2;
wire cacop_op_mode2_hit_wr;
reg cacop_op_mode2_hit_wr_buffer;
wire preld_st_en;
wire preld_ld_en;
wire preld_ld_st_en;
wire req_or_inst_valid;
reg [1:0] lookup_way_hit_buffer;
localparam main_idle = 5'b00001;
localparam main_lookup = 5'b00010;
localparam main_miss = 5'b00100;
localparam main_replace = 5'b01000;
localparam main_refill = 5'b10000;
localparam write_buffer_idle = 1'b0;
localparam write_buffer_write = 1'b1;
genvar i,j;
reg [4:0] main_state;
reg write_buffer_state;
reg rd_req_buffer;
// wire invalid_way;
wire cancel_req = tlb_excp_cancel_req || sc_cancel_req;
//state machine
//main loop
always @(posedge clk) begin
if (reset) begin
main_state <= main_idle;
request_buffer_op <= 1'b0;
request_buffer_preld <= 1'b0;
request_buffer_size <= 3'b0;
request_buffer_index <= 8'b0;
request_buffer_tag <= 20'b0;
request_buffer_offset <= 4'b0;
request_buffer_wstrb <= 4'b0;
request_buffer_wdata <= 32'b0;
request_buffer_uncache_en <= 1'b0;
request_buffer_cacop_op_mode <= 2'b0;
request_buffer_dcacop <= 1'b0;
miss_buffer_replace_way <= 2'b0;
wr_req <= 1'b0;
end
else case (main_state)
main_idle: begin
if (req_or_inst_valid && main_idle2lookup) begin
main_state <= main_lookup;
request_buffer_op <= op ;
request_buffer_preld <= preld_en ;
request_buffer_size <= size ;
request_buffer_index <= index ;
request_buffer_offset <= offset ;
request_buffer_wstrb <= wstrb ;
request_buffer_wdata <= wdata ;
request_buffer_cacop_op_mode <= cacop_op_mode ;
request_buffer_dcacop <= dcacop_op_en ;
end
end
main_lookup: begin
if (req_or_inst_valid && main_lookup2lookup) begin
main_state <= main_lookup;
request_buffer_op <= op ;
request_buffer_preld <= preld_en ;
request_buffer_size <= size ;
request_buffer_index <= index ;
request_buffer_offset <= offset ;
request_buffer_wstrb <= wstrb ;
request_buffer_wdata <= wdata ;
request_buffer_cacop_op_mode <= cacop_op_mode ;
request_buffer_dcacop <= dcacop_op_en ;
end
else if (cancel_req) begin
main_state <= main_idle;
end
else if (!cache_hit) begin
//uncached write --> wr_req = 1
//uncached read, cacop (code == 0) --> wr_req = 0
//cacop (code == 1, 2), cached store, cached load --> wr_req = (dirty && valid)
if (uncache_wr || ((replace_d && replace_v) && (!request_uncache_en || cacop_op_mode2_hit_wr) && !cacop_op_mode0))
main_state <= main_miss;
else
main_state <= main_replace;
request_buffer_tag <= tag;
request_buffer_uncache_en <= request_uncache_en;
uncache_wr_buffer <= uncache_wr;
miss_buffer_replace_way <= replace_way;
cacop_op_mode2_hit_wr_buffer <= cacop_op_mode2_hit_wr;
end
else begin
main_state <= main_idle;
end
end
main_miss: begin
if (wr_rdy) begin
main_state <= main_replace;
wr_req <= 1'b1;
end
end
main_replace: begin
if (rd_rdy) begin
main_state <= main_refill;
miss_buffer_ret_num <= 2'b0; //returned data is sent to the cpu directly
end
wr_req <= 1'b0;
end
main_refill: begin
if ((ret_valid && ret_last) || !rd_req_buffer) begin //when rd_req is not set, go to next state directly
main_state <= main_idle;
end
else begin
if (ret_valid) begin
miss_buffer_ret_num <= ret_num_add_one;
end
end
end
default: begin
main_state <= main_idle;
end
endcase
end
//hit write state
always @(posedge clk) begin
if (reset) begin
write_buffer_state <= write_buffer_idle;
write_buffer_index <= 8'b0;
write_buffer_wstrb <= 4'b0;
write_buffer_wdata <= 32'b0;
write_buffer_offset <= 4'b0;
write_buffer_way <= 2'b0;
end
else case (write_buffer_state)
write_buffer_idle: begin
if (main_state_is_lookup && cache_hit && request_buffer_op && !cancel_req) begin
write_buffer_state <= write_buffer_write;
write_buffer_index <= request_buffer_index;
write_buffer_wstrb <= request_buffer_wstrb;
write_buffer_wdata <= request_buffer_wdata;
write_buffer_offset <= request_buffer_offset;
write_buffer_way <= way_hit;
end
end
write_buffer_write: begin
if (main_state_is_lookup && cache_hit && request_buffer_op && !cancel_req) begin
write_buffer_state <= write_buffer_write;
write_buffer_index <= request_buffer_index;
write_buffer_wstrb <= request_buffer_wstrb;
write_buffer_wdata <= request_buffer_wdata;
write_buffer_offset <= request_buffer_offset;
write_buffer_way <= way_hit;
end
else begin
write_buffer_state <= write_buffer_idle;
end
end
endcase
end
/*====================================main state idle=======================================*/
assign req_or_inst_valid = valid || dcacop_op_en || preld_en;
//state transition condition: a buffered hit write must not conflict with a lookup read of the same bank or with a cacop
assign main_idle2lookup = !(write_state_is_full && ((write_buffer_offset[3:2] == offset[3:2]) || dcacop_op_en));
assign dcache_empty = main_state_is_idle;
//addr_ok logic
/*===================================main state lookup======================================*/
//tag compare
generate for(i=0;i<2;i=i+1) begin:gen_way_hit
assign way_hit[i] = way_tagv_douta[i][0] && (tag == way_tagv_douta[i][20:1]); //not registered, so the value is not held
end endgenerate
assign cache_hit = |way_hit && !(uncache_en || cacop_op_mode0 || cacop_op_mode1 || cacop_op_mode2); //uncached accesses and cacops are forced to miss so that they reuse the miss path
//when a mode-2 cache instruction does not hit, the main state machine still runs a full round, which keeps the implementation simple
assign main_lookup2lookup = !(write_state_is_full && ((write_buffer_offset[3:2] == offset[3:2]) || dcacop_op_en)) &&
!(request_buffer_op && !op && ((request_buffer_offset[3:2] == offset[3:2]) || dcacop_op_en)) &&
cache_hit;
assign addr_ok = (main_state_is_idle && main_idle2lookup) || (main_state_is_lookup && main_lookup2lookup); //the request can be accepted
//data select
generate for(i=0;i<2;i=i+1) begin:gen_way_data
assign way_data[i] = {way_bank_douta[i][3],way_bank_douta[i][2],way_bank_douta[i][1],way_bank_douta[i][0]};
assign way_load_word[i] = way_data[i][request_buffer_offset[3:2]*32 +: 32];
end endgenerate
assign load_res = {32{way_hit[0]}} & way_load_word[0] |
{32{way_hit[1]}} & way_load_word[1] ;
assign request_uncache_en = (uncache_en && !request_buffer_dcacop);
assign uncache_wr = request_uncache_en && request_buffer_op && !cacop_op_mode1 && !cacop_op_mode2_hit_wr;
//data_ok logic
decoder_2_4 dec_rand_way (.in({1'b0,random_val[0]}),.out(chosen_way));
one_valid_n #(2) sel_one_invalid (.in(~{way_tagv_douta[1][0],way_tagv_douta[0][0]}),.out(invalid_way),.nozero(has_invalid_way));
assign rand_repl_way = has_invalid_way ? invalid_way : chosen_way[1:0]; //prefer an invalid way
decoder_2_4 dec_cacop_way (.in({1'b0,request_buffer_offset[0]}),.out(cacop_chose_way));
assign replace_way = {2{cacop_op_mode0 || cacop_op_mode1}} & cacop_chose_way[1:0] |
{2{cacop_op_mode2}} & way_hit |
{2{!request_buffer_dcacop}} & rand_repl_way;
assign way_d = way_d_reg[request_buffer_index] |
{2{(write_buffer_index==request_buffer_index)&&write_state_is_full}}&write_buffer_way;
assign replace_d = |(replace_way & way_d);
assign replace_v = |(replace_way & {way_tagv_douta[1][0],way_tagv_douta[0][0]});
/*====================================main state miss=======================================*/
assign replace_tag = {20{miss_buffer_replace_way[0]}} & way_tagv_douta[0][20:1] |
{20{miss_buffer_replace_way[1]}} & way_tagv_douta[1][20:1] ;
assign replace_data = {128{miss_buffer_replace_way[0]}} & way_data[0] |
{128{miss_buffer_replace_way[1]}} & way_data[1] ;
assign wr_type = uncache_wr_buffer ? uncache_wr_type : 3'b100; //replace cache line
assign wr_addr = uncache_wr_buffer ? {request_buffer_tag, request_buffer_index, request_buffer_offset} :
{replace_tag, request_buffer_index, 4'b0};
assign wr_data = uncache_wr_buffer ? {96'b0, request_buffer_wdata} : replace_data;
assign wr_wstrb = uncache_wr_buffer ? request_buffer_wstrb : 4'hf;
//assign wr_req = main_state_is_miss;
/*==================================main state replace======================================*/
assign uncache_wr_type = request_buffer_size;
assign rd_req = main_state_is_replace && !(uncache_wr_buffer || cacop_op_mode0 || cacop_op_mode1 || cacop_op_mode2);
assign rd_type = request_buffer_uncache_en ? request_buffer_size : 3'b100;
assign rd_addr = request_buffer_uncache_en ? {request_buffer_tag, request_buffer_index, request_buffer_offset} : {request_buffer_tag, request_buffer_index, 4'b0};
/*===================================main state refill======================================*/
//writes do not block the pipeline
//preld does not block the pipeline either: it is not a real memory instruction and is controlled in the pipeline
assign data_ok = ((main_state_is_lookup && (cache_hit || request_buffer_op || cancel_req)) ||
(main_state_is_refill && (!request_buffer_op && (ret_valid && ((miss_buffer_ret_num == request_buffer_offset[3:2]) || request_buffer_uncache_en))))) &&
!(request_buffer_preld || request_buffer_dcacop); //when rd_req is not set, set data_ok directly.
//rdata is connected to ret_data directly and is valid for one clock only
assign write_in = {(request_buffer_wstrb[3] ? request_buffer_wdata[31:24] : ret_data[31:24]),
(request_buffer_wstrb[2] ? request_buffer_wdata[23:16] : ret_data[23:16]),
(request_buffer_wstrb[1] ? request_buffer_wdata[15: 8] : ret_data[15: 8]),
(request_buffer_wstrb[0] ? request_buffer_wdata[ 7: 0] : ret_data[ 7: 0])};
assign refill_data = (request_buffer_op && (request_buffer_offset[3:2] == miss_buffer_ret_num)) ? write_in : ret_data;
assign way_wr_en = miss_buffer_replace_way & {2{ret_valid}}; //if rd_req is not issued, ret_valid and ret_last are never set, so the block is not written either
assign cache_miss = main_state_is_refill && ret_last && !(request_buffer_uncache_en || request_buffer_dcacop || request_buffer_preld);
//add one
assign ret_num_add_one[0] = miss_buffer_ret_num[0] ^ 1'b1;
assign ret_num_add_one[1] = miss_buffer_ret_num[1] ^ miss_buffer_ret_num[0];
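//The two XORs above form a 2-bit incrementer that wraps after the fourth word
//(worked through by hand):
//  miss_buffer_ret_num: 2'b00 -> 2'b01 -> 2'b10 -> 2'b11 -> 2'b00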
always @(posedge clk) begin
if (reset) begin
rd_req_buffer <= 1'b0;
end
else if (rd_req) begin
rd_req_buffer <= 1'b1;
end
else if (main_state_is_refill && (ret_valid && ret_last)) begin
rd_req_buffer <= 1'b0;
end
end
/*==========================================================================================*/
//update the dirty bits at the end of a refill or when a buffered write is performed
always @(posedge clk) begin
if (main_state_is_refill && ((ret_valid && ret_last) || !rd_req_buffer) && (!(request_buffer_uncache_en || cacop_op_mode0))) begin
way_d_reg[request_buffer_index][0] <= miss_buffer_replace_way[0] ? request_buffer_op : way_d_reg[request_buffer_index][0];
way_d_reg[request_buffer_index][1] <= miss_buffer_replace_way[1] ? request_buffer_op : way_d_reg[request_buffer_index][1];
end
else if (write_state_is_full) begin
way_d_reg[write_buffer_index] <= way_d_reg[write_buffer_index] | write_buffer_way;
end
end
//cache instruction control signals
assign cacop_op_mode0 = request_buffer_dcacop && (request_buffer_cacop_op_mode == 2'b00);
assign cacop_op_mode1 = request_buffer_dcacop && ((request_buffer_cacop_op_mode == 2'b01) || (request_buffer_cacop_op_mode == 2'b11));
assign cacop_op_mode2 = request_buffer_dcacop && (request_buffer_cacop_op_mode == 2'b10);
assign cacop_op_mode2_hit_wr = cacop_op_mode2 && |way_hit;
//output
assign rdata = {32{main_state_is_lookup}} & load_res |
{32{main_state_is_refill}} & ret_data ;
generate
for(i=0;i<2;i=i+1) begin:gen_data_way
for(j=0;j<4;j=j+1) begin:gen_data_bank
/*===============================bank addra logic==============================*/
assign wr_match_way_bank[i][j] = write_state_is_full && (write_buffer_way[i] && (write_buffer_offset[3:2] == j[1:0]));
assign way_bank_addra[i][j] = wr_match_way_bank[i][j] ? write_buffer_index : ({8{addr_ok}} & index | /*lookup*/
{8{!addr_ok}} & request_buffer_index);
/*===============================bank we logic=================================*/
assign way_bank_wea[i][j] = {4{wr_match_way_bank[i][j]}} & write_buffer_wstrb |
{4{main_state_is_refill && (way_wr_en[i] && (miss_buffer_ret_num == j[1:0]))}} & 4'hf;
/*===============================bank dina logic=================================*/
assign way_bank_dina[i][j] = {32{write_state_is_full}} & write_buffer_wdata |
{32{main_state_is_refill}} & refill_data ;
/*===============================bank ena logic=================================*/
assign way_bank_ena[i][j] = (!(request_buffer_uncache_en || cacop_op_mode0)) || main_state_is_idle || main_state_is_lookup;
end
end
endgenerate
generate
for(i=0;i<2;i=i+1) begin:gen_tagv_way
/*===============================tagv addra logic=================================*/
assign way_tagv_addra[i] = {8{addr_ok }} & index |
{8{!addr_ok}} & request_buffer_index ;
/*===============================tagv ena logic=================================*/
assign way_tagv_ena[i] = (!request_buffer_uncache_en) || main_state_is_idle || main_state_is_lookup;
/*===============================tagv wea logic=================================*/
assign way_tagv_wea[i] = miss_buffer_replace_way[i] && main_state_is_refill &&
((ret_valid && ret_last) || cacop_op_mode0 || cacop_op_mode1 || cacop_op_mode2_hit_wr_buffer); //write at least 4B
/*===============================tagv dina logic=================================*/
assign way_tagv_dina[i] = (cacop_op_mode0 || cacop_op_mode1 || cacop_op_mode2_hit_wr_buffer) ? 21'b0 : {request_buffer_tag, 1'b1};
end
endgenerate
/*==============================================================================*/
generate
for(i=0;i<2;i=i+1) begin:data_ram_way
for(j=0;j<4;j=j+1) begin:data_ram_bank
data_bank_sram u(
.addra (way_bank_addra[i][j]),
.clka (clk ),
.dina (way_bank_dina[i][j] ),
.douta (way_bank_douta[i][j]),
.ena (way_bank_ena[i][j] ),
.wea (way_bank_wea[i][j] )
);
end
end
endgenerate
generate
for(i=0;i<2;i=i+1) begin:tagv_ram_way
//[20:1] tag [0:0] v
tagv_sram u(
.addra (way_tagv_addra[i]),
.clka (clk ),
.dina (way_tagv_dina[i] ),
.douta (way_tagv_douta[i]),
.ena (way_tagv_ena[i] ),
.wea (way_tagv_wea[i] )
);
end
endgenerate
lfsr lfsr(
.clk (clk ),
.reset (reset ),
.random_val (random_val )
);
assign main_state_is_idle = main_state == main_idle ;
assign main_state_is_lookup = main_state == main_lookup ;
assign main_state_is_miss = main_state == main_miss ;
assign main_state_is_replace = main_state == main_replace;
assign main_state_is_refill = main_state == main_refill ;
assign write_state_is_idle = (write_buffer_state == write_buffer_idle) ;
assign write_state_is_full = (write_buffer_state == write_buffer_write);
endmodule
`ifdef SIMU
module data_bank_sram
#(
parameter WIDTH = 32 ,
parameter DEPTH = 256
)
(
input [ 7:0] addra ,
input clka ,
input [31:0] dina ,
output [31:0] douta ,
input ena ,
input [ 3:0] wea
);
reg [31:0] mem_reg [255:0];
reg [31:0] output_buffer;
always @(posedge clka) begin
if (ena) begin
if (wea) begin
if (wea[0]) begin
mem_reg[addra][ 7: 0] <= dina[ 7: 0];
end
if (wea[1]) begin
mem_reg[addra][15: 8] <= dina[15: 8];
end
if (wea[2]) begin
mem_reg[addra][23:16] <= dina[23:16];
end
if (wea[3]) begin
mem_reg[addra][31:24] <= dina[31:24];
end
end
else begin
output_buffer <= mem_reg[addra];
end
end
end
assign douta = output_buffer;
endmodule
module tagv_sram
#(
parameter WIDTH = 21 ,
parameter DEPTH = 256
)
(
input [ 7:0] addra ,
input clka ,
input [20:0] dina ,
output [20:0] douta ,
input ena ,
input wea
);
reg [20:0] mem_reg [255:0];
reg [20:0] output_buffer;
always @(posedge clka) begin
if (ena) begin
if (wea) begin
mem_reg[addra] <= dina;
end
else begin
output_buffer <= mem_reg[addra];
end
end
end
assign douta = output_buffer;
endmodule
`endif
module lfsr
(
input clk ,
input reset ,
output [1:0] random_val
);
reg [7:0] r_lfsr;
always @(posedge clk) begin
if (reset) begin
r_lfsr <= 8'b1;
end
else begin
r_lfsr[0] <= r_lfsr[7];
r_lfsr[1] <= r_lfsr[0];
r_lfsr[2] <= r_lfsr[1];
r_lfsr[3] <= r_lfsr[2];
r_lfsr[4] <= r_lfsr[3] ^ r_lfsr[7];
r_lfsr[5] <= r_lfsr[4] ^ r_lfsr[7];
r_lfsr[6] <= r_lfsr[5] ^ r_lfsr[7];
r_lfsr[7] <= r_lfsr[6];
end
end
assign random_val = r_lfsr[7:6];
endmodule

99
rtl/ip/open-la500/div.v Normal file

@@ -0,0 +1,99 @@
// x / y; takes 34 cycles to execute
module div(
input div_clk, reset,
input div,
input div_signed,
input [31:0] x, y,
output [31:0] s, r,
output complete
);
reg [32:0] UnsignS;
reg [32:0] UnsignR;
reg [32:0] tmp_r;
reg [7:0] count;
wire [32:0] tmp_d;
wire [32:0] result_r;
wire [32:0] UnsignX, UnsignY;
reg div_signed_buffer;
reg x_31_buffer;
reg y_31_buffer;
wire real_div_signed;
wire real_x_31;
wire real_y_31;
wire complete_delay;
wire real_complete;
assign complete_delay = (count == 8'hf0);
assign real_complete = complete_delay || complete;
always @(posedge div_clk) begin
if (reset) begin
div_signed_buffer <= 1'b0;
x_31_buffer <= 1'b0;
y_31_buffer <= 1'b0;
end
else if (div) begin
div_signed_buffer <= div_signed; //when div inst go to ms, div_signed will be changed. so buffer it.
x_31_buffer <= x[31];
y_31_buffer <= y[31];
end
end
assign real_div_signed = real_complete ? div_signed_buffer : div_signed;
assign real_x_31 = real_complete ? x_31_buffer : x[31];
assign real_y_31 = real_complete ? y_31_buffer : y[31];
assign UnsignX = {1'b0, (real_div_signed ? (x[31] ? (~x + 32'b1) : x) : x)}; // take the absolute value and extend to 33 bits
assign UnsignY = {1'b0, (real_div_signed ? (y[31] ? (~y + 32'b1) : y) : y)};
always @(posedge div_clk) begin // 33-bit division
if (reset) begin
count <= 8'd32;
tmp_r <= 33'b0;
UnsignS <= 33'b0;
UnsignR <= 33'b0;
end
else if (~div || complete_delay) begin
count <= 8'd32; // 33 iterations (count runs from 32 down to 0)
tmp_r <= 33'b0;
end
else if (~(count[7])) begin
if (tmp_d[32]) begin // tmp_d is negative
UnsignS <= {UnsignS[31:0], 1'b0};
tmp_r <= result_r;
end
else begin
UnsignS <= {UnsignS[31:0], 1'b1};
tmp_r <= tmp_d;
end
count <= count - 8'd1;
end
else begin
UnsignR <= tmp_r;
count <= 8'hf0; //complete signal only maintain one clock
end
end
assign complete = (count == 8'hff);//chenji
assign result_r = {tmp_r[31:0], UnsignX[count]};
assign tmp_d = result_r - UnsignY;
wire [32:0] TmpS, TmpR;
assign TmpS = (real_div_signed ? ((real_x_31 == real_y_31) ? UnsignS : ~(UnsignS - 1)) : UnsignS); // restore the sign and truncate
assign TmpR = (real_div_signed ? (real_x_31 ? ~(UnsignR - 1) : UnsignR) : UnsignR);
assign s = TmpS[31:0];
assign r = TmpR[31:0];
endmodule
// Sign relationships between operands and results
//x[31] y[31] s[31] r[31]
// 0 0 0 0
// 0 1 1 0
// 1 0 1 1
// 1 1 0 1


@@ -0,0 +1,135 @@
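# LaCC Interface
LaCC (LoongArch32R Custom Coprocessor Interface) is the interface Open-LA500 provides for adding custom instructions.
# Instruction Format
![](./picture/lacc_inst.png)
| Field | Description |
| ------- | ----------------------------------------------------------------- |
| opcode | Fixed at 1100; identifies the instruction as a LaCC instruction |
| command | Selects among multiple custom instructions; forwarded to the LaCC interface |
| imm | Additional immediate |
| rj | Encoding of the first source register |
| rk | Encoding of the second source register |
| rd | Encoding of the destination register; when it is not 0, the result is written back to the register file |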
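# Interface Definition
> Directions are given from the coprocessor's point of view.
| Channel | Direction | Width | Signal | Description |
| ---------------- | ------ | ------------- | ---------------- | ------------------------------------------------------------ |
| Global | input | 1 | lacc_flush | Asserted when the processor raises an exception or a branch prediction fails |
| Request | input | 1 | lacc_req_valid | Request sent by the processor core |
| Request | input | LACC_OP_WIDTH | lacc_req_command | The command field of the instruction |
| Request | input | 7 | lacc_req_imm | The imm field of the instruction |
| Request | input | 32 | lacc_req_rj | Value of the first register |
| Request | input | 32 | lacc_req_rk | Value of the second register |
| Response | output | 1 | lacc_rsp_valid | Instruction-complete signal; the processor resumes execution |
| Response | output | 32 | lacc_rsp_rdat | Data written back to the register |
| Memory request | output | 1 | lacc_data_valid | Memory request sent to the dcache |
| Memory request | input | 1 | lacc_data_ready | Whether the dcache can currently accept a request |
| Memory request | output | 32 | lacc_data_addr | Memory address |
| Memory request | output | 1 | lacc_data_read | Whether the request is a read |
| Memory request | output | 32 | lacc_data_wdata | Write data |
| Memory request | output | 2 | lacc_data_size | Access size <br>2'b00: byte<br>2'b01: half<br>2'b10: word |
| Memory response | input | 1 | lacc_drsp_valid | Response from the dcache<br>A write request receives it in the second cycle<br>A read request receives it once the dcache access succeeds |
| Memory response | input | 32 | lacc_drsp_data | Data returned by the memory access |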
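# Execution Flow
1. The decode stage decodes the lacc instruction and passes op and imm to the execute stage.
2. In the execute stage `lacc_req_valid` is 1. The LaCC interface accepts the lacc instruction and the pipeline stalls; the instruction is passed to the next stage only once `lacc_rsp_valid` goes high.
3. If a memory access is needed, assert `lacc_data_valid` and drive the address, size, and related fields. When `lacc_data_valid` and `lacc_data_ready` are high in the same cycle, the request has been sent to the dcache.
4. When the instruction has finished, assert `lacc_rsp_valid`; the instruction then continues from the exe stage.
LaCC read request timing (read address 0x1C0FFF38, read data 0x00000000):
![](./picture/lacc_read.png)
LaCC write request timing (write address 0x1C0FFEE8, write data 0x70A3A52B):
![](./picture/lacc_write.png)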
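# Custom Instructions
Adding custom instructions takes only two steps:
1. In `mycpu.h`, set `LACC_OP_SIZE` to the number of custom instructions (if `LACC_OP_SIZE` is 1, `LACC_OP_WIDTH` must be set to 1) and uncomment `HAS_LACC`.
2. In `lacc_core.v`, delete the example `lacc_demo` and write the code for your own instructions.
# demo
The demo loads vectors from two addresses, combines them element by element (an addition, per `op_lmadd` below), and stores the result back through the cache. The code is in `lacc_demo.v`.
## Instructions
| Instruction | op | rj | rk | Description |
| -------- | ---- | ----- | ----- | ----------------------------------------------- |
| op_cfg | 1 | size | waddr | Sets the write-back address and the vector length |
| op_lmadd | 0 | addr1 | addr2 | Sets the two memory addresses to operate on; corresponding elements are added and stored to waddr |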
## State Machine
| State | Description | Transition condition | Next state |
| --------- | --------------- | ----------------------------------------- | --------- |
| IDLE | | lacc_req_valid & op_lmadd | REQ_ADDR1 |
| REQ_ADDR1 | Request the data at addr1 | data_hsk (addr1 request accepted) | REQ_ADDR2 |
| REQ_ADDR2 | Request the data at addr2 | data_hsk (addr2 request accepted) | FINAL |
| FINAL | Compute the result and write it back | data_hsk & req_size_nz (write accepted and size != 0) | REQ_ADDR1 |
| FINAL | | data_hsk & ~req_size_nz | IDLE |
`data_hsk` is `lacc_data_valid & lacc_data_ready`; it indicates that the current memory request was sent successfully.
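
The transition table can be read as a next-state function. The following is a behavioural Python sketch of the table only, not the RTL in `lacc_demo.v`. The `State` enum, the `next_state` name, and its keyword arguments are hypothetical, and the sketch assumes a state simply holds while its listed condition is false.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    REQ_ADDR1 = 1
    REQ_ADDR2 = 2
    FINAL = 3


def next_state(state, req_valid=False, op_lmadd=False,
               data_hsk=False, req_size_nz=False):
    """Return the next demo state for one clock edge.

    data_hsk stands for lacc_data_valid & lacc_data_ready.
    Assumption: when no listed condition holds, the state is kept.
    """
    if state is State.IDLE:
        return State.REQ_ADDR1 if (req_valid and op_lmadd) else State.IDLE
    if state is State.REQ_ADDR1:
        return State.REQ_ADDR2 if data_hsk else State.REQ_ADDR1
    if state is State.REQ_ADDR2:
        return State.FINAL if data_hsk else State.REQ_ADDR2
    # FINAL: the write-back request must be accepted before moving on.
    if not data_hsk:
        return State.FINAL
    # More elements left -> fetch the next pair, otherwise finish.
    return State.REQ_ADDR1 if req_size_nz else State.IDLE
```

Each element of the vector therefore costs one pass through REQ_ADDR1, REQ_ADDR2, and FINAL, and the machine returns to IDLE only after the last write-back has been accepted.
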
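**Data control**
The `buffer_valid` signal indicates whether the data from the first address has been received. `wdata_valid` indicates that the write-back data is ready.
When `buffer_valid & lacc_drsp_valid` is high, the data from the second address has returned; in that cycle `buffer_data+lacc_drsp_rdata` is computed and written to wdata.
# Modifying the Compiler
Custom instructions can be added to assembly using the ".word xxxxxxx" form. This is hard to read, however, and makes it awkward to pass operands in and out, for example: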
```c
asm volatile (
"move $r5, %[addr]\n\t"
"move $r6, %[para]\n\t"
".word 0xc00018a0\n\t"
::[addr]"r"(addr),[para]"r"(para)
:"$r5", "$r6"
);
```
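**Modifying the compiler**
binutils can be modified so that the toolchain recognises the custom instruction.
- Download the LoongArch toolchain: https://gitee.com/loongson-edu/la32r-toolchains/tree/master
- Go to src/la32r_binutils/opcodes, open loongarch-opc.c, and add the following to the `loongarch_fix_opcodes` structure: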
```c
{0xc0000000, 0xf0000000, "lacc", "u22:6,r0:5,r5:5,r10:5,u15:7", 0, 0, 0, 0}
```
- Build the toolchain following its README and add the bin directory to your PATH.
The custom instruction has the format:
```
lacc command, rd, rj, rk, imm
```
The example above can then be rewritten as:
```c
asm volatile (
"lacc 0x0, $r0, %[addr], %[para], 0x0\n\t"
::[addr]"r"(addr), [para]"r"(para)
);
```
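
As a cross-check of the field layout, the sketch below packs the fields into an instruction word in Python. The bit positions are taken from the binutils format string above (`u22:6,r0:5,r5:5,r10:5,u15:7`) and the 0xc0000000 match value. `encode_lacc` is a hypothetical helper for illustration, not part of the toolchain.

```python
def encode_lacc(command, rd, rj, rk, imm):
    """Pack LaCC instruction fields into a 32-bit word.

    Layout assumed from the opcode table entry:
      [31:28] opcode = 0b1100
      [27:22] command (6 bits)
      [21:15] imm     (7 bits)
      [14:10] rk      (5 bits)
      [9:5]   rj      (5 bits)
      [4:0]   rd      (5 bits)
    """
    for name, value, width in (("command", command, 6), ("imm", imm, 7),
                               ("rk", rk, 5), ("rj", rj, 5), ("rd", rd, 5)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (0xC << 28) | (command << 22) | (imm << 15) | (rk << 10) | (rj << 5) | rd
```

With command 0, rd `$r0`, rj `$r5`, rk `$r6`, and imm 0, `encode_lacc(0, 0, 5, 6, 0)` gives 0xc00018a0, the word used in the `.word` example.
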

Binary file not shown. (Size: 2.9 KiB)

Binary file not shown. (Size: 101 KiB)

Binary file not shown. (Size: 88 KiB)

File diff suppressed because one or more lines are too long (Size: 73 KiB)


@@ -0,0 +1,94 @@
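# Branch Prediction
Branch prediction uses branch history to predict, ahead of time, the direction and target of a branch. Predictors come in many kinds, for example the BTB (Branch Target Buffer), BHT (Branch History Table), and RAS (Return Address Stack), but few of them suit a five-stage pipeline. As described in the design overview, the pfs stage issues the fetch request, the fs stage receives the instruction, and only the ds stage decodes it and learns everything about a branch. The predictor therefore has a single cycle to work in and no pre-decode: the only information available is the pc, and both direction and target have to be guessed. In practice only the BTB fits. The jirl instruction (register-indirect jump) is a special case, because a plain BTB would give a very low hit rate for it. The BTB is therefore extended: a CAM table records the pc of jirl instructions, and a RAS predicts their targets.
----
The rest of this document describes the implementation in two parts: the internals of the branch predictor, and its interaction with the pipeline.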
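## Branch Predictor Design
The predictor logic has two parts: the prediction part, which predicts the target and direction from the PC, and the correction part, which compares the decode result with the prediction and corrects the predictor's history.
### Prediction Part
First, the module interface:
- fetch_pc: fetch pc
- fetch_en: fetch enable
- ret_en: prediction result enable
- ret_pc: predicted target
- taken: predicted direction
- ret_index: the entry that produced the current prediction, used for correction
The btb table has 32 entries, each with four fields:
- btb_pc: 30 bits, matched against the fetch pc
- btb_target: 30 bits, holds the branch target
- btb_counter: 2 bits, whose most significant bit gives the direction
- btb_valid: 1 bit, whether the entry is valid
The prediction logic for jirl has two parts: a 16-entry CAM table and an 8-entry RAS.
The CAM table has two fields:
- ras_pc: 30 bits, matched against the fetch pc
- ras_valid: 1 bit, whether the entry is valid
The RAS is simply a stack. It stores 30-bit branch targets, and ras_ptr indicates the top of the stack.
With the entry contents defined, consider the prediction logic. fetch_en and fetch_pc are registered for one cycle when they enter the predictor. From the predictor's point of view the register could sit anywhere, as long as there is one cycle of latency; but because the path that assigns nextpc (that is, fetch_pc) is very long, the register is placed at the front. The registered signals, fetch_en_r and fetch_pc_r, are matched against the PC fields of the two CAMs (btb_pc and ras_pc). An entry hits when it is valid and its pc matches. The hitting entries are recorded in btb_match_rd and ras_match_rd.
If either table has a hit, ret_en is asserted. On a hit in the btb CAM table, ret_pc takes the btb_target of the matching entry; on a hit in the ras CAM table, it takes the target at the top of the ras stack. In principle btb_match_rd and ras_match_rd are one-hot and a pc hits in only one table, so no priority logic is needed. The same holds for ret_index. taken follows the most significant bit of the matching btb_counter on a btb match and is set to 1 on a ras match. The prediction stage never modifies the predictor's history.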
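
The lookup can be sketched as a small behavioural model in Python. This is an illustration of the description above, not the RTL. `BtbEntry` and `predict` are hypothetical names, the 30-bit pc fields are assumed to hold `pc >> 2`, and reading the stack top as `ras_ptr - 1` (with wrap-around) is an assumption based on the pointer addressing an empty slot.

```python
from dataclasses import dataclass


@dataclass
class BtbEntry:
    valid: bool = False
    pc: int = 0       # assumed to be pc[31:2]
    target: int = 0   # assumed to be target[31:2]
    counter: int = 0  # 2-bit counter, MSB = predicted direction


def predict(fetch_pc, btb, ras_cam, ras_stack, ras_ptr):
    """Return (ret_en, ret_pc, taken, ret_index) for one fetch pc.

    btb:       list of BtbEntry
    ras_cam:   list of (valid, pc) pairs marking jirl instructions
    ras_stack: list of stored targets; ras_ptr points at the empty slot
    The loops give an implicit priority; the hardware instead relies
    on the match vectors being one-hot.
    """
    key = fetch_pc >> 2
    for index, entry in enumerate(btb):
        if entry.valid and entry.pc == key:
            return True, entry.target << 2, bool(entry.counter & 0b10), index
    for index, (valid, pc) in enumerate(ras_cam):
        if valid and pc == key:
            top = ras_stack[(ras_ptr - 1) % len(ras_stack)]
            return True, top << 2, True, index  # a RAS hit is always taken
    return False, 0, False, 0
```

The function only reads the tables, which mirrors the rule that the prediction stage leaves the history untouched.
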
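### Correction Part
The ds (decode) stage gathers the prediction information and the correct branch information obtained from decoding.
The prediction information is as above:
- ds_btb_target
- ds_btb_index
- ds_btb_taken
- ds_btb_en
The correct branch information is:
- br_to_btb: whether the current instruction is a branch covered by the predictor. The current predictor covers all branch instructions.
- br_taken: branch direction
- br_target: branch target
The decode stage analyses this information and passes the following to the predictor:
- operate_en: enables a correction operation. It must be asserted only when ds_ready_go, es_allowin, and ds_valid are all 1 and there is no exception; a correction can be issued only in the cycle in which ds is allowed to advance to the next stage. ds_ready_go is required because register dependences in the pipeline can stall ds, and while ds is stalled the register values it reads are wrong, so the decoded branch information is wrong as well. es_allowin keeps the enable to a single cycle and so avoids repeated operations.
- operate_pc: the pc of the branch instruction the correction applies to
- operate_index: the CAM entry the correction applies to
- pop_ras: asserted for a jirl instruction
- push_ras: asserted for a bl instruction
- add_entry: asserted when the current instruction is a branch but the predictor made no prediction. It is additionally ANDed with br_taken: when a pc misses in the predictor it is treated as a sequential instruction, which amounts to predicting not-taken, so no entry needs to be allocated unless br_taken is 1.
- delete_entry: asserted when the current instruction is not a branch but the predictor made a prediction. This is extremely rare, arising only with self-modifying code, and can be ignored.
- pre_error: asserted when the current instruction is a branch, the predictor made a prediction, and the predicted direction is wrong
- pre_right: asserted when the current instruction is a branch, the predictor made a prediction, and the predicted direction is right
- target_error: asserted when the current instruction is a branch, the predictor made a prediction, and the direction matches but the target does not
- right_orien: the correct direction
- right_target: the correct target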
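
The classification can be written out as a few boolean expressions. The sketch below is a Python reading of the list, not the decode-stage RTL. `correction_signals` is a hypothetical name, and it treats pre_error and pre_right as comparisons of the predicted and actual direction, which is an interpretation suggested by the way these two signals drive the btb_counter update.

```python
def correction_signals(ds_btb_en, ds_btb_taken, ds_btb_target,
                       br_to_btb, br_taken, br_target):
    """Derive the correction flags from prediction and decode results.

    Assumption: pre_error / pre_right compare directions, and
    target_error additionally compares targets when directions agree.
    operate_en gating (ds_ready_go, es_allowin, ds_valid) is left out.
    """
    predicted = bool(br_to_btb and ds_btb_en)
    same_direction = bool(ds_btb_taken) == bool(br_taken)
    return {
        # Branch seen for the first time; only allocate if it is taken.
        "add_entry": bool(br_to_btb and not ds_btb_en and br_taken),
        # Prediction was made for something that is not a branch.
        "delete_entry": bool(ds_btb_en and not br_to_btb),
        "pre_error": predicted and not same_direction,
        "pre_right": predicted and same_direction,
        "target_error": predicted and same_direction
                        and ds_btb_target != br_target,
        "right_orien": bool(br_taken),
        "right_target": br_target,
    }
```

Under this reading target_error can be raised together with pre_right, since the direction may be correct while the stored target is stale.
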
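The predictor applies corrections based on the information above. pop_ras, that is whether the instruction is a jirl, first selects whether the operation targets the BTB part or the RAS part. For the BTB part, the operations are:
- Allocate an entry, when add_entry is asserted. The selected entry is enabled and filled with operate_pc and right_target, and its btb_counter is initialised to 2'b10. The entry (index) is chosen as follows. If the BTB CAM table still has an entry that is not enabled, one of those is chosen. If all entries are valid, an entry whose btb_counter is 2'b00 is chosen, that is, an entry that very probably predicts not-taken. A pc that misses in the btb is predicted not-taken as well, so the two have the same effect; the only difference is that if that pc is taken next time, it is allocated again, which moves its btb_counter straight from 2'b00 to 2'b10 and may well make the following prediction wrong. This is still certainly less likely than the misprediction caused by replacing an entry with any other btb_counter value. If all entries are valid and none has a btb_counter of 2'b00, an entry is chosen at random.
- Correct the target, when target_error is asserted. right_target is written again and btb_counter is initialised to 2'b10.
- Correct the direction, when pre_error or pre_right is asserted. btb_counter is adjusted according to right_orien: incremented if the correct direction is taken, decremented if it is not-taken.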
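
The replacement choice and the counter update are small enough to model directly. The following Python sketch is illustrative only. `choose_victim` and `update_counter` are hypothetical names, picking the lowest-numbered candidate is an assumption (the text only says "one of them"), and so is saturating the 2-bit counter at 0 and 3.

```python
import random


def choose_victim(valid, counter, rng=random):
    """Pick the BTB index to fill when add_entry is asserted.

    valid:   list of bools, one per entry
    counter: list of 2-bit counter values, one per entry
    """
    for index, is_valid in enumerate(valid):
        if not is_valid:
            return index              # 1) reuse a free entry
    for index, value in enumerate(counter):
        if value == 0b00:
            return index              # 2) evict a strongly not-taken entry
    return rng.randrange(len(valid))  # 3) otherwise evict at random


def update_counter(counter, right_orien):
    """Move the 2-bit counter towards the actual direction (saturating)."""
    if right_orien:
        return min(counter + 1, 0b11)
    return max(counter - 1, 0b00)
```

A newly allocated entry starts at 2'b10, so one not-taken outcome is enough to flip its prediction, while an entry at 2'b11 tolerates a single not-taken outcome.
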
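For the RAS part, the operations are:
- Allocate an entry, when add_entry is asserted. The selected entry is enabled and filled with operate_pc only, which in effect predicts the branch type of that pc. The index is chosen as in the BTB part.
- Push, when push_ras is asserted and the RAS is not full. The address of the instruction after the current pc is stored and the pointer moves up. In other words, the current instruction is bl, a function call, and its return address is the instruction following the bl. The RAS pointer points at an empty slot.
- Pop, when pop_ras is asserted and the RAS is not empty. The pointer simply moves down by one entry.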
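
The push and pop rules amount to a bounded stack whose pointer addresses the next free slot. A minimal Python sketch follows. The class and method names are hypothetical, full byte addresses are stored instead of the 30-bit fields for readability, and returning `None` for an empty stack is a modelling choice rather than hardware behaviour.

```python
class ReturnAddressStack:
    """Behavioural model of the 8-entry RAS described above."""

    def __init__(self, depth=8):
        self.entries = [0] * depth
        self.ptr = 0  # points at the empty slot above the top entry

    def push(self, bl_pc):
        """bl at bl_pc: store the return address (bl_pc + 4) if not full."""
        if self.ptr < len(self.entries):
            self.entries[self.ptr] = bl_pc + 4
            self.ptr += 1

    def pop(self):
        """jirl: drop the top entry if the stack is not empty."""
        if self.ptr > 0:
            self.ptr -= 1

    def top(self):
        """Predicted jirl target, or None when the stack is empty."""
        return self.entries[self.ptr - 1] if self.ptr > 0 else None
```

Because a push on a full stack is dropped, call chains deeper than eight levels lose their innermost return addresses in this model.
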
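## Interaction Between the Branch Predictor and the Pipeline
First, a brief look at the fetch logic, which later documents expand on. A few key signals: inst_valid indicates whether the current fetch pc is valid and is sent directly to the icache to wait for acceptance. inst_valid mainly follows fs_allowin: the fetch pc is valid when the pfs instruction is allowed to move on, which prevents a fetch request for the same pc from being issued twice. nextpc is the fetch pc; for now only its most common source matters, seq_pc, the sequential fetch pc. inst_valid being 1 does not mean the fetch request can be issued at once: it also has to wait for the icache to be idle, indicated by inst_addr_ok. The other destinations of the fetch request (tlb, btb) do not need to wait. When inst_valid && inst_addr_ok is 1, the fetch request is issued, pfs_ready_go is asserted, and pfs advances to fs.
After fetch_en and fetch_pc are sent to the predictor, the prediction (btb_en/btb_ret_pc/btb_taken) arrives in the next cycle. When btb_en && btb_taken is 1, fetch_btb_target is asserted and nextpc is redirected to btb_ret_pc. One point needs care here: the predictor's result is held for only one cycle, but pfs may not be able to advance, so the result has to be buffered to avoid losing it. It is stored in btb_lock_buffer, with btb_lock_en indicating whether the buffer is valid. The buffer is filled when a prediction returns and pfs cannot advance (btb_en && !pfs_ready_go), and it is released when the next fetch is issued. Both the maintenance of nextpc and the prediction passed from fs to ds use the `_t` signals (such as btb_ret_pc_t), each formed by ORing the predictor's output with the buffered value; only one of the two is valid at any time.
From the decode result and the prediction, the decode stage knows whether the next instruction is being fetched from the right direction. Two tasks follow: correcting the predictor's history, and, if the prediction was wrong, correcting the fetch direction. This section covers the second. When the decode stage detects a misprediction, it asserts btb_pre_error_flush, which corrects the fetch direction. That work splits into two parts:
- Cancel the next instruction that enters the decode stage, that is, the instruction fetched wrongly because of the misprediction.
  Two cases have to be considered:
  - In the cycle in which the branch instruction advances (btb_pre_error_flush && es_allowin is 1), if the next instruction enters the decode stage in the very next cycle, it can be cancelled directly. A cruder approach also works: force ds_valid to 0 in the next cycle without checking whether an instruction is entering.
  - Under the same condition, if no instruction enters in the next cycle (fs_to_ds_valid is 0 in the current cycle), the branch_slot_cancel flip-flop is set. While it is 1 it waits for the next instruction to arrive; when fs_to_ds_valid is seen at 1, that instruction is cancelled and branch_slot_cancel returns to 0.
- Correct nextpc. When the decode stage detects a misprediction, it passes the correct target (btb_pre_error_flush_target) and the enable (btb_pre_error_flush) to the fetch stage over br_bus to correct nextpc.
  Two problems have to be considered:
  - When the nextpc correction is passed forward, pfs may not be in a state where it can fetch, for example because inst_addr_ok is not 1. If the decode stage does not stall and pfs does not store the correction in a buffer, the information is lost.
  - The decode stage records the misprediction and cancels the instruction that follows the mispredicted branch. If pfs has been stalled ever since fetching the mispredicted branch and is then redirected straight to the correct direction, the correct instruction would be cancelled.
Both problems can be solved by a state machine in pfs, whose state is held in br_target_inst_req_state. When a misprediction is detected, the state entered depends on the pipeline state:
- !fs_valid && !inst_addr_ok: no fetch has been made since the mispredicted branch was fetched (fs is empty and inst_addr_ok has never been asserted), and pfs cannot fetch now either. Enter br_target_inst_req_wait_slot.
- !inst_addr_ok && fs_valid: a wrongly fetched instruction is already in fs, but pfs cannot fetch now. Enter br_target_inst_req_wait_br_target.
- inst_addr_ok && !fs_valid: no fetch has been made since the mispredicted branch was fetched; pfs can fetch now, but the instruction fetched in this cycle will be cancelled in the decode stage. Enter br_target_inst_req_wait_br_target.
In br_target_inst_req_wait_slot, the state machine waits until a fetch request has been issued and then switches to br_target_inst_req_wait_br_target. In that state nextpc is redirected to the pc buffer that stores btb_pre_error_flush_target (br_target_inst_req_buffer). Once another fetch request has been issued, the state returns to the empty state, br_target_inst_req_empty.
One more point applies once the state machine is in place. pfs may be in a state where it can issue a fetch request immediately. It is enough to check roughly whether the fs stage holds an instruction, then change next_pc and assert inst_valid at the same time; it does not matter if the fetch cannot be accepted yet (inst_addr_ok is 0).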


@@ -0,0 +1,17 @@
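# Preface
OpenLA500 (齐物) is a processor core that implements `LoongArch32 Reduced`, the 32-bit reduced version of the LoongArch instruction set. Its architecture is a classic single-issue, statically scheduled five-stage pipeline with two-way set-associative instruction and data caches, a 32-entry fully associative tlb for virtual-to-physical address mapping, and a BTB and RAS for branch prediction. The design largely follows, step by step, the computer architecture lab course of the University of Chinese Academy of Sciences. It runs at 50 MHz on FPGA.
OpenLA500 has been thoroughly verified and is stable. It was taped out in the SMIC 180 nm process in 2023 as part of the Loongson full-flow platform project.
The design details of OpenLA500 are described in the following eight parts.
- [Design Overview](./设计概述.md): based on the top-level block diagram, describes how the pipeline is divided into stages, how the stages interact, and how the pipeline interacts with the other modules.
- [Branch Prediction](./分支预测.md): the detailed design of the two predictors, BTB and RAS, and how they work within the pipeline.
- [Pipeline Issue](): operand preparation, the forwarding network, and stall control in the issue (decode) stage.
- [Pipeline Execution](): how the alu, mul, and div units are implemented in the pipeline.
- [MMU](): virtual-to-physical address mapping logic and tlb maintenance.
- [Exceptions](): detection and handling of exceptions in the pipeline.
- [Memory Subsystem](): implementation of the instruction and data caches and the AXI bus interface.
- [Debug](): the interaction between the UART online debug system and OpenLA500.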


@@ -0,0 +1,63 @@
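# Design Overview
![System block diagram](./picture/框图.svg)
The block diagram shows the internal structure of the OpenLA500 processor core. Instructions always execute in the order fetch, decode, execute, memory access, write-back. Each pipeline stage stores the intermediate state of an instruction; once the work of the current stage is done, the result is stored into the next stage for the next step.
## Execution Flow
The core maintains the `nextpc` signal, which is the fetch address. When a fetch starts, `fetch_pc` is sent to the `icache`, `tlb`, and `btb`, each of which does its own work, and is also stored in a flip-flop. In the next cycle, the `fs_stage` fetch stage, the `btb` completes its prediction and sends the result straight to `nextpc` to change the fetch direction. From `fetch_pc` and the result of the `tlb` address translation, the `icache` instruction cache produces the instruction word `inst`, which is stored in a flip-flop. The next stage, the `ds_stage` decode stage, hands the instruction word to the `decoder`. Decoding yields the operation `op` and the register numbers, which are used to index the `regfile` register file or the `csr` control registers and obtain the register value `rf_data`. With the operation and operands available, the `es_stage` execute stage sends them to the `alu` for simple operations, or to the `div` divider and the `mul` multiplier for more complex multi-cycle operations. The `alu` also produces the memory address, which is sent to the `tlb` and the `dcache` data cache to handle the memory request, much as for the `icache`. In the `mem_stage` memory stage, the memory and multiply/divide results become available; `op` selects the final result `final_res`, which is stored in a flip-flop. The `wb_stage` write-back stage writes the final result back to the `regfile` or `csr`. For the `i/dcache`, a fetch or memory request that misses is sent to the `axi_bridge`, which communicates with the outside through the standard `AXI3` bus protocol.
---
Each pipeline stage has to interact with different modules to confirm that its task is complete. The stages and modules are described below: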
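## Fetch Stage
### `pfs`
The main job of this stage is to maintain `nextpc`, the virtual address from which the pipeline fetches. Strictly speaking, `pfs` should not be called a pipeline stage of its own, because `nextpc` is a `wire`; in other words, `pfs` does not register signals in the normal execution flow, and the fetch direction is available in the same cycle. `pfs` is really an extension of every pipeline stage that can affect the fetch address. The `nextpc` logic is arguably the most complex part of the five-stage pipeline, because the `icache` is not always ready; `inst_addr_ok` indicates whether it is. For fleeting information that appears in certain special cases, such as a pipeline flush, a state machine is therefore needed to buffer it. Once the fetch direction is settled and the `icache` is ready, `pfs` asserts `fetch_en`, and the `btb`, `icache`, and `tlb` accept the `fetch_pc` issued by `pfs` and start their work.
### `btb`
The branch predictor predicts the target directly from the `PC`. The 32-entry `btb` is organised as a `CAM` table looked up by `PC`. It also includes a `ras`, organised as an 8-entry stack that stores return addresses during function calls. The prediction is returned in the next cycle. A correct prediction removes the bubble that would otherwise appear in the `fs` stage in the following cycle; a wrong prediction incurs no extra performance loss.
### `icache`
The instruction cache is two-way set-associative. When `fetch_pc` enters the `icache`, the `index` field extracted from `fetch_pc` first selects the `cache` line in each of the two ways. The `tag` field is then compared with the physical page number produced by the `tlb` translation to decide whether there is a hit. On a hit, `inst_rdata` is returned in the next cycle. On a miss, the `icache` enters the state machine that handles `miss` requests and `inst_addr_ok` is pulled low; new fetch requests are accepted only after the `miss` request has been handled. The replacement policy is random replacement.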
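
The index and tag handling can be illustrated with a short Python sketch. The field widths (20-bit tag, 8-bit index, 4-bit offset) come from the `icache` port list; placing them at bits [31:12], [11:4], and [3:0] is an assumption, and `split_address` and `lookup` are hypothetical helper names, not part of the RTL.

```python
def split_address(addr):
    """Split a 32-bit address into (tag, index, offset).

    Assumed layout: tag = addr[31:12], index = addr[11:4], offset = addr[3:0].
    """
    offset = addr & 0xF
    index = (addr >> 4) & 0xFF
    tag = (addr >> 12) & 0xFFFFF
    return tag, index, offset


def lookup(tagv_ways, index, physical_tag):
    """Return the per-way hit flags for one set.

    tagv_ways: one list per way, each holding 256 (valid, tag) pairs.
    The index selects a line in both ways; a way hits when its line is
    valid and its stored tag equals the translated physical tag.
    """
    return [way[index][0] and way[index][1] == physical_tag
            for way in tagv_ways]
```

For example, under the assumed layout address 0x1C0FFF38 splits into tag 0x1C0FF, index 0xF3, and offset 0x8, so it maps to line 0xF3 in both ways.
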
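### `tlb`
Organised as a 32-entry `CAM` table shared by instruction and data accesses, so it needs two lookup ports. The virtual page number of `fetch_pc` is used to look for a matching entry; on a hit, the physical page number is returned. On a miss, the instruction is marked with a `tlb` miss exception and the exception handler is entered. Once the corresponding `tlb` entry has been found in the page table and written into the `tlb` by software, the instruction is executed again.
### `fs`
The `fs` fetch stage is the first true pipeline stage. It receives the instruction word `inst_rdata` returned by the `icache`, with `inst_data_ok` indicating whether the instruction word is ready. On an `icache miss`, the `fs` stage stalls. One further point: reading the `tlb` takes one cycle, because a 32-entry CAM table has a long logic path. The `tlb` result is therefore available in the `fs` stage, which means `tlb`-related exceptions are detected in `fs`, while the fetch request is issued in `pfs`. The `fs` stage must therefore be able to cancel the fetch request that `pfs` sent to the `icache` in the previous cycle; this is done through the `tlb_excp_cancel_req` signal. The same applies to the `dcache`.
## Decode Stage
### `ds`
The decode stage turns the instruction word into the corresponding operation and reads the register values from the `regfile` by register number. The latest value of a register may not have been written to the `regfile` yet, because instructions further down the pipeline are still executing and their results are not yet available or not yet written back, while the current instruction may depend on those results. The decode stage therefore has forwarding paths from the later stages (`es_to_ds_forward_bus/ms_to_ds_forward_bus`), which forward results that have not been written back, or tell the `ds` stage that a result is not yet available and that it must stall. After decoding, it is also known whether the instruction is a branch and whether it is taken, so the predictor's result can be checked. The `ds` stage then corrects the fetch direction through `br_bus` and updates the predictor's history. A `btb` prediction covers two things: whether the `PC` is a branch instruction, and, if it is, whether it is taken. The `ds` stage has to tell these cases apart and apply the appropriate update to the predictor.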
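
Operand selection with forwarding can be sketched as follows. This is a Python illustration under assumptions, not the `ds` RTL. The tuple layout follows `es_to_ds_forward_bus` (`dep_need_stall`, `forward_enable`, `es_dest`, `exe_result`); giving the `ms` bus the same layout and checking `es` before `ms` are assumptions, and `read_operand` is a hypothetical name.

```python
def read_operand(reg, regfile, es_fwd, ms_fwd):
    """Return (value, stall) for source register number `reg`.

    es_fwd / ms_fwd: (need_stall, enable, dest, result) tuples.
    The younger producer (es) is checked first so that the most recent
    write to a register wins.
    """
    if reg == 0:
        return 0, False  # r0 always reads as zero
    for need_stall, enable, dest, result in (es_fwd, ms_fwd):
        if enable and dest == reg:
            if need_stall:
                return None, True  # result not ready yet: ds must stall
            return result, False   # forward the in-flight result
    return regfile[reg], False     # no producer in flight: use regfile
```

A load, multiply, or divide in `es` sets `need_stall`, so a dependent instruction waits in `ds` instead of reading a stale value.
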
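## Execute Stage
### `es`
The execute stage sends the decoded operation to the `alu`, `mul`, or `div` for the corresponding kind of computation. The `alu` handles simpler operations such as addition, subtraction, and shifts. `mul` is a multiplier built as a Wallace tree and split into two stages. `div` is a conventional iterative divider that needs 34 cycles to produce a result. For a memory instruction, the execute stage also computes the memory address by addition or subtraction. As in `pfs`, it waits for the `dcache` to be ready, that is, for `data_addr_ok` to be asserted; `es` then issues `data_fetch` and `data_addr`, the `tlb` translates the address, and the `dcache` returns the data. Note that if the current instruction or an instruction further down the pipeline has an exception, the assertion of `data_fetch` must be suppressed in time.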
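
The iterative divider can be modelled in a few lines of Python. The sketch below follows the shift-and-subtract scheme and the sign rules visible in `div.v` (quotient negative when the operand signs differ, remainder taking the sign of the dividend), but it is a behavioural illustration rather than a cycle-accurate model. `div_model` is a hypothetical name, and division by zero is deliberately left out of scope.

```python
def div_model(x, y, signed):
    """Divide 32-bit patterns x by y; return (quotient, remainder) patterns.

    Works on absolute values extended to 33 bits and consumes one
    dividend bit per iteration, 33 iterations in total.
    """
    mask = 0xFFFFFFFF
    neg_x = bool(signed and (x >> 31) & 1)
    neg_y = bool(signed and (y >> 31) & 1)
    ux = (-x & mask) if neg_x else x  # |x|
    uy = (-y & mask) if neg_y else y  # |y|
    quotient, remainder = 0, 0
    for bit in range(32, -1, -1):
        remainder = (remainder << 1) | ((ux >> bit) & 1)
        if remainder >= uy:           # trial subtraction is non-negative
            remainder -= uy
            quotient = (quotient << 1) | 1
        else:                         # trial subtraction went negative
            quotient <<= 1
    if neg_x != neg_y:
        quotient = -quotient          # signs differ: negate the quotient
    if neg_x:
        remainder = -remainder        # remainder follows the dividend
    return quotient & mask, remainder & mask
```

The 33 loop iterations correspond to the 33 shift steps of the hardware; the remaining cycle of the 34 is spent latching the remainder and signalling completion.
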
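## Memory Stage
### `dcache`
The data cache adds the ability to handle write requests on top of the `icache` design. Each `cache` line gains a `dirty` bit that marks whether the line is dirty and decides whether it must be written back when it is replaced. In addition, a write request that hits needs one more cycle than a read request to write the data into the `ram`. That extra cycle can be hidden by a write-request state machine: once the write is known to hit, it enters the state machine, so the `dcache` can go on accepting new requests. Care is needed, however, to keep a read and a write from reaching the single-port `ram` at the same time.
### `ms`
The memory stage waits for the `dcache` to return data; `data_data_ok` indicates whether `data_rdata` is valid. Like the `fs` stage, the `ms` stage uses the `tlb_excp_cancel_req` signal to cancel the memory request that `es` sent to the `dcache` in the previous cycle. Unlike a read request, which has to wait for data, a write request can move on once it has been issued, whether or not it hits.
## Write-back Stage
### `ws`
In the write-back stage the instruction has finished executing and produced `result`, and the processor state is updated according to it. This covers two things: writing the general-purpose registers (`regfile`) and the status registers (`csr`); and, depending on the exception raised by the instruction, updating the `csr` registers, flushing the pipeline, and redirecting the fetch to the exception entry.
---
## Interaction Between Pipeline Stages
In the ideal case all five stages are busy at the same time, so five instructions are being processed at once. The core then reaches a higher frequency through pipelining while still sustaining a throughput of one instruction per cycle. In practice, as described above, this does not hold: pipeline stages stall. The stages therefore have to interact, so that the work of one stage is not overwritten by the previous stage before it has finished.
The interaction logic between stages maintains the following key signals:
- `stage_valid`: held in a flip-flop; indicates whether the stage is currently processing an instruction.
- `stage_ready_go`: indicates whether the stage has to stall; it is high when the stage's work is done and low while the stage is stalled.
- `stage_allowin`: indicates whether the stage allows the previous stage to enter. It is passed to the previous stage. It is asserted in two cases: the stage holds no instruction, or the instruction in the stage has finished and the next stage allows entry. The signal propagates forward from the last stage: as soon as one stage stalls, its `stage_allowin` goes low, and in the same cycle the `stage_allowin` of every earlier stage goes low as well (assuming all stages hold an instruction).
- `stage_to_nextstage_valid`: indicates whether the stage holds an instruction that has finished processing. It is passed to the next stage. When the next stage sees this signal high and that stage's own `stage_allowin` is high, its `stage_valid` is set in the next cycle, which moves the instruction from one stage to the next. The `stage_to_nextstage_bus` bus carries the registered signals and moves along with `stage_to_nextstage_valid`.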
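
The handshake can be checked with a small Python model of one clock edge. The equations mirror those in `exe_stage.v` (`es_allowin = !es_valid || es_ready_go && ms_allowin`, `es_to_ms_valid = es_valid && es_ready_go`). `pipeline_step` is a hypothetical name, and the model assumes the last stage can always retire and ignores flushes.

```python
def pipeline_step(valid, ready_go, in_valid):
    """Advance the stage_valid bits of an N-stage pipeline by one cycle.

    valid[i], ready_go[i]: current state of stage i (0 = first stage).
    in_valid: whether a new instruction is offered to stage 0.
    Returns (new_valid, allowin).
    """
    n = len(valid)
    allowin = [False] * n
    downstream_allowin = True  # assumption: nothing blocks the last stage
    for i in reversed(range(n)):
        # Empty stage, or finished instruction that may move on.
        allowin[i] = (not valid[i]) or (ready_go[i] and downstream_allowin)
        downstream_allowin = allowin[i]
    new_valid = list(valid)
    for i in range(n):
        # stage_to_nextstage_valid of the stage in front of stage i.
        incoming = in_valid if i == 0 else (valid[i - 1] and ready_go[i - 1])
        if allowin[i]:
            new_valid[i] = incoming
    return new_valid, allowin
```

With three full stages and the middle one stalled, only the last stage accepts input; it receives a bubble while the first two stages hold their instructions.
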


@@ -0,0 +1,439 @@
`include "mycpu.h"
`include "csr.h"
module exe_stage(
input clk ,
input reset ,
//allowin
input ms_allowin ,
output es_allowin ,
//from ds
input ds_to_es_valid ,
input [`DS_TO_ES_BUS_WD -1:0] ds_to_es_bus ,
//to ms
output es_to_ms_valid ,
output [`ES_TO_MS_BUS_WD -1:0] es_to_ms_bus ,
//to ds forward path
output [`ES_TO_DS_FORWARD_BUS -1:0] es_to_ds_forward_bus,
output es_to_ds_valid ,
//div_mul
output es_div_enable ,
output es_mul_div_sign ,
output [31:0] es_rj_value ,
output [31:0] es_rkd_value ,
input div_complete ,
`ifdef HAS_LACC
output es_lacc_req,
output [`LACC_OP_WIDTH-1:0] es_lacc_command,
input lacc_req_ready,
input lacc_data_valid,
input lacc_data_read,
input [31: 0] lacc_data_addr,
input [31: 0] lacc_data_wdata,
input [1: 0] lacc_data_size,
input lacc_rsp_valid,
input [31: 0] lacc_rsp_rdat,
output [6: 0] lacc_req_imm,
output lacc_flush,
input data_data_ok,
output lacc_drsp_valid,
`endif
//exception
input excp_flush ,
input ertn_flush ,
input refetch_flush ,
input icacop_flush ,
//idle
input idle_flush ,
//tlb/cache ins
output tlb_inst_stall ,
//cache ins
output icacop_op_en ,
output dcacop_op_en ,
output [ 1:0] cacop_op_mode ,
//from icache
input icache_unbusy ,
//preld ins
output [ 4:0] preld_hint ,
output preld_en ,
// data cache interface
output data_valid ,
output data_op ,
output [ 2:0] data_size ,
output [ 3:0] data_wstrb ,
output [31:0] data_wdata ,
input data_addr_ok ,
//from csr
input [18:0] csr_vppn ,
//to addr trans
output [31:0] data_addr ,
output data_fetch ,
//from ms
input ms_wr_tlbehi ,
input ms_flush
);
reg es_valid ;
wire es_ready_go ;
wire [31:0] error_va ;
reg [`DS_TO_ES_BUS_WD -1:0] ds_to_es_bus_r;
wire [13:0] es_alu_op ;
wire es_src1_is_pc ;
wire es_src2_is_imm ;
wire es_src2_is_4 ;
wire es_gr_we ;
wire es_store_op ;
wire [ 4:0] es_dest ;
wire [31:0] es_imm ;
wire [31:0] es_pc ;
wire [ 3:0] es_mul_div_op ;
wire [ 1:0] es_mem_size ;
wire [31:0] es_csr_data ;
wire [13:0] es_csr_idx ;
wire [31:0] es_csr_result ;
wire [31:0] csr_mask_result;
wire es_res_from_csr;
wire es_csr_we ;
wire es_csr_mask ;
wire es_excp ;
wire excp ;
wire [ 8:0] es_excp_num ;
wire [ 9:0] excp_num ;
wire es_ertn ;
wire es_mul_enable ;
wire div_stall ;
wire es_ll_w ;
wire es_sc_w ;
wire es_tlbsrch ;
wire es_tlbwr ;
wire es_tlbfill ;
wire es_tlbrd ;
wire es_refetch ;
wire es_invtlb ;
wire [ 9:0] es_invtlb_asid ;
wire [18:0] es_invtlb_vpn ;
wire es_cacop ;
wire es_preld ;
wire es_br_inst ;
wire es_icache_miss ;
wire es_idle ;
wire es_load_op ;
wire dep_need_stall ;
wire forward_enable ;
wire dest_zero ;
wire excp_ale ;
wire es_flush_sign ;
wire [ 3:0] wr_byte_en ;
wire access_mem ;
wire es_mem_sign_exted;
wire [ 1:0] sram_addr_low2bit;
wire tlbsrch_stall ;
wire [31:0] pv_addr ;
wire [ 4:0] cacop_op ;
wire dcache_req_or_inst_en;
wire icacop_inst ;
wire icacop_inst_stall;
wire dcacop_inst ;
wire preld_inst ;
wire es_br_pre_error ;
wire es_br_pre ;
// difftest
wire [31:0] es_inst ;
wire [63:0] es_timer_64 ;
wire es_cnt_inst ;
wire [ 7:0] es_inst_ld_en ;
wire [ 7:0] es_inst_st_en ;
wire es_csr_rstat_en ;
wire [2: 0] data_size_pre;
`ifdef HAS_LACC
wire lacc_stall;
wire [3: 0] lacc_data_mask, lacc_data_byte, lacc_data_half;
wire es_lacc_req_pre;
`endif
assign {
`ifdef HAS_LACC
es_lacc_command,
es_lacc_req_pre,
`endif
es_csr_rstat_en , //349:349 for difftest
es_inst_st_en , //348:341 for difftest
es_inst_ld_en , //340:333 for difftest
es_cnt_inst , //332:332 for difftest
es_timer_64 , //331:268 for difftest
es_inst , //267:236 for difftest
es_idle , //235:235
es_br_pre_error , //234:234
es_br_pre , //233:233
es_icache_miss , //232:232
es_br_inst , //231:231
es_preld , //230:230
es_cacop , //229:229
es_mem_sign_exted, //228:228
es_invtlb , //227:227
es_tlbrd , //226:226
es_refetch , //225:225
es_tlbfill , //224:224
es_tlbwr , //223:223
es_tlbsrch , //222:222
es_sc_w , //221:221
es_ll_w , //220:220
es_excp_num , //219:211
es_csr_mask , //210:210
es_csr_we , //209:209
es_csr_idx , //208:195
es_res_from_csr , //194:194
es_csr_data , //193:162
es_ertn , //161:161
es_excp , //160:160
es_mem_size , //159:158
es_mul_div_op , //157:154
es_mul_div_sign , //153:153
es_alu_op , //152:139
es_load_op , //138:138
es_src1_is_pc , //137:137
es_src2_is_imm , //136:136
es_src2_is_4 , //135:135
es_gr_we , //134:134
es_store_op , //133:133
es_dest , //132:128
es_imm , //127:96
es_rj_value , //95 :64
es_rkd_value , //63 :32
es_pc //31 :0
} = ds_to_es_bus_r;
wire [31:0] es_alu_src1 ;
wire [31:0] es_alu_src2 ;
wire [31:0] es_alu_result ;
wire [31:0] exe_result ;
assign es_to_ms_bus = {es_csr_data , //424:393 for difftest
es_csr_rstat_en , //392:392 for difftest
data_wdata , //391:360 for difftest
es_inst_st_en , //359:352 for difftest
data_addr , //351:320 for difftest
es_inst_ld_en , //319:312 for difftest
es_cnt_inst , //311:311 for difftest
es_timer_64 , //310:247 for difftest
es_inst , //246:215 for difftest
error_va , //214:183
es_idle , //182:182
es_cacop , //181:181
preld_inst , //180:180
es_br_pre_error , //179:179
es_br_pre , //178:178
es_icache_miss , //177:177
es_br_inst , //176:176
icacop_op_en , //175:175
es_mem_sign_exted, //174:174 //only add this, not used.
es_invtlb_vpn , //173:155
es_invtlb_asid , //154:145
es_invtlb , //144:144
es_tlbrd , //143:143
es_refetch , //142:142
es_tlbfill , //141:141
es_tlbwr , //140:140
es_tlbsrch , //139:139
es_store_op , //138:138
es_sc_w , //137:137
es_ll_w , //136:136
excp_num , //135:126
es_csr_we , //125:125
es_csr_idx , //124:111
es_csr_result , //110:79
es_ertn , //78:78
excp , //77:77
es_mem_size , //76:75
es_mul_div_op , //74:71
es_load_op , //70:70
es_gr_we , //69:69
es_dest , //68:64
exe_result , //63:32
es_pc //31:0
};
assign es_to_ds_valid = es_valid;
assign access_mem = es_load_op || es_store_op;
assign es_flush_sign = excp_flush || ertn_flush || refetch_flush || icacop_flush || idle_flush;
assign icacop_inst_stall = icacop_op_en && !icache_unbusy;
assign es_ready_go = (!div_stall &&
`ifdef HAS_LACC
!lacc_stall &&
`endif
((dcache_req_or_inst_en && data_addr_ok) || !(access_mem || dcacop_inst || preld_inst)) && !tlbsrch_stall && !icacop_inst_stall) || excp;
assign es_allowin = !es_valid || es_ready_go && ms_allowin;
assign es_to_ms_valid = es_valid && es_ready_go;
always @(posedge clk) begin
if (reset || es_flush_sign) begin
es_valid <= 1'b0;
end
else if (es_allowin) begin
es_valid <= ds_to_es_valid;
end
if (ds_to_es_valid && es_allowin) begin
ds_to_es_bus_r <= ds_to_es_bus;
end
end
assign es_alu_src1 = es_src1_is_pc ? es_pc : es_rj_value;
assign es_alu_src2 = (es_src2_is_imm) ? es_imm :
(es_src2_is_4) ? 32'd4 : es_rkd_value;
assign es_div_enable = (es_mul_div_op[2] | es_mul_div_op[3]) & es_valid;
assign es_mul_enable = es_mul_div_op[0] | es_mul_div_op[1];
assign div_stall = es_div_enable & ~div_complete;
`ifdef HAS_LACC
assign lacc_stall = es_lacc_req & ~lacc_rsp_valid;
assign es_lacc_req = es_lacc_req_pre & es_valid;
assign lacc_req_imm = es_imm[11: 5];
assign lacc_drsp_valid = es_lacc_req & data_data_ok;
`endif
alu u_alu(
.alu_op (es_alu_op ),
.alu_src1 (es_alu_src1 ), //bug3 es_alu_src2
.alu_src2 (es_alu_src2 ),
.alu_result (es_alu_result)
);
assign exe_result = `ifdef HAS_LACC
es_lacc_req ? lacc_rsp_rdat :
`endif
es_res_from_csr ? es_csr_data : es_alu_result;
//forward path
assign dest_zero = (es_dest == 5'b0);
assign forward_enable = es_gr_we & ~dest_zero & es_valid;
assign dep_need_stall = es_load_op | es_div_enable | es_mul_enable;
assign es_to_ds_forward_bus = {dep_need_stall , //38:38
forward_enable , //37:37
es_dest , //36:32
exe_result //31:0
};
assign tlb_inst_stall = (es_tlbsrch || es_tlbrd) && es_valid;
//csr mask
assign csr_mask_result = (es_rj_value & es_rkd_value) | (~es_rj_value & es_csr_data);
assign es_csr_result = es_csr_mask ? csr_mask_result : es_rkd_value;
assign error_va = pv_addr;
//exception
assign excp_ale = access_mem & ((es_mem_size[0] & 1'b0) |
(es_mem_size[1] & es_alu_result[0]) |
(!es_mem_size & (es_alu_result[0] | es_alu_result[1]))) ;
assign excp = es_excp || excp_ale;
assign excp_num = {excp_ale, es_excp_num};
assign sram_addr_low2bit = {es_alu_result[1], es_alu_result[0]};
//mem_size[0] byte size [1] halfword size
assign dcache_req_or_inst_en = es_valid && !excp && ms_allowin && !es_flush_sign && !ms_flush;
`ifdef HAS_LACC
assign data_valid = access_mem && dcache_req_or_inst_en || lacc_data_valid;
assign data_op = lacc_data_valid ? !lacc_data_read :
es_store_op && !es_cacop && !es_preld;
decoder_2_4 decoder_lacc_size (lacc_data_addr[1: 0], lacc_data_byte);
// halfword strobes depend only on the low two address bits
assign lacc_data_half[0] = ~(|lacc_data_addr[1:0]);
assign lacc_data_half[1] = ~lacc_data_addr[1];
assign lacc_data_half[2] = ^lacc_data_addr[1:0];
assign lacc_data_half[3] = lacc_data_addr[1];
assign lacc_data_mask = lacc_data_size == 2'b00 ? lacc_data_byte :
lacc_data_size == 2'b01 ? lacc_data_half : 4'b1111;
assign data_wstrb = lacc_data_valid ? lacc_data_mask : wr_byte_en;
assign data_size = lacc_data_valid ? lacc_data_size : data_size_pre;
assign data_addr = lacc_data_valid ? lacc_data_addr :
es_tlbsrch ? {csr_vppn, 13'b0} : pv_addr;
`else
assign data_valid = access_mem && dcache_req_or_inst_en;
assign data_op = es_store_op && !es_cacop && !es_preld;
assign data_wstrb = wr_byte_en;
assign data_size = data_size_pre;
assign data_addr = es_tlbsrch ? {csr_vppn, 13'b0} : pv_addr;
`endif
wire [3:0] es_stb_wen = { sram_addr_low2bit==2'b11 ,
sram_addr_low2bit==2'b10 ,
sram_addr_low2bit==2'b01 ,
sram_addr_low2bit==2'b00} ;
wire [3:0] es_sth_wen = { sram_addr_low2bit==2'b10 ,
sram_addr_low2bit==2'b10 ,
sram_addr_low2bit==2'b00 ,
sram_addr_low2bit==2'b00} ;
wire [31:0] es_stb_cont = { {8{es_stb_wen[3]}} & es_rkd_value[7:0] ,
{8{es_stb_wen[2]}} & es_rkd_value[7:0] ,
{8{es_stb_wen[1]}} & es_rkd_value[7:0] ,
{8{es_stb_wen[0]}} & es_rkd_value[7:0]};
wire [31:0] es_sth_cont = { {16{es_sth_wen[3]}} & es_rkd_value[15:0] ,
{16{es_sth_wen[0]}} & es_rkd_value[15:0]};
assign {wr_byte_en, data_size_pre} = ({7{es_mem_size[0]}} & {es_stb_wen, 3'b00}) |
({7{es_mem_size[1]}} & {es_sth_wen, 3'b01}) |
({7{!es_mem_size }} & {4'b1111 , 3'b10}) ;
//assign data_wdata = es_rkd_value;
assign data_wdata =
`ifdef HAS_LACC
lacc_data_valid ? lacc_data_wdata :
`endif
({32{es_mem_size[0]}} & es_stb_cont ) |
({32{es_mem_size[1]}} & es_sth_cont ) |
({32{!es_mem_size }} & es_rkd_value) ;
assign tlbsrch_stall = es_tlbsrch && ms_wr_tlbehi;
//invtlb
assign es_invtlb_asid = es_rj_value[9:0];
assign es_invtlb_vpn = es_rkd_value[31:13];
assign pv_addr = es_alu_result;
//cache ins
assign cacop_op = es_dest;
assign icacop_inst = es_cacop && (cacop_op[2:0] == 3'b0);
assign icacop_op_en = icacop_inst && dcache_req_or_inst_en;
assign dcacop_inst = es_cacop && (cacop_op[2:0] == 3'b1);
assign dcacop_op_en = dcacop_inst && dcache_req_or_inst_en;
assign cacop_op_mode = cacop_op[4:3];
//preld ins
assign preld_hint = es_dest;
assign preld_inst = es_preld && ((preld_hint == 5'd0) || (preld_hint == 5'd8))/* && !data_uncache_en*/; //preld must have bug
assign preld_en = preld_inst && dcache_req_or_inst_en;
assign data_fetch = (data_valid || dcacop_inst || preld_en) && data_addr_ok || ((icacop_inst || es_tlbsrch) && es_ready_go && ms_allowin)
`ifdef HAS_LACC
|| lacc_data_valid
`endif
;
endmodule

410
rtl/ip/open-la500/icache.v Normal file

@@ -0,0 +1,410 @@
module icache
(
input clk ,
input reset ,
//to from cpu
input valid ,
input op ,
input [ 7:0] index ,
input [19:0] tag ,
input [ 3:0] offset ,
input [ 3:0] wstrb ,
input [31:0] wdata ,
output addr_ok ,
output data_ok ,
output [31:0] rdata ,
input uncache_en ,
input icacop_op_en ,
input [ 1:0] cacop_op_mode ,
input [ 7:0] cacop_op_addr_index , //this signal from mem stage's va
input [19:0] cacop_op_addr_tag ,
input [ 3:0] cacop_op_addr_offset,
output icache_unbusy,
input tlb_excp_cancel_req,
//to from axi
output rd_req ,
output [ 2:0] rd_type ,
output [31:0] rd_addr ,
input rd_rdy ,
input ret_valid ,
input ret_last ,
input [31:0] ret_data ,
output reg wr_req ,
output [ 2:0] wr_type ,
output [31:0] wr_addr ,
output [ 3:0] wr_wstrb ,
output [127:0] wr_data ,
input wr_rdy ,
//to perf_counter
output cache_miss
);
reg request_buffer_op ;
reg [ 7:0] request_buffer_index ;
reg [19:0] request_buffer_tag ;
reg [ 3:0] request_buffer_offset ;
reg [ 3:0] request_buffer_wstrb ;
reg [31:0] request_buffer_wdata ;
reg request_buffer_uncache_en ;
reg request_buffer_icacop ;
reg [ 1:0] request_buffer_cacop_op_mode;
reg [ 1:0] miss_buffer_replace_way ;
reg [ 1:0] miss_buffer_ret_num ;
wire [ 1:0] ret_num_add_one ;
wire [ 7:0] way_bank_addra [1:0][3:0];
wire [31:0] way_bank_dina [1:0][3:0];
wire [31:0] way_bank_douta [1:0][3:0];
wire way_bank_ena [1:0][3:0];
wire [ 3:0] way_bank_wea [1:0][3:0];
wire [ 7:0] way_tagv_addra [1:0];
wire [20:0] way_tagv_dina [1:0];
wire [20:0] way_tagv_douta [1:0];
wire way_tagv_ena [1:0];
wire way_tagv_wea [1:0];
wire [ 1:0] way_hit ;
wire cache_hit ;
wire [ 31:0] way_load_word [1:0];
wire [127:0] way_data [1:0];
wire [31:0] load_res ;
wire main_idle2lookup ;
wire main_lookup2lookup;
wire main_state_is_idle ;
wire main_state_is_lookup ;
wire main_state_is_replace;
wire main_state_is_refill ;
wire [1:0] way_wr_en;
wire [31:0] refill_data;
wire cacop_op_mode0;
wire cacop_op_mode1;
wire cacop_op_mode2;
wire [1:0] random_val;
wire [3:0] chosen_way;
wire [1:0] replace_way;
wire [1:0] invalid_way;
wire has_invalid_way;
wire [1:0] rand_repl_way;
wire [3:0] cacop_chose_way;
wire cacop_op_mode2_hit_wr;
wire cacop_op_mode2_no_hit;
reg [ 1:0] lookup_way_hit_buffer;
wire [ 3:0] real_offset;
wire [19:0] real_tag ;
wire [ 7:0] real_index ;
wire req_or_inst_valid ;
localparam main_idle = 5'b00001;
localparam main_lookup = 5'b00010;
localparam main_replace = 5'b01000;
localparam main_refill = 5'b10000;
localparam write_buffer_idle = 1'b0;
localparam write_buffer_write = 1'b1;
reg [4:0] main_state;
reg rd_req_buffer;
genvar i,j;
//state machine
//main loop
always @(posedge clk) begin
if (reset) begin
main_state <= main_idle;
request_buffer_op <= 1'b0;
request_buffer_index <= 8'b0;
request_buffer_tag <= 20'b0;
request_buffer_offset <= 4'b0;
request_buffer_wstrb <= 4'b0;
request_buffer_wdata <= 32'b0;
request_buffer_uncache_en <= 1'b0;
request_buffer_cacop_op_mode <= 2'b0;
request_buffer_icacop <= 1'b0;
miss_buffer_replace_way <= 2'b0;
wr_req <= 1'b0;
end
else case (main_state)
main_idle: begin
if (req_or_inst_valid && main_idle2lookup) begin
main_state <= main_lookup;
request_buffer_op <= op ;
request_buffer_index <= real_index ;
request_buffer_offset <= real_offset;
request_buffer_wstrb <= wstrb;
request_buffer_wdata <= wdata;
request_buffer_cacop_op_mode <= cacop_op_mode;
request_buffer_icacop <= icacop_op_en ;
end
end
main_lookup: begin
if (req_or_inst_valid && main_lookup2lookup) begin
main_state <= main_lookup;
request_buffer_op <= op ;
request_buffer_index <= real_index ;
request_buffer_offset <= real_offset;
request_buffer_wstrb <= wstrb;
request_buffer_wdata <= wdata;
request_buffer_cacop_op_mode <= cacop_op_mode;
request_buffer_icacop <= icacop_op_en ;
end
else if (tlb_excp_cancel_req) begin
main_state <= main_idle;
end
else if (!cache_hit) begin
main_state <= main_replace;
request_buffer_tag <= real_tag;
request_buffer_uncache_en <= (uncache_en && !request_buffer_icacop);
miss_buffer_replace_way <= replace_way;
end
else begin
main_state <= main_idle;
end
end
main_replace: begin
if (rd_rdy) begin
main_state <= main_refill;
miss_buffer_ret_num <= 2'b0; //returned data is sent to the cpu directly as it arrives.
end
end
main_refill: begin
if ((ret_valid && ret_last) || !rd_req_buffer) begin //when rd_req is not set, go to next state directly
main_state <= main_idle;
end
else begin
if (ret_valid) begin
miss_buffer_ret_num <= ret_num_add_one;
end
end
end
default: begin
main_state <= main_idle;
end
endcase
end
assign real_offset = icacop_op_en ? cacop_op_addr_offset : offset;
assign real_index = icacop_op_en ? cacop_op_addr_index : index ;
assign real_tag = request_buffer_icacop ? cacop_op_addr_tag : tag ;
/*====================================main state idle=======================================*/
assign req_or_inst_valid = valid || icacop_op_en;
assign main_idle2lookup = 1'b1;
assign icache_unbusy = main_state_is_idle;
//addr_ok logic
/*===================================main state lookup======================================*/
//tag compare
generate for(i=0;i<2;i=i+1) begin:gen_way_hit
assign way_hit[i] = way_tagv_douta[i][0] && (real_tag == way_tagv_douta[i][20:1]); //this signal is not held across cycles
end endgenerate
assign cache_hit = |way_hit && !(uncache_en || cacop_op_mode0 || cacop_op_mode1 || cacop_op_mode2); //uncached accesses and cacops reuse the miss path
//when a mode-2 cacop does not hit, the main state machine still runs a full round; this keeps the implementation simple.
assign main_lookup2lookup = cache_hit;
assign addr_ok = ((main_state_is_idle && main_idle2lookup) || (main_state_is_lookup && main_lookup2lookup)) && !icacop_op_en; //a new request can be accepted
//data select
generate for(i=0;i<2;i=i+1) begin: gen_data
assign way_data[i] = {way_bank_douta[i][3],way_bank_douta[i][2],way_bank_douta[i][1],way_bank_douta[i][0]};
assign way_load_word[i] = way_data[i][request_buffer_offset[3:2]*32 +: 32];
end endgenerate
assign load_res = {32{way_hit[0]}} & way_load_word[0] |
{32{way_hit[1]}} & way_load_word[1] ;
//data_ok logic
/*====================================main state miss=======================================*/
decoder_2_4 dec_rand_way (.in({1'b0,random_val[0]}),.out(chosen_way));
one_valid_n #(2) sel_one_invalid (.in(~{way_tagv_douta[1][0],way_tagv_douta[0][0]}),.out(invalid_way),.nozero(has_invalid_way));
assign rand_repl_way = has_invalid_way ? invalid_way : chosen_way[1:0]; //choose an invalid way first.
decoder_2_4 dec_cacop_way (.in({1'b0,request_buffer_offset[0]}),.out(cacop_chose_way));
assign replace_way = {2{cacop_op_mode0 || cacop_op_mode1}} & cacop_chose_way[1:0] |
{2{cacop_op_mode2}} & way_hit |
{2{!request_buffer_icacop}} & rand_repl_way;
/*==================================main state replace======================================*/
assign rd_req = main_state_is_replace && !(cacop_op_mode0 || cacop_op_mode1 || cacop_op_mode2);
/*===================================main state refill======================================*/
assign rd_type = request_buffer_uncache_en ? 3'b10 : 3'b100;
assign rd_addr = request_buffer_uncache_en ? {request_buffer_tag, request_buffer_index, request_buffer_offset} : {request_buffer_tag, request_buffer_index, 4'b0};
//write process will not block pipeline
assign data_ok = (main_state_is_lookup && (cache_hit || tlb_excp_cancel_req)) ||
(main_state_is_refill && ((ret_valid && ((miss_buffer_ret_num == request_buffer_offset[3:2]) || request_buffer_uncache_en))/* || !rd_req_buffer*/)) &&
!request_buffer_icacop; //when rd_req is not set, set data_ok directly.
//rdata is connected to ret_data directly; it is valid for one clock only
assign refill_data = ret_data;
assign way_wr_en = miss_buffer_replace_way & {2{ret_valid}}; //when rd_req is not set, ret_valid and ret_last are never set, so the block is not written either.
assign cache_miss = main_state_is_refill && ret_last && !(request_buffer_uncache_en || request_buffer_icacop);
//add one
assign ret_num_add_one[0] = miss_buffer_ret_num[0] ^ 1'b1;
assign ret_num_add_one[1] = miss_buffer_ret_num[1] ^ miss_buffer_ret_num[0];
always @(posedge clk) begin
if (reset) begin
rd_req_buffer <= 1'b0;
end
else if (rd_req) begin
rd_req_buffer <= 1'b1;
end
else if (main_state_is_refill && (ret_valid && ret_last)) begin
rd_req_buffer <= 1'b0;
end
end
/*==========================================================================================*/
//cache ins control signal
assign cacop_op_mode0 = request_buffer_icacop && (request_buffer_cacop_op_mode == 2'b00);
assign cacop_op_mode1 = request_buffer_icacop && ((request_buffer_cacop_op_mode == 2'b01) || (request_buffer_cacop_op_mode == 2'b11));
assign cacop_op_mode2 = request_buffer_icacop && (request_buffer_cacop_op_mode == 2'b10);
assign cacop_op_mode2_hit_wr = cacop_op_mode2 && |lookup_way_hit_buffer;
assign cacop_op_mode2_no_hit = cacop_op_mode2 && ~|lookup_way_hit_buffer;
always @(posedge clk) begin
if (reset) begin
lookup_way_hit_buffer <= 2'b0;
end
else if (cacop_op_mode2 && main_state_is_lookup) begin
lookup_way_hit_buffer <= way_hit;
end
end
//output
assign rdata = {32{main_state_is_lookup}} & load_res |
{32{main_state_is_refill}} & ret_data ;
generate
for(i=0;i<2;i=i+1) begin:gen_data_way
for(j=0;j<4;j=j+1) begin:gen_data_bank
/*===============================bank addra logic==============================*/
assign way_bank_addra[i][j] = {8{addr_ok}} & real_index | /*lookup*/
{8{!addr_ok}} & request_buffer_index ;
/*===============================bank we logic=================================*/
assign way_bank_wea[i][j] = {4{main_state_is_refill &&
(way_wr_en[i] && (miss_buffer_ret_num == j[1:0]))}} & 4'hf;
/*===============================bank dina logic=================================*/
assign way_bank_dina[i][j] = {32{main_state_is_refill}} & refill_data;
/*===============================bank ena logic=================================*/
assign way_bank_ena[i][j] = (!(request_buffer_uncache_en || cacop_op_mode0)) || main_state_is_idle || main_state_is_lookup;
end
end
endgenerate
generate
for(i=0;i<2;i=i+1) begin:gen_tagv_way
/*===============================tagv addra logic=================================*/
assign way_tagv_addra[i] = {8{addr_ok || (icacop_op_en &&
(main_state_is_idle || main_state_is_lookup))}} & real_index |
{8{main_state_is_replace || main_state_is_refill}} & request_buffer_index ;
//{8{(main_state_is_miss && wr_rdy) ||
/*===============================tagv ena logic=================================*/
assign way_tagv_ena[i] = (!request_buffer_uncache_en) || main_state_is_idle || main_state_is_lookup;
/*===============================tagv wea logic=================================*/
assign way_tagv_wea[i] = miss_buffer_replace_way[i] && main_state_is_refill &&
((ret_valid && ret_last) || cacop_op_mode0 || cacop_op_mode1 || cacop_op_mode2_hit_wr); //write at the last 4B
/*===============================tagv dina logic=================================*/
assign way_tagv_dina[i] = (cacop_op_mode0 || cacop_op_mode1 || cacop_op_mode2_hit_wr) ? 21'b0 : {request_buffer_tag, 1'b1};
end
endgenerate
/*==============================================================================*/
generate
for(i=0;i<2;i=i+1) begin:data_ram_way
for(j=0;j<4;j=j+1) begin:data_ram_bank
data_bank_sram u(
.addra (way_bank_addra[i][j]) ,
.clka (clk ) ,
.dina (way_bank_dina[i][j] ) ,
.douta (way_bank_douta[i][j]) ,
.ena (way_bank_ena[i][j] ) ,
.wea (way_bank_wea[i][j] )
);
end
end
endgenerate
generate
for(i=0;i<2;i=i+1) begin:tagv_ram_way
//[20:1] tag [0:0] v
tagv_sram u(
.addra (way_tagv_addra[i]) ,
.clka (clk ) ,
.dina (way_tagv_dina[i] ) ,
.douta (way_tagv_douta[i]) ,
.ena (way_tagv_ena[i] ) ,
.wea (way_tagv_wea[i] )
);
end
endgenerate
lfsr lfsr(
.clk (clk ) ,
.reset (reset ) ,
.random_val (random_val )
);
assign main_state_is_idle = main_state == main_idle ;
assign main_state_is_lookup = main_state == main_lookup ;
assign main_state_is_replace = main_state == main_replace;
assign main_state_is_refill = main_state == main_refill ;
endmodule
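// ---------------------------------------------------------------------------
// Illustrative sketch, not part of the original design. It shows how a 32-bit
// address appears to map onto the tag/index/offset fields used by icache
// above. The field widths follow the icache port list and the refill address
// rd_addr = {tag, index, 4'b0}. The module name and the bank/line_addr
// outputs are hypothetical. Note that icache itself samples the tag one state
// later than index/offset (see main_lookup), which this sketch ignores.
// ---------------------------------------------------------------------------
module icache_addr_split_sketch (
    input  [31:0] addr      ,
    output [19:0] tag       , // compared against way_tagv_douta[i][20:1]
    output [ 7:0] index     , // selects one of 256 sets
    output [ 3:0] offset    , // byte offset inside a 16-byte line
    output [ 1:0] bank      , // which 32-bit word (data bank) of the line
    output [31:0] line_addr   // 16-byte aligned address used for a refill
);
assign {tag, index, offset} = addr;
assign bank      = offset[3:2];
assign line_addr = {tag, index, 4'b0};
endmodule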

1005
rtl/ip/open-la500/id_stage.v Normal file

File diff suppressed because it is too large

@@ -0,0 +1,357 @@
`include "mycpu.h"
`include "csr.h"
module if_stage(
input clk ,
input reset ,
//allowin
input ds_allowin ,
//brbus
input [`BR_BUS_WD -1:0] br_bus ,
//to ds
output fs_to_ds_valid ,
output [`FS_TO_DS_BUS_WD -1:0] fs_to_ds_bus ,
//exception
input excp_flush ,
input ertn_flush ,
input refetch_flush ,
input icacop_flush ,
input [31:0] ws_pc ,
input [31:0] csr_eentry ,
input [31:0] csr_era ,
input excp_tlbrefill ,
input [31:0] csr_tlbrentry ,
input has_int ,
//idle
input idle_flush ,
// inst cache interface
output inst_valid ,
output inst_op ,
output [ 3:0] inst_wstrb ,
output [31:0] inst_wdata ,
input inst_addr_ok ,
input inst_data_ok ,
input icache_miss ,
input [31:0] inst_rdata ,
output inst_uncache_en ,
output tlb_excp_cancel_req,
//from csr
input csr_pg ,
input csr_da ,
input [31:0] csr_dmw0 ,
input [31:0] csr_dmw1 ,
input [ 1:0] csr_plv ,
input [ 1:0] csr_datf ,
input disable_cache ,
//to btb
output [31:0] fetch_pc ,
output fetch_en ,
input [31:0] btb_ret_pc ,
input btb_taken ,
input btb_en ,
input [ 4:0] btb_index ,
//to addr trans
output [31:0] inst_addr ,
output inst_addr_trans_en,
output dmw0_en ,
output dmw1_en ,
//tlb
input inst_tlb_found ,
input inst_tlb_v ,
input inst_tlb_d ,
input [ 1:0] inst_tlb_mat ,
input [ 1:0] inst_tlb_plv
);
reg fs_valid;
wire fs_ready_go;
wire fs_allowin;
wire to_fs_valid;
wire pfs_ready_go;
wire [31:0] seq_pc;
wire [31:0] nextpc;
wire pfs_excp_adef;
wire fs_excp_tlbr;
wire fs_excp_pif;
wire fs_excp_ppi;
reg fs_excp;
reg fs_excp_num;
wire excp;
wire [3:0] excp_num;
wire pfs_excp;
wire pfs_excp_num;
wire flush_sign;
reg [31:0] inst_rd_buff;
reg inst_buff_enable;
wire da_mode;
wire pg_mode;
wire btb_pre_error_flush;
wire [31:0] btb_pre_error_flush_target;
wire flush_inst_delay;
wire flush_inst_go_dirt;
wire fetch_btb_target;
reg idle_lock;
wire tlb_excp_lock_pc;
wire [31:0] btb_ret_pc_t;
wire [ 4:0] btb_index_t;
wire btb_taken_t;
wire btb_en_t;
wire [31:0] excp_entry;
wire [31:0] inst_flush_pc;
assign {btb_pre_error_flush,
btb_pre_error_flush_target } = br_bus;
wire [31:0] fs_inst;
reg [31:0] fs_pc;
reg [37:0] btb_lock_buffer;
reg btb_lock_en;
assign fs_to_ds_bus = {btb_ret_pc_t, //108:77
btb_index_t, //76:72
btb_taken_t, //71:71
btb_en_t, //70:70
icache_miss, //69:69
excp, //68:68
excp_num, //67:64
fs_inst, //63:32
fs_pc //31:0
};
assign flush_sign = ertn_flush || excp_flush || refetch_flush || icacop_flush || idle_flush;
assign flush_inst_delay = flush_sign && !inst_addr_ok || idle_flush;
assign flush_inst_go_dirt = flush_sign && inst_addr_ok && !idle_flush;
//flush state machine
reg [31:0] flush_inst_req_buffer;
reg flush_inst_req_state;
localparam flush_inst_req_empty = 1'b0;
localparam flush_inst_req_full = 1'b1;
always @(posedge clk) begin
if (reset) begin
flush_inst_req_state <= flush_inst_req_empty;
end
else case (flush_inst_req_state)
flush_inst_req_empty: begin
if(flush_inst_delay) begin
flush_inst_req_buffer <= nextpc;
flush_inst_req_state <= flush_inst_req_full;
end
end
flush_inst_req_full: begin
if(pfs_ready_go) begin
flush_inst_req_state <= flush_inst_req_empty;
end
else if (flush_sign) begin
flush_inst_req_buffer <= nextpc;
end
end
endcase
end
assign fetch_btb_target = (btb_taken && btb_en) || (btb_lock_en && btb_lock_buffer[37]);
/*
* idle lock
* when an idle inst commits, stop inst fetch until an interrupt arrives
*/
always @(posedge clk) begin
if (reset) begin
idle_lock <= 1'b0;
end
else if (idle_flush && !has_int) begin
idle_lock <= 1'b1;
end
else if (has_int) begin
idle_lock <= 1'b0;
end
end
/*
* br state machine
* on a btb prediction error the id stage cancels one inst, so first make sure
* the useless inst (the one to be canceled) has actually been issued.
*/
reg [31:0] br_target_inst_req_buffer;
reg [ 2:0] br_target_inst_req_state;
localparam br_target_inst_req_empty = 3'b001;
localparam br_target_inst_req_wait_slot = 3'b010;
localparam br_target_inst_req_wait_br_target = 3'b100;
always @(posedge clk) begin
if (reset) begin
br_target_inst_req_state <= br_target_inst_req_empty;
end
else case (br_target_inst_req_state)
br_target_inst_req_empty: begin
if (flush_sign) begin
br_target_inst_req_state <= br_target_inst_req_empty;
end
else if(btb_pre_error_flush && !fs_valid && !inst_addr_ok) begin
br_target_inst_req_state <= br_target_inst_req_wait_slot;
br_target_inst_req_buffer <= btb_pre_error_flush_target;
end
else if(btb_pre_error_flush && !inst_addr_ok && fs_valid || btb_pre_error_flush && inst_addr_ok && !fs_valid) begin
br_target_inst_req_state <= br_target_inst_req_wait_br_target;
br_target_inst_req_buffer <= btb_pre_error_flush_target;
end
end
br_target_inst_req_wait_slot: begin
if(flush_sign) begin
br_target_inst_req_state <= br_target_inst_req_empty;
end
else if(pfs_ready_go) begin
br_target_inst_req_state <= br_target_inst_req_wait_br_target;
end
end
br_target_inst_req_wait_br_target: begin
if(pfs_ready_go || flush_sign) begin
br_target_inst_req_state <= br_target_inst_req_empty;
end
end
default: begin
br_target_inst_req_state <= br_target_inst_req_empty;
end
endcase
end
/*
* btb lock
* the btb result is valid for one clock only
* when pfs is not ready to go, the btb result must be buffered
*/
always @(posedge clk) begin
if (reset || flush_sign || fetch_en)
btb_lock_en <= 1'b0;
else if (btb_en && !pfs_ready_go) begin
btb_lock_en <= 1'b1;
btb_lock_buffer <= {btb_taken, btb_index, btb_ret_pc};
end
end
assign btb_ret_pc_t = {32{btb_lock_en}} & btb_lock_buffer[31:0] | btb_ret_pc;
assign btb_index_t = {5{btb_lock_en}} & btb_lock_buffer[36:32] | btb_index;
assign btb_taken_t = btb_lock_en && btb_lock_buffer[37] || btb_taken;
assign btb_en_t = btb_lock_en || btb_en;
// pre-IF stage
assign pfs_ready_go = (inst_valid || pfs_excp) && inst_addr_ok;
assign to_fs_valid = ~reset && pfs_ready_go;
assign seq_pc = fs_pc + 32'h4;
assign excp_entry = {32{excp_tlbrefill}} & csr_tlbrentry |
{32{!excp_tlbrefill}} & csr_eentry ;
assign inst_flush_pc = {32{ertn_flush}} & csr_era |
{32{refetch_flush || icacop_flush || idle_flush}} & (ws_pc + 32'h4) ;
assign nextpc = (flush_inst_req_state == flush_inst_req_full) ? flush_inst_req_buffer :
excp_flush ? excp_entry :
(ertn_flush || refetch_flush || icacop_flush || idle_flush) ? inst_flush_pc :
(br_target_inst_req_state == br_target_inst_req_wait_br_target) ? br_target_inst_req_buffer :
btb_pre_error_flush && fs_valid ? btb_pre_error_flush_target:
fetch_btb_target ? btb_ret_pc_t :
seq_pc ;
/*
* on a tlb excp, stop inst fetch until excp_flush to avoid fetching useless insts.
* do not lock while the br state machine or the flush state machine is active.
*/
assign tlb_excp_lock_pc = tlb_excp_cancel_req && br_target_inst_req_state != br_target_inst_req_wait_br_target && flush_inst_req_state != flush_inst_req_full;
//when flush_sign arrives while the icache is busy, inst_valid for the flush target should not be asserted immediately
assign inst_valid = (fs_allowin && !pfs_excp && !tlb_excp_lock_pc || flush_sign || btb_pre_error_flush) && !(idle_flush || idle_lock);
assign inst_op = 1'b0;
assign inst_wstrb = 4'h0;
assign inst_addr = nextpc; //nextpc
assign inst_wdata = 32'b0;
assign fs_inst = (inst_buff_enable) ? inst_rd_buff : inst_rdata;
//inst read buffer, used while the stage is stalled
always @(posedge clk) begin
if (reset || (fs_ready_go && ds_allowin) || flush_sign) begin
inst_buff_enable <= 1'b0;
end
else if ((inst_data_ok) && !ds_allowin) begin
inst_rd_buff <= inst_rdata;
inst_buff_enable <= 1'b1;
end
end
//exception
assign pfs_excp_adef = (nextpc[0] || nextpc[1]); //word align
//tlb
assign fs_excp_tlbr = !inst_tlb_found && inst_addr_trans_en;
assign fs_excp_pif = !inst_tlb_v && inst_addr_trans_en;
assign fs_excp_ppi = (csr_plv > inst_tlb_plv) && inst_addr_trans_en;
assign tlb_excp_cancel_req = fs_excp_tlbr || fs_excp_pif || fs_excp_ppi;
assign pfs_excp = pfs_excp_adef;
assign pfs_excp_num = {pfs_excp_adef};
assign excp = fs_excp || fs_excp_tlbr || fs_excp_pif || fs_excp_ppi ;
assign excp_num = {fs_excp_ppi, fs_excp_pif, fs_excp_tlbr, fs_excp_num};
//addr trans
assign inst_addr_trans_en = pg_mode && !dmw0_en && !dmw1_en;
//addr dmw trans //TOT
assign dmw0_en = ((csr_dmw0[`PLV0] && csr_plv == 2'd0) || (csr_dmw0[`PLV3] && csr_plv == 2'd3)) && (fs_pc[31:29] == csr_dmw0[`VSEG]) && pg_mode;
assign dmw1_en = ((csr_dmw1[`PLV0] && csr_plv == 2'd0) || (csr_dmw1[`PLV3] && csr_plv == 2'd3)) && (fs_pc[31:29] == csr_dmw1[`VSEG]) && pg_mode;
//uncache judgement
assign da_mode = csr_da && !csr_pg;
assign pg_mode = csr_pg && !csr_da;
assign inst_uncache_en = (da_mode && (csr_datf == 2'b0)) ||
(dmw0_en && (csr_dmw0[`DMW_MAT] == 2'b0)) ||
(dmw1_en && (csr_dmw1[`DMW_MAT] == 2'b0)) ||
(inst_addr_trans_en && (inst_tlb_mat == 2'b0)) ||
disable_cache;
//assign inst_uncache_en = 1'b1; //used for debug
// IF stage
assign fs_ready_go = inst_data_ok || inst_buff_enable || excp;
assign fs_allowin = !fs_valid || fs_ready_go && ds_allowin;
assign fs_to_ds_valid = fs_valid && fs_ready_go;
always @(posedge clk) begin
if (reset || flush_inst_delay) begin
fs_valid <= 1'b0;
end
else if (fs_allowin) begin
fs_valid <= to_fs_valid;
end
if (reset) begin
fs_pc <= 32'h1bfffffc; //trick: to make nextpc be 0x1c000000 during reset
fs_excp <= 1'b0;
fs_excp_num <= 1'b0; //fs_excp_num is declared as a 1-bit reg
end
else if (to_fs_valid && (fs_allowin || flush_inst_go_dirt)) begin
fs_pc <= nextpc;
fs_excp <= pfs_excp;
fs_excp_num <= pfs_excp_num;
end
end
//go btb and tlb
assign fetch_pc = nextpc;
assign fetch_en = inst_valid && inst_addr_ok;
endmodule
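// ---------------------------------------------------------------------------
// Illustrative sketch, not part of the original design. It isolates the
// valid/allowin handshake that if_stage uses above (fs_allowin,
// fs_to_ds_valid) in a generic one-entry pipeline register. The module name,
// the WD parameter and the port names are hypothetical; ready_go stands for
// whatever condition a real stage uses (for example inst_data_ok).
// ---------------------------------------------------------------------------
module pipe_stage_sketch #(
    parameter WD = 32
)(
    input           clk          ,
    input           reset        ,
    input           flush        ,
    // from previous stage
    input           prev_valid   ,
    input  [WD-1:0] prev_bus     ,
    output          allowin      ,
    // stage-local completion condition
    input           ready_go     ,
    // to next stage
    input           next_allowin ,
    output          to_next_valid,
    output [WD-1:0] to_next_bus
);
reg          valid;
reg [WD-1:0] bus_r;
// accept new data when empty, or when the held data leaves this cycle
assign allowin       = !valid || (ready_go && next_allowin);
assign to_next_valid = valid && ready_go;
assign to_next_bus   = bus_r;
always @(posedge clk) begin
    if (reset || flush) begin
        valid <= 1'b0;
    end
    else if (allowin) begin
        valid <= prev_valid;
    end
    if (prev_valid && allowin) begin
        bus_r <= prev_bus;
    end
end
endmodule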


@@ -0,0 +1,53 @@
`include "mycpu.h"
`include "csr.h"
`ifdef HAS_LACC
module lacc_core(
input clk,
input reset,
input lacc_flush,
input lacc_req_valid,
input [`LACC_OP_WIDTH-1: 0] lacc_req_command,
input [6: 0] lacc_req_imm,
input [31: 0] lacc_req_rj,
input [31: 0] lacc_req_rk,
output lacc_rsp_valid,
output [31: 0] lacc_rsp_rdat,
// write requests also assert the valid signal
output lacc_data_valid,
input lacc_data_ready,
output [31: 0] lacc_data_addr,
output lacc_data_read,
output [31: 0] lacc_data_wdata,
output [1: 0] lacc_data_size,
input lacc_drsp_valid,
input [31: 0] lacc_drsp_rdata
);
lacc_demo demo(
.clk(clk),
.reset(reset),
.lacc_flush (lacc_flush),
.lacc_req_valid (lacc_req_valid),
.lacc_req_command(lacc_req_command),
.lacc_req_imm (lacc_req_imm),
.lacc_req_rj (lacc_req_rj),
.lacc_req_rk (lacc_req_rk),
.lacc_rsp_valid (lacc_rsp_valid),
.lacc_rsp_rdat (lacc_rsp_rdat),
.lacc_data_valid(lacc_data_valid),
.lacc_data_ready(lacc_data_ready),
.lacc_data_addr (lacc_data_addr),
.lacc_data_read (lacc_data_read),
.lacc_data_wdata(lacc_data_wdata),
.lacc_data_size (lacc_data_size),
.lacc_drsp_valid(lacc_drsp_valid),
.lacc_drsp_rdata(lacc_drsp_rdata)
);
endmodule
`endif


@@ -0,0 +1,166 @@
module lacc_demo(
input clk,
input reset,
input lacc_flush,
input lacc_req_valid,
input [`LACC_OP_WIDTH-1: 0] lacc_req_command,
input [6: 0] lacc_req_imm,
input [31: 0] lacc_req_rj,
input [31: 0] lacc_req_rk,
output lacc_rsp_valid,
output [31: 0] lacc_rsp_rdat,
output lacc_data_valid,
input lacc_data_ready,
output [31: 0] lacc_data_addr,
output lacc_data_read,
output [31: 0] lacc_data_wdata,
output [1: 0] lacc_data_size,
input lacc_drsp_valid,
input [31: 0] lacc_drsp_rdata
);
wire op_lmadd;
wire op_cfg;
reg [31: 0] req_addr1, req_addr2, waddr;
reg [6: 0] req_size;
wire [31: 0] nxt_req_addr1, nxt_req_addr2, req_addr1_n4, req_addr2_n4;
wire [31: 0] nxt_waddr, waddr_n4;
wire [6: 0] nxt_req_size, req_size_p1;
wire req_addr1_en, req_addr2_en, req_size_en, waddr_en;
wire req_size_nz = |req_size;
reg [31: 0] conv_data;
wire [31: 0] add_data;
/*********************************
* func:
* rj and rk are addresses
* read the words at rj and rk and add them
* write the sum back to waddr
* then write the add result to rd
**********************************/
assign op_lmadd = lacc_req_command == 0;
// rj is size of read, rk is waddr
assign op_cfg = lacc_req_command == 1;
parameter FSM_WIDTH = 2;
parameter IDLE = 'b0;
parameter REQ_ADDR1 = 'b1;
parameter REQ_ADDR2 = 'd2;
parameter FINAL = 'd3;
reg [FSM_WIDTH-1: 0] state_r;
wire [FSM_WIDTH-1: 0] state_n;
wire [FSM_WIDTH-1: 0] state_final_n;
wire state_idle = state_r == IDLE;
wire state_req_addr1 = state_r == REQ_ADDR1;
wire state_req_addr2 = state_r == REQ_ADDR2;
wire state_final = state_r == FINAL;
wire data_hsk = lacc_data_valid & lacc_data_ready;
wire idle_exit = state_idle & lacc_req_valid & op_lmadd & req_size_nz;
wire req_addr1_exit = state_req_addr1 & data_hsk;
wire req_addr2_exit = state_req_addr2 & data_hsk;
wire final_exit = state_final & data_hsk;
wire exit2idle = final_exit & ~req_size_nz;
assign state_final_n = ~req_size_nz ? IDLE : REQ_ADDR1;
wire state_en = idle_exit | req_addr1_exit | req_addr2_exit | final_exit;
assign state_n = {FSM_WIDTH{idle_exit}} & REQ_ADDR1 |
{FSM_WIDTH{req_addr1_exit}} & REQ_ADDR2 |
{FSM_WIDTH{req_addr2_exit}} & FINAL |
{FSM_WIDTH{final_exit}} & state_final_n;
assign req_addr1_en = state_idle & lacc_req_valid & op_lmadd | req_addr1_exit;
assign req_addr2_en = state_idle & lacc_req_valid & op_lmadd | req_addr2_exit;
assign req_size_en = lacc_req_valid & op_cfg | req_addr2_exit;
assign waddr_en = lacc_req_valid & op_cfg | final_exit;
assign req_addr1_n4 = req_addr1 + 4;
assign req_addr2_n4 = req_addr2 + 4;
assign req_size_p1 = req_size - 1; // decrement, despite the _p1 suffix
assign waddr_n4 = waddr + 4;
assign nxt_req_addr1 = {32{lacc_req_valid & state_idle & op_lmadd}} & lacc_req_rj |
{32{req_addr1_exit}} & req_addr1_n4;
assign nxt_req_addr2 = {32{lacc_req_valid & state_idle & op_lmadd}} & lacc_req_rk |
{32{req_addr2_exit}} & req_addr2_n4;
assign nxt_req_size = {7{lacc_req_valid & op_cfg}} & lacc_req_rj |
{7{req_addr2_exit}} & req_size_p1;
assign nxt_waddr = {32{lacc_req_valid & op_cfg}} & lacc_req_rk |
{32{final_exit}} & waddr_n4;
reg data_req;
wire data_req_en;
wire nxt_data_req;
assign data_req_en = idle_exit | req_addr1_exit | req_addr2_exit | final_exit;
assign nxt_data_req = ~exit2idle & ~req_addr2_exit;
reg buffer_valid, wdata_valid, wdata_valid_n;
reg [31: 0] buffer_data;
reg [31: 0] buffer_wdata;
always @(posedge clk)begin
if(req_addr1_en) req_addr1 <= nxt_req_addr1;
if(req_addr2_en) req_addr2 <= nxt_req_addr2;
if(waddr_en) waddr <= nxt_waddr;
if(~buffer_valid) buffer_data <= lacc_drsp_rdata;
if(buffer_valid & lacc_drsp_valid) begin
buffer_wdata <= lacc_drsp_rdata + buffer_data;
end
wdata_valid_n <= wdata_valid;
if(reset | lacc_flush)begin
req_size <= 7'b0;
state_r <= IDLE;
data_req <= 1'b0;
buffer_valid <= 1'b0;
wdata_valid <= 1'b0;
end
else begin
if(req_size_en) req_size <= nxt_req_size;
if(state_en) state_r <= state_n;
if(data_req_en) data_req <= nxt_data_req;
if(final_exit) buffer_valid <= 1'b0;
else if(lacc_drsp_valid & ~wdata_valid_n) buffer_valid <= 1'b1;
if(final_exit) wdata_valid <= 1'b0;
else if(buffer_valid & lacc_drsp_valid) wdata_valid <= 1'b1;
end
end
assign lacc_data_valid = data_req | wdata_valid;
assign lacc_data_read = ~wdata_valid;
assign lacc_data_addr = {32{state_req_addr1}} & req_addr1 |
{32{state_req_addr2}} & req_addr2 |
{32{state_final}} & waddr;
assign lacc_data_size = 2'b10;
assign lacc_data_wdata = buffer_wdata;
assign add_data = conv_data + lacc_drsp_rdata;
always @(posedge clk)begin
if(idle_exit)begin
conv_data <= 0;
end
else begin
if(lacc_drsp_valid)begin
conv_data <= add_data;
end
end
end
assign lacc_rsp_valid = exit2idle | lacc_req_valid & state_idle & ((op_lmadd & ~req_size_nz) | op_cfg);
assign lacc_rsp_rdat = conv_data;
endmodule
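// ---------------------------------------------------------------------------
// Illustrative sketch, not part of the original design. It is a behavioral
// (non-synthesizable) reference for the memory effect that lacc_demo above
// appears to have: after a cfg command latches size and waddr, an lmadd
// command reads one word from each source per iteration and stores the sum
// at waddr, advancing all three addresses by 4. The module name, the task
// name and the toy word-addressed memory are hypothetical; the value returned
// in rd (conv_data) is not modeled, and addresses are assumed to be word
// aligned and to fit inside the toy memory.
// ---------------------------------------------------------------------------
module lacc_lmadd_ref_sketch;
reg [31:0] mem [0:1023]; // toy memory, indexed by word (byte address >> 2)
task lmadd_ref;
    input [31:0] rj   ; // first source address  (lmadd rj)
    input [31:0] rk   ; // second source address (lmadd rk)
    input [31:0] waddr; // destination address   (cfg rk)
    input [ 6:0] size ; // number of words       (cfg rj[6:0])
    integer n;
    begin
        for (n = 0; n < size; n = n + 1) begin
            mem[(waddr >> 2) + n] = mem[(rj >> 2) + n] + mem[(rk >> 2) + n];
        end
    end
endtask
endmodule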


@@ -0,0 +1,368 @@
`include "mycpu.h"
`include "csr.h"
module mem_stage(
input clk ,
input reset ,
//allowin
input ws_allowin ,
output ms_allowin ,
//from es
input es_to_ms_valid,
input [`ES_TO_MS_BUS_WD -1:0] es_to_ms_bus ,
//to ws
output ms_to_ws_valid,
output [`MS_TO_WS_BUS_WD -1:0] ms_to_ws_bus ,
//to ds forward path
output [`MS_TO_DS_FORWARD_BUS-1:0] ms_to_ds_forward_bus,
output ms_to_ds_valid,
//div mul
input [31:0] div_result ,
input [31:0] mod_result ,
input [63:0] mul_result ,
//exception
input excp_flush ,
input ertn_flush ,
input refetch_flush ,
input icacop_flush ,
//idle
input idle_flush ,
//tlb ins
output tlb_inst_stall,
//to es
output ms_wr_tlbehi ,
output ms_flush ,
//from cache
input data_data_ok ,
input dcache_miss ,
input [31:0] data_rdata ,
//to cache
output data_uncache_en,
output tlb_excp_cancel_req,
output sc_cancel_req ,
//from csr
input csr_pg ,
input csr_da ,
input [31:0] csr_dmw0 ,
input [31:0] csr_dmw1 ,
input [ 1:0] csr_plv ,
input [ 1:0] csr_datm ,
input disable_cache ,
input [27:0] lladdr ,
// from addr trans for difftest
input [ 7:0] data_index_diff ,
input [19:0] data_tag_diff ,
input [ 3:0] data_offset_diff ,
//to addr trans
output data_addr_trans_en,
output dmw0_en ,
output dmw1_en ,
output cacop_op_mode_di ,
//tlb
input data_tlb_found ,
input [ 4:0] data_tlb_index ,
input data_tlb_v ,
input data_tlb_d ,
input [ 1:0] data_tlb_mat ,
input [ 1:0] data_tlb_plv ,
input [19:0] data_tlb_ppn
);
reg ms_valid;
wire ms_ready_go;
wire dep_need_stall;
reg [`ES_TO_MS_BUS_WD -1:0] es_to_ms_bus_r;
wire [ 3:0] ms_mul_div_op;
wire [ 1:0] sram_addr_low2bit;
wire [ 1:0] ms_mem_size;
wire ms_load_op;
wire ms_gr_we;
wire [ 4:0] ms_dest;
wire [31:0] ms_exe_result;
wire [31:0] ms_pc;
wire ms_excp;
wire [ 9:0] ms_excp_num;
wire ms_ertn;
wire [31:0] ms_csr_result;
wire [13:0] ms_csr_idx;
wire ms_csr_we;
wire ms_ll_w;
wire ms_sc_w;
wire ms_store_op;
wire ms_tlbsrch;
wire ms_tlbfill;
wire ms_tlbwr;
wire ms_tlbrd;
wire ms_refetch;
wire ms_invtlb;
wire [ 9:0] ms_invtlb_asid;
wire [18:0] ms_invtlb_vpn;
wire ms_mem_sign_exted;
wire ms_icacop_op_en;
wire ms_br_inst;
wire ms_icache_miss;
wire ms_br_pre;
wire ms_br_pre_error;
wire ms_preld_inst;
wire ms_cacop;
wire ms_idle;
wire [31:0] ms_error_va;
// difftest
wire ms_cnt_inst ;
wire [63:0] ms_timer_64 ;
wire [31:0] ms_inst ;
wire [ 7:0] ms_inst_ld_en ;
wire [31:0] ms_ld_paddr ;
wire [31:0] ms_ld_vaddr ;
wire [ 7:0] ms_inst_st_en ;
wire [31:0] ms_st_data ;
wire ms_csr_rstat_en ;
wire [31:0] ms_csr_data ;
assign {ms_csr_data , //424:393 for difftest
ms_csr_rstat_en , //392:392 for difftest
ms_st_data , //391:360 for difftest
ms_inst_st_en , //359:352 for difftest
ms_ld_vaddr , //351:320 for difftest
ms_inst_ld_en , //319:312 for difftest
ms_cnt_inst , //311:311 for difftest
ms_timer_64 , //310:247 for difftest
ms_inst , //246:215 for difftest
ms_error_va , //214:183
ms_idle , //182:182
ms_cacop , //181:181
ms_preld_inst , //180:180
ms_br_pre_error , //179:179
ms_br_pre , //178:178
ms_icache_miss , //177:177
ms_br_inst , //176:176
ms_icacop_op_en , //175:175
ms_mem_sign_exted, //174:174
ms_invtlb_vpn , //173:155
ms_invtlb_asid , //154:145
ms_invtlb , //144:144
ms_tlbrd , //143:143
ms_refetch , //142:142
ms_tlbfill , //141:141
ms_tlbwr , //140:140
ms_tlbsrch , //139:139
ms_store_op , //138:138
ms_sc_w , //137:137
ms_ll_w , //136:136
ms_excp_num , //135:126
ms_csr_we , //125:125
ms_csr_idx , //124:111
ms_csr_result , //110:79
ms_ertn , //78:78
ms_excp , //77:77
ms_mem_size , //76:75
ms_mul_div_op , //74:71
ms_load_op , //70:70
ms_gr_we , //69:69
ms_dest , //68:64
ms_exe_result , //63:32
ms_pc //31:0
} = es_to_ms_bus_r;
wire [31:0] mem_result;
wire [31:0] ms_final_result;
wire flush_sign;
wire [31:0] ms_rdata;
reg [31:0] data_rd_buff;
reg data_buff_enable;
wire access_mem;
wire [ 4:0] cacop_op;
wire [ 1:0] cacop_op_mode;
wire forward_enable;
wire dest_zero;
wire [31:0] paddr;
wire [15:0] excp_num;
wire excp;
wire excp_tlbr;
wire excp_pil ;
wire excp_pis ;
wire excp_pme ;
wire excp_ppi ;
wire da_mode ;
wire pg_mode ;
wire sc_addr_eq;
assign ms_to_ws_bus = {ms_csr_data , //492:461 for difftest
ms_csr_rstat_en, //460:460 for difftest
ms_st_data , //459:428 for difftest
ms_inst_st_en , //427:420 for difftest
ms_ld_vaddr , //419:388 for difftest
ms_ld_paddr , //387:356 for difftest
ms_inst_ld_en , //355:348 for difftest
ms_cnt_inst , //347:347 for difftest
ms_timer_64 , //346:283 for difftest
ms_inst , //282:251 for difftest
data_uncache_en, //250:250
paddr , //249:218
ms_idle , //217:217
ms_br_pre_error, //216:216
ms_br_pre , //215:215
dcache_miss , //214:214
access_mem , //213:213
ms_icache_miss , //212:212
ms_br_inst , //211:211
ms_icacop_op_en, //210:210
ms_invtlb_vpn , //209:191
ms_invtlb_asid , //190:181
ms_invtlb , //180:180
ms_tlbrd , //179:179
ms_refetch , //178:178
ms_tlbfill , //177:177
ms_tlbwr , //176:176
data_tlb_index , //175:171
data_tlb_found , //170:170
ms_tlbsrch , //169:169
ms_error_va , //168:137
ms_sc_w , //136:136
ms_ll_w , //135:135
excp_num , //134:119
ms_csr_we , //118:118
ms_csr_idx , //117:104
ms_csr_result , //103:72
ms_ertn , //71:71
excp , //70:70
ms_gr_we , //69:69
ms_dest , //68:64
ms_final_result, //63:32
ms_pc //31:0
};
assign ms_to_ds_valid = ms_valid;
//cache insts need to wait for the data_data_ok signal
assign ms_ready_go = (data_data_ok || data_buff_enable) || !access_mem || excp || sc_cancel_req;
assign ms_allowin = !ms_valid || ms_ready_go && ws_allowin;
assign ms_to_ws_valid = ms_valid && ms_ready_go;
always @(posedge clk) begin
if (reset || flush_sign) begin
ms_valid <= 1'b0;
end
else if (ms_allowin) begin
ms_valid <= es_to_ms_valid;
end
if (es_to_ms_valid && ms_allowin) begin
es_to_ms_bus_r <= es_to_ms_bus;
end
end
assign access_mem = ms_store_op || ms_load_op;
assign flush_sign = excp_flush || ertn_flush || refetch_flush || icacop_flush || idle_flush;
assign ms_rdata = data_buff_enable ? data_rd_buff : data_rdata;
assign sram_addr_low2bit = {ms_exe_result[1], ms_exe_result[0]};
wire [7:0] mem_byteLoaded = ({8{sram_addr_low2bit==2'b00}} & ms_rdata[ 7: 0]) |
({8{sram_addr_low2bit==2'b01}} & ms_rdata[15: 8]) |
({8{sram_addr_low2bit==2'b10}} & ms_rdata[23:16]) |
({8{sram_addr_low2bit==2'b11}} & ms_rdata[31:24]) ;
wire [15:0] mem_halfLoaded = ({16{sram_addr_low2bit==2'b00}} & ms_rdata[15: 0]) |
({16{sram_addr_low2bit==2'b10}} & ms_rdata[31:16]) ;
assign mem_result = ({32{ms_mem_size[0] && ms_mem_sign_exted}} & {{24{mem_byteLoaded[ 7]}}, mem_byteLoaded}) |
({32{ms_mem_size[0] && ~ms_mem_sign_exted}} & { 24'b0 , mem_byteLoaded}) |
({32{ms_mem_size[1] && ms_mem_sign_exted}} & {{16{mem_halfLoaded[15]}}, mem_halfLoaded}) |
({32{ms_mem_size[1] && ~ms_mem_sign_exted}} & { 16'b0 , mem_halfLoaded}) |
({32{!ms_mem_size}} & ms_rdata ) ;
assign ms_final_result = ({32{ms_load_op }} & mem_result ) |
({32{ms_mul_div_op[0]}} & mul_result[31:0] ) |
({32{ms_mul_div_op[1]}} & mul_result[63:32]) |
({32{ms_mul_div_op[2]}} & div_result ) |
({32{ms_mul_div_op[3]}} & mod_result ) |
({32{!ms_mul_div_op && !ms_load_op}} & (ms_exe_result&{32{!sc_cancel_req}}));
assign dest_zero = (ms_dest == 5'b0);
assign forward_enable = ms_gr_we & ~dest_zero & ms_valid;
assign dep_need_stall = ms_load_op && !ms_to_ws_valid;
assign ms_to_ds_forward_bus = {dep_need_stall, //38:38
forward_enable, //37:37
ms_dest , //36:32
ms_final_result //31:0
};
//addr trans
assign pg_mode = !csr_da && csr_pg;
//uncache judgement
assign da_mode = csr_da && !csr_pg;
assign data_addr_trans_en = pg_mode && !dmw0_en && !dmw1_en && !cacop_op_mode_di;
assign paddr = {data_tlb_ppn, ms_error_va[11:0]};
assign sc_addr_eq = (lladdr == paddr[31:4]);
assign sc_cancel_req = (!sc_addr_eq||data_uncache_en) && ms_sc_w && access_mem;
//addr dmw trans
assign dmw0_en = ((csr_dmw0[`PLV0] && csr_plv == 2'd0) || (csr_dmw0[`PLV3] && csr_plv == 2'd3)) && (ms_error_va[31:29] == csr_dmw0[`VSEG]) && pg_mode;
assign dmw1_en = ((csr_dmw1[`PLV0] && csr_plv == 2'd0) || (csr_dmw1[`PLV3] && csr_plv == 2'd3)) && (ms_error_va[31:29] == csr_dmw1[`VSEG]) && pg_mode;
assign excp = excp_tlbr || excp_pil || excp_pis || excp_ppi || excp_pme || ms_excp;
assign excp_num = {excp_pil, excp_pis, excp_ppi, excp_pme, excp_tlbr, 1'b0, ms_excp_num};
//tlb exception //preld should not generate these excp
assign excp_tlbr = (access_mem || ms_cacop) && !data_tlb_found && data_addr_trans_en;
assign excp_pil = (ms_load_op || ms_cacop) && !data_tlb_v && data_addr_trans_en; //cache will generate pil exception??
assign excp_pis = ms_store_op && !data_tlb_v && data_addr_trans_en;
assign excp_ppi = access_mem && data_tlb_v && (csr_plv > data_tlb_plv) && data_addr_trans_en;
assign excp_pme = ms_store_op && data_tlb_v && (csr_plv <= data_tlb_plv) && !data_tlb_d && data_addr_trans_en;
assign tlb_excp_cancel_req = excp_tlbr || excp_pil || excp_pis || excp_ppi || excp_pme;
assign data_uncache_en = (da_mode && (csr_datm == 2'b0)) ||
(dmw0_en && (csr_dmw0[`DMW_MAT] == 2'b0)) ||
(dmw1_en && (csr_dmw1[`DMW_MAT] == 2'b0)) ||
(data_addr_trans_en && (data_tlb_mat == 2'b0)) ||
disable_cache;
assign ms_flush = (excp | ms_ertn | (ms_csr_we | (ms_ll_w | ms_sc_w) & !excp) | ms_refetch | ms_idle) & ms_valid;
assign tlb_inst_stall = (ms_tlbsrch || ms_tlbrd) && ms_valid;
always @(posedge clk) begin
if (reset || (ms_ready_go && ws_allowin) || flush_sign) begin
data_rd_buff <= 32'b0;
data_buff_enable <= 1'b0;
end
else if (data_data_ok && !ws_allowin) begin
data_rd_buff <= data_rdata;
data_buff_enable <= 1'b1;
end
end
assign ms_wr_tlbehi = ms_csr_we && (ms_csr_idx == 14'h11) && ms_valid; //stall es tlbsrch
assign cacop_op = ms_dest;
assign cacop_op_mode = cacop_op[4:3];
assign cacop_op_mode_di = ms_cacop && ((cacop_op_mode == 2'b0) || (cacop_op_mode == 2'b1));
reg [ 7:0] tmp_data_index ;
reg [ 3:0] tmp_data_offset ;
always @(posedge clk) begin
tmp_data_index <= data_index_diff;
tmp_data_offset <= data_offset_diff;
end
assign ms_ld_paddr = {data_tag_diff, tmp_data_index, tmp_data_offset};
endmodule

rtl/ip/open-la500/mul.v Normal file

@@ -0,0 +1,197 @@
module YDecoder(
input yc, yb, ya, //c -> i+1; b -> i; a -> i-1
output negx, x, neg2x, _2x
);
assign negx = (yc & yb & ~ya) | (yc & ~yb & ya);
assign x = (~yc & ~yb & ya) | (~yc & yb & ~ya);
assign neg2x = (yc & ~yb & ~ya);
assign _2x = (~yc & yb & ya);
endmodule
module BoothBase(
input negx, x, neg2x, _2x,
input InX,
input PosLastX, NegLastX,
output PosNextX, NegNextX,
output OutX
);
assign OutX = (negx & ~InX) | (x & InX) | (neg2x & NegLastX) | (_2x & PosLastX);
assign PosNextX = InX;
assign NegNextX = ~InX;
endmodule
module BoothInterBase(
input [2:0] y,
input [63:0] InX,
output [63:0] OutX,
output Carry
);
wire negx, x, neg2x, _2x;
wire [1:0] CarrySig [64:0];
YDecoder uu(.yc(y[2]), .yb(y[1]), .ya(y[0]), .negx(negx), .x(x), .neg2x(neg2x), ._2x(_2x));
BoothBase fir(.negx(negx), .x(x), .neg2x(neg2x), ._2x(_2x), .InX(InX[0]), .PosLastX(1'b0), .NegLastX(1'b1), .PosNextX(CarrySig[1][0]), .NegNextX(CarrySig[1][1]), .OutX(OutX[0]));
generate
genvar i;
for (i=1; i<64; i=i+1) begin: gfor
BoothBase ui(
.negx(negx),
.x(x),
.neg2x(neg2x),
._2x(_2x),
.InX(InX[i]),
.PosLastX(CarrySig[i][0]),
.NegLastX(CarrySig[i][1]),
.PosNextX(CarrySig[i+1][0]),
.NegNextX(CarrySig[i+1][1]),
.OutX(OutX[i])
);
end
endgenerate
assign Carry = negx || neg2x;
endmodule
module addr(
input A, B, C,
output Carry, S
);
assign S = ~A & ~B & C | ~A & B & ~C | A & ~B & ~C | A & B & C;
assign Carry = A & B | A & C | B & C;
endmodule
module WallaceTreeBase(
input [16:0] InData,
input [13:0] CIn,
output [13:0] COut,
output C, S
);
//first stage
wire [4:0] FirSig;
addr first1(.A(InData[4]), .B(InData[3]), .C(InData[2]), .Carry(COut[0]), .S(FirSig[0]));
addr first2(.A(InData[7]), .B(InData[6]), .C(InData[5]), .Carry(COut[1]), .S(FirSig[1]));
addr first3(.A(InData[10]), .B(InData[9]), .C(InData[8]), .Carry(COut[2]), .S(FirSig[2]));
addr first4(.A(InData[13]), .B(InData[12]), .C(InData[11]), .Carry(COut[3]), .S(FirSig[3]));
addr first5(.A(InData[16]), .B(InData[15]), .C(InData[14]), .Carry(COut[4]), .S(FirSig[4]));
//second stage
wire [3:0] SecSig;
addr second1(.A(CIn[2]), .B(CIn[1]), .C(CIn[0]), .Carry(COut[5]), .S(SecSig[0]));
addr second2(.A(InData[0]), .B(CIn[4]), .C(CIn[3]), .Carry(COut[6]), .S(SecSig[1]));
addr second3(.A(FirSig[1]), .B(FirSig[0]), .C(InData[1]), .Carry(COut[7]), .S(SecSig[2]));
addr second4(.A(FirSig[4]), .B(FirSig[3]), .C(FirSig[2]), .Carry(COut[8]), .S(SecSig[3]));
//third stage
wire [1:0] ThiSig;
addr third1(.A(SecSig[0]), .B(CIn[6]), .C(CIn[5]), .Carry(COut[9]), .S(ThiSig[0]));
addr third2(.A(SecSig[3]), .B(SecSig[2]), .C(SecSig[1]), .Carry(COut[10]), .S(ThiSig[1]));
//fourth stage
wire [1:0] ForSig;
addr fourth1(.A(CIn[9]), .B(CIn[8]), .C(CIn[7]), .Carry(COut[11]), .S(ForSig[0]));
addr fourth2(.A(ThiSig[1]), .B(ThiSig[0]), .C(CIn[10]), .Carry(COut[12]), .S(ForSig[1]));
//fifth stage
wire FifSig;
addr fifth1(.A(ForSig[1]), .B(ForSig[0]), .C(CIn[11]), .Carry(COut[13]), .S(FifSig));
//sixth stage
addr sixth1(.A(FifSig), .B(CIn[13]), .C(CIn[12]), .Carry(C), .S(S));
endmodule
//-------------------------------------------------------------------------------------------------------------------
module mul(
input mul_clk, reset,
input mul_signed,
input [31:0] x, y, //x is extended to 64 bits and y to 33 bits; sign- or zero-extended according to mul_signed
output [63:0] result
);
wire [63:0] CalX;
wire [32:0] CalY;
assign CalX = mul_signed ? {{32{x[31]}}, x} : {32'b0, x};
assign CalY = mul_signed ? {y[31], y} : {1'b0, y};
//booth
wire [16:0] Carry; //carry bits produced by the Booth encoding
wire [63:0] BoothRes [16:0]; //Booth partial products
BoothInterBase fir(.y({CalY[1], CalY[0], 1'b0}), .InX(CalX), .OutX(BoothRes[0]), .Carry(Carry[0]));
generate
genvar i;
for (i=2; i<32; i=i+2) begin: boothfor
BoothInterBase ai(
.y(CalY[i+1:i-1]),
.InX(CalX<<i),
.OutX(BoothRes[i>>1]),
.Carry(Carry[i>>1])
);
end
endgenerate
BoothInterBase las(.y({CalY[32], CalY[32], CalY[31]}), .InX(CalX<<32), .OutX(BoothRes[16]), .Carry(Carry[16]));
reg [16:0] SecStageCarry;
reg [63:0] SecStageBoothRes [16:0];
integer p;
always @(posedge mul_clk) begin
if (~reset) begin
SecStageCarry <= Carry;
for(p=0; p<17; p=p+1) begin
SecStageBoothRes[p] <= BoothRes[p];
end
end
end
//wallace
wire [13:0] WallaceInter [64:0];
wire [63:0] COut, SOut;
WallaceTreeBase firs(
.InData({SecStageBoothRes[0][0], SecStageBoothRes[1][0], SecStageBoothRes[2][0], SecStageBoothRes[3][0], SecStageBoothRes[4][0], SecStageBoothRes[5][0], SecStageBoothRes[6][0],
SecStageBoothRes[7][0], SecStageBoothRes[8][0], SecStageBoothRes[9][0], SecStageBoothRes[10][0], SecStageBoothRes[11][0], SecStageBoothRes[12][0], SecStageBoothRes[13][0], SecStageBoothRes[14][0],
SecStageBoothRes[15][0], SecStageBoothRes[16][0]}),
.CIn(SecStageCarry[13:0]),
.COut(WallaceInter[1]),
.C(COut[0]),
.S(SOut[0])
);
generate
genvar n;
for (n=1; n<64; n=n+1) begin: wallacefor
WallaceTreeBase bi(
.InData({SecStageBoothRes[0][n], SecStageBoothRes[1][n], SecStageBoothRes[2][n], SecStageBoothRes[3][n], SecStageBoothRes[4][n], SecStageBoothRes[5][n], SecStageBoothRes[6][n],
SecStageBoothRes[7][n], SecStageBoothRes[8][n], SecStageBoothRes[9][n], SecStageBoothRes[10][n], SecStageBoothRes[11][n], SecStageBoothRes[12][n], SecStageBoothRes[13][n], SecStageBoothRes[14][n],
SecStageBoothRes[15][n], SecStageBoothRes[16][n]}),
.CIn(WallaceInter[n]),
.COut(WallaceInter[n+1]),
.C(COut[n]),
.S(SOut[n])
);
end
endgenerate
//64bit add
assign result = SOut + {COut[62:0], SecStageCarry[14]} + SecStageCarry[15];
endmodule

rtl/ip/open-la500/mycpu.h Normal file

@@ -0,0 +1,32 @@
`ifndef MYCPU_H
`define MYCPU_H
// `define BR_BUS_WD 33 //bug5 32->33
// `define FS_TO_DS_BUS_WD 109
// `define DS_TO_ES_BUS_WD 236
// `define ES_TO_MS_BUS_WD 215
// `define MS_TO_WS_BUS_WD 218
// `define WS_TO_RF_BUS_WD 38
// `define ES_TO_DS_FORWARD_BUS 39
// `define MS_TO_DS_FORWARD_BUS 39
// `define HAS_LACC
`define LACC_OP_SIZE 3
`define LACC_OP_WIDTH $clog2(`LACC_OP_SIZE)
`define BR_BUS_WD 33 //bug5 32->33
`define FS_TO_DS_BUS_WD 109
`define DS_TO_ES_BUS_WD (350 \
`ifdef HAS_LACC \
+`LACC_OP_WIDTH+1 \
`endif \
)
`define ES_TO_MS_BUS_WD 425
`define MS_TO_WS_BUS_WD 493
`define WS_TO_RF_BUS_WD 38
`define ES_TO_DS_FORWARD_BUS 39
`define MS_TO_DS_FORWARD_BUS 39
`endif
//`define SIMU

File diff suppressed because it is too large


@@ -0,0 +1,57 @@
module perf_counter (
input clk ,
input reset ,
input dcache_miss ,
input icache_miss ,
input commit_inst ,
input br_inst ,
input mem_inst ,
input br_pre ,
input br_pre_error
);
reg[31:0] dcache_miss_counter;
reg[31:0] icache_miss_counter;
reg[31:0] commit_inst_counter;
reg[31:0] br_inst_counter;
reg[31:0] mem_inst_counter;
reg[31:0] br_pre_counter;
reg[31:0] br_pre_error_counter;
always @(posedge clk) begin
if (reset) begin
dcache_miss_counter <= 32'b0;
icache_miss_counter <= 32'b0;
commit_inst_counter <= 32'b0;
br_inst_counter <= 32'b0;
mem_inst_counter <= 32'b0;
br_pre_counter <= 32'b0;
br_pre_error_counter <= 32'b0;
end
else begin
if (dcache_miss) begin
dcache_miss_counter <= dcache_miss_counter + 32'b1;
end
if (icache_miss) begin
icache_miss_counter <= icache_miss_counter + 32'b1;
end
if (commit_inst) begin
commit_inst_counter <= commit_inst_counter + 32'b1;
end
if (br_inst) begin
br_inst_counter <= br_inst_counter + 32'b1;
end
if (mem_inst) begin
mem_inst_counter <= mem_inst_counter + 32'b1;
end
if (br_pre) begin
br_pre_counter <= br_pre_counter + 32'b1;
end
if (br_pre_error) begin
br_pre_error_counter <= br_pre_error_counter + 32'b1;
end
end
end
endmodule


@@ -0,0 +1,39 @@
module regfile(
input clk,
// READ PORT 1
input [ 4:0] raddr1,
output [31:0] rdata1,
// READ PORT 2
input [ 4:0] raddr2,
output [31:0] rdata2,
// WRITE PORT
input we, //write enable, HIGH valid
input [ 4:0] waddr,
input [31:0] wdata
`ifdef DIFFTEST_EN
,
output [31:0] rf_o [31:0] // difftest
`endif
);
reg [31:0] rf[31:0];
//WRITE
always @(posedge clk) begin
if (we) rf[waddr]<= wdata;
end
//READ OUT 1
assign rdata1 = (raddr1==5'b0) ? 32'b0 :
((raddr1==waddr) && we) ? wdata :
rf[raddr1];
//READ OUT 2
assign rdata2 = (raddr2==5'b0) ? 32'b0 :
((raddr2==waddr) && we) ? wdata :
rf[raddr2];
// difftest
`ifdef DIFFTEST_EN
assign rf_o = rf;
`endif
endmodule


@@ -0,0 +1,235 @@
module tlb_entry
#(
parameter TLBNUM = 32
)
(
input clk,
// search port 0
input s0_fetch ,
input [18:0] s0_vppn ,
input s0_odd_page ,
input [ 9:0] s0_asid ,
output s0_found ,
output [ 4:0] s0_index ,
output [ 5:0] s0_ps ,
output [19:0] s0_ppn ,
output s0_v ,
output s0_d ,
output [ 1:0] s0_mat ,
output [ 1:0] s0_plv ,
//search port 1
input s1_fetch ,
input [18:0] s1_vppn ,
input s1_odd_page ,
input [ 9:0] s1_asid ,
output s1_found ,
output [ 4:0] s1_index ,
output [ 5:0] s1_ps ,
output [19:0] s1_ppn ,
output s1_v ,
output s1_d ,
output [ 1:0] s1_mat ,
output [ 1:0] s1_plv ,
// write port
input we ,
input [$clog2(TLBNUM)-1:0] w_index ,
input [18:0] w_vppn ,
input [ 9:0] w_asid ,
input w_g ,
input [ 5:0] w_ps ,
input w_e ,
input w_v0 ,
input w_d0 ,
input [ 1:0] w_mat0 ,
input [ 1:0] w_plv0 ,
input [19:0] w_ppn0 ,
input w_v1 ,
input w_d1 ,
input [ 1:0] w_mat1 ,
input [ 1:0] w_plv1 ,
input [19:0] w_ppn1 ,
// read port
input [$clog2(TLBNUM)-1:0] r_index ,
output [18:0] r_vppn ,
output [ 9:0] r_asid ,
output r_g ,
output [ 5:0] r_ps ,
output r_e ,
output r_v0 ,
output r_d0 ,
output [ 1:0] r_mat0 ,
output [ 1:0] r_plv0 ,
output [19:0] r_ppn0 ,
output r_v1 ,
output r_d1 ,
output [ 1:0] r_mat1 ,
output [ 1:0] r_plv1 ,
output [19:0] r_ppn1 ,
// invalid port
input inv_en ,
input [ 4:0] inv_op ,
input [ 9:0] inv_asid ,
input [18:0] inv_vpn
);
reg [18:0] tlb_vppn [TLBNUM-1:0];
reg tlb_e [TLBNUM-1:0];
reg [ 9:0] tlb_asid [TLBNUM-1:0];
reg tlb_g [TLBNUM-1:0];
reg [ 5:0] tlb_ps [TLBNUM-1:0];
reg [19:0] tlb_ppn0 [TLBNUM-1:0];
reg [ 1:0] tlb_plv0 [TLBNUM-1:0];
reg [ 1:0] tlb_mat0 [TLBNUM-1:0];
reg tlb_d0 [TLBNUM-1:0];
reg tlb_v0 [TLBNUM-1:0];
reg [19:0] tlb_ppn1 [TLBNUM-1:0];
reg [ 1:0] tlb_plv1 [TLBNUM-1:0];
reg [ 1:0] tlb_mat1 [TLBNUM-1:0];
reg tlb_d1 [TLBNUM-1:0];
reg tlb_v1 [TLBNUM-1:0];
reg s0_fetch_r ;
reg [18:0] s0_vppn_r ;
reg s0_odd_page_r;
reg [ 9:0] s0_asid_r ;
reg s1_fetch_r ;
reg [18:0] s1_vppn_r ;
reg s1_odd_page_r;
reg [ 9:0] s1_asid_r ;
wire [TLBNUM-1:0] match0;
wire [TLBNUM-1:0] match1;
wire [$clog2(TLBNUM)-1:0] match0_en;
wire [$clog2(TLBNUM)-1:0] match1_en;
wire [TLBNUM-1:0] s0_odd_page_buffer;
wire [TLBNUM-1:0] s1_odd_page_buffer;
always @(posedge clk) begin
s0_fetch_r <= s0_fetch;
if (s0_fetch) begin
s0_vppn_r <= s0_vppn;
s0_odd_page_r <= s0_odd_page;
s0_asid_r <= s0_asid;
end
s1_fetch_r <= s1_fetch;
if (s1_fetch) begin
s1_vppn_r <= s1_vppn;
s1_odd_page_r <= s1_odd_page;
s1_asid_r <= s1_asid;
end
end
genvar i;
generate
for (i = 0; i < TLBNUM; i = i + 1)
begin: match
assign s0_odd_page_buffer[i] = (tlb_ps[i] == 6'd12) ? s0_odd_page_r : s0_vppn_r[8];
assign match0[i] = tlb_e[i] && ((tlb_ps[i] == 6'd12) ? s0_vppn_r == tlb_vppn[i] : s0_vppn_r[18: 9] == tlb_vppn[i][18: 9]) && ((s0_asid_r == tlb_asid[i]) || tlb_g[i]);
assign s1_odd_page_buffer[i] = (tlb_ps[i] == 6'd12) ? s1_odd_page_r : s1_vppn_r[8];
assign match1[i] = tlb_e[i] && ((tlb_ps[i] == 6'd12) ? s1_vppn_r == tlb_vppn[i] : s1_vppn_r[18: 9] == tlb_vppn[i][18: 9]) && ((s1_asid_r == tlb_asid[i]) || tlb_g[i]);
end
endgenerate
encoder_32_5 en_match0 (.in({{(32-TLBNUM){1'b0}},match0}), .out(match0_en));
encoder_32_5 en_match1 (.in({{(32-TLBNUM){1'b0}},match1}), .out(match1_en));
assign s0_found = |match0;
assign s0_index = {{(5-$clog2(TLBNUM)){1'b0}},match0_en};
assign s0_ps = tlb_ps[match0_en];
assign s0_ppn = s0_odd_page_buffer[match0_en] ? tlb_ppn1[match0_en] : tlb_ppn0[match0_en];
assign s0_v = s0_odd_page_buffer[match0_en] ? tlb_v1[match0_en] : tlb_v0[match0_en] ;
assign s0_d = s0_odd_page_buffer[match0_en] ? tlb_d1[match0_en] : tlb_d0[match0_en] ;
assign s0_mat = s0_odd_page_buffer[match0_en] ? tlb_mat1[match0_en] : tlb_mat0[match0_en];
assign s0_plv = s0_odd_page_buffer[match0_en] ? tlb_plv1[match0_en] : tlb_plv0[match0_en];
assign s1_found = |match1;
assign s1_index = {{(5-$clog2(TLBNUM)){1'b0}},match1_en};
assign s1_ps = tlb_ps[match1_en];
assign s1_ppn = s1_odd_page_buffer[match1_en] ? tlb_ppn1[match1_en] : tlb_ppn0[match1_en];
assign s1_v = s1_odd_page_buffer[match1_en] ? tlb_v1[match1_en] : tlb_v0[match1_en] ;
assign s1_d = s1_odd_page_buffer[match1_en] ? tlb_d1[match1_en] : tlb_d0[match1_en] ;
assign s1_mat = s1_odd_page_buffer[match1_en] ? tlb_mat1[match1_en] : tlb_mat0[match1_en];
assign s1_plv = s1_odd_page_buffer[match1_en] ? tlb_plv1[match1_en] : tlb_plv0[match1_en];
always @(posedge clk) begin
if (we) begin
tlb_vppn [w_index] <= w_vppn;
tlb_asid [w_index] <= w_asid;
tlb_g [w_index] <= w_g;
tlb_ps [w_index] <= w_ps;
tlb_ppn0 [w_index] <= w_ppn0;
tlb_plv0 [w_index] <= w_plv0;
tlb_mat0 [w_index] <= w_mat0;
tlb_d0 [w_index] <= w_d0;
tlb_v0 [w_index] <= w_v0;
tlb_ppn1 [w_index] <= w_ppn1;
tlb_plv1 [w_index] <= w_plv1;
tlb_mat1 [w_index] <= w_mat1;
tlb_d1 [w_index] <= w_d1;
tlb_v1 [w_index] <= w_v1;
end
end
assign r_vppn = tlb_vppn [r_index];
assign r_asid = tlb_asid [r_index];
assign r_g = tlb_g [r_index];
assign r_ps = tlb_ps [r_index];
assign r_e = tlb_e [r_index];
assign r_v0 = tlb_v0 [r_index];
assign r_d0 = tlb_d0 [r_index];
assign r_mat0 = tlb_mat0 [r_index];
assign r_plv0 = tlb_plv0 [r_index];
assign r_ppn0 = tlb_ppn0 [r_index];
assign r_v1 = tlb_v1 [r_index];
assign r_d1 = tlb_d1 [r_index];
assign r_mat1 = tlb_mat1 [r_index];
assign r_plv1 = tlb_plv1 [r_index];
assign r_ppn1 = tlb_ppn1 [r_index];
//tlb entry invalidation
generate
for (i = 0; i < TLBNUM; i = i + 1)
begin: invalid_tlb_entry
always @(posedge clk) begin
if (we && (w_index == i)) begin
tlb_e[i] <= w_e;
end
else if (inv_en) begin
if (inv_op == 5'd0 || inv_op == 5'd1) begin
tlb_e[i] <= 1'b0;
end
else if (inv_op == 5'd2) begin
if (tlb_g[i]) begin
tlb_e[i] <= 1'b0;
end
end
else if (inv_op == 5'd3) begin
if (!tlb_g[i]) begin
tlb_e[i] <= 1'b0;
end
end
else if (inv_op == 5'd4) begin
if (!tlb_g[i] && (tlb_asid[i] == inv_asid)) begin
tlb_e[i] <= 1'b0;
end
end
else if (inv_op == 5'd5) begin
if (!tlb_g[i] && (tlb_asid[i] == inv_asid) &&
((tlb_ps[i] == 6'd12) ? (tlb_vppn[i] == inv_vpn) : (tlb_vppn[i][18:9] == inv_vpn[18:9]))) begin
tlb_e[i] <= 1'b0;
end
end
else if (inv_op == 5'd6) begin
if ((tlb_g[i] || (tlb_asid[i] == inv_asid)) &&
((tlb_ps[i] == 6'd12) ? (tlb_vppn[i] == inv_vpn) : (tlb_vppn[i][18:9] == inv_vpn[18:9]))) begin
tlb_e[i] <= 1'b0;
end
end
end
end
end
endgenerate
endmodule

rtl/ip/open-la500/tools.v Normal file

@@ -0,0 +1,167 @@
module decoder_2_4(
input [ 1:0] in,
output [ 3:0] out
);
genvar i;
generate for (i=0; i<4; i=i+1) begin : gen_for_dec_2_4
assign out[i] = (in == i);
end endgenerate
endmodule
module encoder_4_2(
input [3:0] in,
output [1:0] out
);
assign out = {2{in[0]}} & 2'd0 |
{2{in[1]}} & 2'd1 |
{2{in[2]}} & 2'd2 |
{2{in[3]}} & 2'd3 ;
endmodule
module decoder_4_16(
input [ 3:0] in,
output [15:0] out
);
genvar i;
generate for (i=0; i<16; i=i+1) begin : gen_for_dec_4_16
assign out[i] = (in == i);
end endgenerate
endmodule
module encoder_16_4(
input [15:0] in,
output [ 3:0] out
);
wire [1:0] out_0, out_1, out_2, out_3;
encoder_4_2 one (.in(in[ 3: 0]), .out(out_0));
encoder_4_2 two (.in(in[ 7: 4]), .out(out_1));
encoder_4_2 thr (.in(in[11: 8]), .out(out_2));
encoder_4_2 fou (.in(in[15:12]), .out(out_3));
assign out = {4{|in[ 3: 0]}} & {2'd0, out_0} |
{4{|in[ 7: 4]}} & {2'd1, out_1} |
{4{|in[11: 8]}} & {2'd2, out_2} |
{4{|in[15:12]}} & {2'd3, out_3} ;
endmodule
module decoder_5_32(
input [ 4:0] in,
output [31:0] out
);
genvar i;
generate for (i=0; i<32; i=i+1) begin : gen_for_dec_5_32
assign out[i] = (in == i);
end endgenerate
endmodule
module encoder_32_5(
input [31:0] in,
output [ 4:0] out
);
wire [3:0] out_0, out_1;
encoder_16_4 one (.in(in[15: 0]), .out(out_0));
encoder_16_4 two (.in(in[31:16]), .out(out_1));
assign out = {5{|in[15: 0]}} & {1'd0, out_0} |
{5{|in[31:16]}} & {1'd1, out_1} ;
endmodule
module decoder_6_64(
input [ 5:0] in,
output [63:0] out
);
genvar i;
generate
for (i=0; i<64; i=i+1)
begin : gen_for_dec_6_64 //bug7
assign out[i] = (in == i);
end
endgenerate
endmodule
module one_valid_n #(
parameter n = 16
)(
input [n-1:0] in,
output [n-1:0] out,
output nozero
);
wire [n-1:0] one_in;
assign one_in[0] = in[0];
genvar i;
generate
for (i=1; i<n; i=i+1)
begin: sel_one
assign one_in[i] = in[i] && ~|in[i-1:0];
end
endgenerate
assign out = one_in;
assign nozero = |out;
endmodule
module one_valid_16 (
input [15:0] in,
output [ 3:0] out_en
);
wire [15:0] one_in;
assign one_in[0] = in[0];
genvar i;
generate
for (i=1; i<16; i=i+1)
begin: sel_one
assign one_in[i] = in[i] && ~|in[i-1:0];
end
endgenerate
encoder_16_4 coder (.in(one_in), .out(out_en));
endmodule
module one_valid_32 (
input [31:0] in,
output [ 4:0] out_en
);
wire [31:0] one_in;
assign one_in[0] = in[0];
genvar i;
generate
for (i=1; i<32; i=i+1)
begin: sel_one
assign one_in[i] = in[i] && ~|in[i-1:0];
end
endgenerate
encoder_32_5 coder (.in(one_in), .out(out_en));
endmodule


@@ -0,0 +1,324 @@
`include "mycpu.h"
`include "csr.h"
module wb_stage(
input clk ,
input reset ,
//allowin
output ws_allowin ,
//from ms
input ms_to_ws_valid ,
input [`MS_TO_WS_BUS_WD -1:0] ms_to_ws_bus ,
//to rf: for write back
output [`WS_TO_RF_BUS_WD -1:0] ws_to_rf_bus ,
//to ds
output ws_to_ds_valid ,
//exception
output [31:0] csr_era ,
output [ 8:0] csr_esubcode ,
output [ 5:0] csr_ecode ,
output excp_flush ,
output ertn_flush ,
output refetch_flush ,
output icacop_flush ,
output csr_wr_en ,
output [13:0] wr_csr_addr ,
output [31:0] wr_csr_data ,
output va_error ,
output [31:0] bad_va ,
output excp_tlbrefill ,
output excp_tlb ,
output [18:0] excp_tlb_vppn ,
//idle
output idle_flush ,
//llbit
output ws_llbit_set ,
output ws_llbit ,
output ws_lladdr_set ,
output [27:0] ws_lladdr ,
//tlb ins
output tlb_inst_stall ,
output tlbsrch_en ,
output tlbsrch_found ,
output [ 4:0] tlbsrch_index ,
output tlbfill_en ,
output tlbwr_en ,
output tlbrd_en ,
output invtlb_en ,
output [ 9:0] invtlb_asid ,
output [18:0] invtlb_vpn ,
output [ 4:0] invtlb_op ,
//to perf_counter
output real_valid ,
output real_br_inst ,
output real_icache_miss ,
output real_dcache_miss ,
output real_mem_inst ,
output real_br_pre ,
output real_br_pre_error ,
//debug
output debug_ws_valid ,
input debug_break_point ,
//trace debug interface
output [31:0] debug_wb_pc ,
output [ 3:0] debug_wb_rf_wen ,
output [ 4:0] debug_wb_rf_wnum ,
output [31:0] debug_wb_rf_wdata ,
output [31:0] debug_wb_inst
// difftest
`ifdef DIFFTEST_EN
,
output ws_valid_diff ,
output ws_cnt_inst_diff ,
output [63:0] ws_timer_64_diff ,
output [ 7:0] ws_inst_ld_en_diff ,
output [31:0] ws_ld_paddr_diff ,
output [31:0] ws_ld_vaddr_diff ,
output [ 7:0] ws_inst_st_en_diff ,
output [31:0] ws_st_paddr_diff ,
output [31:0] ws_st_vaddr_diff ,
output [31:0] ws_st_data_diff ,
output ws_csr_rstat_en_diff ,
output [31:0] ws_csr_data_diff
`endif
);
reg ws_valid;
wire ws_ready_go;
wire flush_sign;
reg [`MS_TO_WS_BUS_WD -1:0] ms_to_ws_bus_r;
wire ws_gr_we;
wire ws_excp;
wire [15:0] ws_excp_num;
wire ws_ertn;
wire [ 4:0] ws_dest;
wire [31:0] ws_final_result;
wire [31:0] ws_pc;
wire [31:0] ws_csr_result;
wire [13:0] ws_csr_idx;
wire ws_csr_we;
wire ws_ll_w;
wire ws_sc_w;
wire [31:0] ws_error_va;
wire ws_tlbsrch;
wire ws_tlbfill;
wire ws_tlbwr;
wire ws_tlbrd;
wire ws_refetch;
wire ws_invtlb;
wire ws_icacop_op_en;
wire ws_br_inst;
wire ws_icache_miss;
wire ws_access_mem;
wire ws_dcache_miss;
wire ws_br_pre;
wire ws_br_pre_error;
wire ws_idle;
wire [31:0] ws_paddr;
wire ws_data_uc;
// difftest
wire [31:0] ws_inst ;
wire ws_cnt_inst ;
wire [63:0] ws_timer_64 ;
wire [ 7:0] ws_inst_ld_en ;
wire [31:0] ws_ld_paddr ;
wire [31:0] ws_ld_vaddr ;
wire [ 7:0] ws_inst_st_en ;
wire [31:0] ws_st_data ;
wire ws_csr_rstat_en ;
wire [31:0] ws_csr_data ;
assign {ws_csr_data , //492:461 for difftest
ws_csr_rstat_en, //460:460 for difftest
ws_st_data , //459:428 for difftest
ws_inst_st_en , //427:420 for difftest
ws_ld_vaddr , //419:388 for difftest
ws_ld_paddr , //387:356 for difftest
ws_inst_ld_en , //355:348 for difftest
ws_cnt_inst , //347:347 for difftest
ws_timer_64 , //346:283 for difftest
ws_inst , //282:251 for difftest
ws_data_uc , //250:250
ws_paddr , //249:218
ws_idle , //217:217
ws_br_pre_error, //216:216
ws_br_pre , //215:215
ws_dcache_miss , //214:214
ws_access_mem , //213:213
ws_icache_miss , //212:212
ws_br_inst , //211:211
ws_icacop_op_en, //210:210
invtlb_vpn , //209:191
invtlb_asid , //190:181
ws_invtlb , //180:180
ws_tlbrd , //179:179
ws_refetch , //178:178
ws_tlbfill , //177:177
ws_tlbwr , //176:176
tlbsrch_index , //175:171
tlbsrch_found , //170:170
ws_tlbsrch , //169:169
ws_error_va , //168:137
ws_sc_w , //136:136
ws_ll_w , //135:135
ws_excp_num , //134:119
ws_csr_we , //118:118
ws_csr_idx , //117:104
ws_csr_result , //103:72
ws_ertn , //71:71
ws_excp , //70:70
ws_gr_we , //69:69
ws_dest , //68:64
ws_final_result, //63:32
ws_pc //31:0
} = ms_to_ws_bus_r;
assign ws_to_ds_valid = ws_valid;
assign flush_sign = excp_flush || ertn_flush || refetch_flush || icacop_flush || idle_flush;
wire rf_we;
wire [4 :0] rf_waddr;
wire [31:0] rf_wdata;
assign ws_to_rf_bus = {rf_we , //37:37
rf_waddr, //36:32
rf_wdata //31:0
};
assign ws_ready_go = ~debug_break_point;
assign ws_allowin = !ws_valid || ws_ready_go;
always @(posedge clk) begin
if (reset || flush_sign) begin
ws_valid <= 1'b0;
end
else if (ws_allowin) begin
ws_valid <= ms_to_ws_valid;
end
if (ms_to_ws_valid && ws_allowin) begin
ms_to_ws_bus_r <= ms_to_ws_bus;
end
end
assign real_br_inst = ws_br_inst && real_valid;
assign real_icache_miss = ws_icache_miss && real_valid;
assign real_dcache_miss = ws_dcache_miss && real_valid;
assign real_mem_inst = ws_access_mem && real_valid;
assign real_br_pre = ws_br_pre && real_valid;
assign real_br_pre_error = ws_br_pre_error && real_valid;
assign real_valid = ws_valid & ~ws_excp; //ws valid and no exception
assign rf_we = ws_gr_we & real_valid;
assign rf_waddr = ws_dest;
assign rf_wdata = ws_final_result;
assign excp_flush = ws_excp & ws_valid;
assign ertn_flush = ws_ertn & real_valid;
assign refetch_flush = (ws_csr_we || ((ws_ll_w || ws_sc_w) && !ws_excp) || ws_refetch) && ws_valid;
assign csr_era = ws_pc;
assign csr_wr_en = ws_csr_we && real_valid;
assign wr_csr_addr = ws_csr_idx;
assign wr_csr_data = ws_csr_result;
assign icacop_flush = ws_icacop_op_en && ws_valid;
assign idle_flush = ws_idle && real_valid;
assign tlb_inst_stall = (ws_tlbsrch || ws_tlbrd) && ws_valid;
//tlb ins
assign {tlbsrch_en ,
tlbwr_en ,
tlbfill_en ,
tlbrd_en ,
invtlb_en } = {ws_tlbsrch ,
ws_tlbwr ,
ws_tlbfill ,
ws_tlbrd ,
ws_invtlb } & {5{real_valid}};
//llbit
assign ws_llbit_set = (ws_ll_w | ws_sc_w) & real_valid;
assign ws_llbit = ((ws_ll_w&&!ws_data_uc) & 1'b1) | (ws_sc_w & 1'b0);
assign ws_lladdr_set = ws_ll_w && !ws_data_uc && real_valid;
assign ws_lladdr = ws_paddr[31:4];
/*
excp_num[0] int
[1] adef
[2] tlbr |inst tlb exceptions
[3] pif |
[4] ppi |
[5] syscall
[6] brk
[7] ine
[8] ipe
[9] ale
[10] <null>
[11] tlbr |
[12] pme |data tlb exceptions
[13] ppi |
[14] pis |
[15] pil |
*/
//exceptions have priority; only one exception is valid at a time
assign {csr_ecode,
va_error,
bad_va,
csr_esubcode,
excp_tlbrefill,
excp_tlb,
excp_tlb_vppn} = ws_excp_num[ 0] ? {`ECODE_INT , 1'b0 , 32'b0 , 9'b0 , 1'b0 , 1'b0 , 19'b0 } :
ws_excp_num[ 1] ? {`ECODE_ADEF, ws_valid, ws_pc , `ESUBCODE_ADEF, 1'b0 , 1'b0 , 19'b0 } :
ws_excp_num[ 2] ? {`ECODE_TLBR, ws_valid, ws_pc , 9'b0 , ws_valid, ws_valid, ws_pc[31:13] } :
ws_excp_num[ 3] ? {`ECODE_PIF , ws_valid, ws_pc , 9'b0 , 1'b0 , ws_valid, ws_pc[31:13] } :
ws_excp_num[ 4] ? {`ECODE_PPI , ws_valid, ws_pc , 9'b0 , 1'b0 , ws_valid, ws_pc[31:13] } :
ws_excp_num[ 5] ? {`ECODE_SYS , 1'b0 , 32'b0 , 9'b0 , 1'b0 , 1'b0 , 19'b0 } :
ws_excp_num[ 6] ? {`ECODE_BRK , 1'b0 , 32'b0 , 9'b0 , 1'b0 , 1'b0 , 19'b0 } :
ws_excp_num[ 7] ? {`ECODE_INE , 1'b0 , 32'b0 , 9'b0 , 1'b0 , 1'b0 , 19'b0 } :
ws_excp_num[ 8] ? {`ECODE_IPE , 1'b0 , 32'b0 , 9'b0 , 1'b0 , 1'b0 , 19'b0 } : //close ipe excp now
ws_excp_num[ 9] ? {`ECODE_ALE , ws_valid, ws_error_va, 9'b0 , 1'b0 , 1'b0 , 19'b0 } :
ws_excp_num[11] ? {`ECODE_TLBR, ws_valid, ws_error_va, 9'b0 , ws_valid, ws_valid, ws_error_va[31:13]} :
ws_excp_num[12] ? {`ECODE_PME , ws_valid, ws_error_va, 9'b0 , 1'b0 , ws_valid, ws_error_va[31:13]} :
ws_excp_num[13] ? {`ECODE_PPI , ws_valid, ws_error_va, 9'b0 , 1'b0 , ws_valid, ws_error_va[31:13]} :
ws_excp_num[14] ? {`ECODE_PIS , ws_valid, ws_error_va, 9'b0 , 1'b0 , ws_valid, ws_error_va[31:13]} :
ws_excp_num[15] ? {`ECODE_PIL , ws_valid, ws_error_va, 9'b0 , 1'b0 , ws_valid, ws_error_va[31:13]} :
69'b0;
//invtlb ins
assign invtlb_op = ws_dest;
// debug info generate
assign debug_wb_pc = ws_pc;
assign debug_wb_rf_wen = {4{rf_we}};
assign debug_wb_rf_wnum = ws_dest;
assign debug_wb_rf_wdata = ws_final_result;
assign debug_wb_inst = ws_inst;
assign debug_ws_valid = ws_valid;
`ifdef DIFFTEST_EN
assign ws_valid_diff = real_valid ;
assign ws_timer_64_diff = ws_timer_64 ;
assign ws_cnt_inst_diff = ws_cnt_inst ;
assign ws_inst_ld_en_diff = ws_inst_ld_en ;
assign ws_ld_paddr_diff = ws_ld_paddr ;
assign ws_ld_vaddr_diff = ws_ld_vaddr ;
assign ws_inst_st_en_diff = ws_inst_st_en ;
assign ws_st_paddr_diff = ws_ld_paddr_diff ;
assign ws_st_vaddr_diff = ws_ld_vaddr_diff ;
assign ws_st_data_diff = ws_st_data ;
assign ws_csr_rstat_en_diff = ws_csr_rstat_en ;
assign ws_csr_data_diff = ws_csr_data ;
`endif
endmodule