# Design of Modified Vertical-Horizontal Binary Common SubExpression Elimination Algorithm Based On CBL for FIR Filter Application 

M.Dasharatha ${ }^{1}$, B.Rajendra Naik ${ }^{2}$ and N.S.S.Reddy ${ }^{3}$<br>${ }^{1}$ Department of ECE, UCE Osmania University, Hyderabad, India<br>${ }^{2}$ Department of ECE, UCE Osmania University, Hyderabad, India<br>${ }^{3}$ Department of ECE, VCE Osmania University, Hyderabad, India<br>Corresponding Author: M.Dasharatha


#### Abstract

This paper proposes an efficient constant multiplier architecture based on the vertical-horizontal binary common sub-expression elimination (VHBCSE) algorithm, built on a carry select adder using common Boolean logic (CBL) and a modified full adder, for designing a reconfigurable finite impulse response (FIR) filter whose coefficients can change dynamically in real time. To design an efficient reconfigurable FIR filter, according to the proposed modified VHBCSE algorithm, a 2-bit binary common sub-expression elimination (BCSE) algorithm is first applied vertically across adjacent coefficients on the 2-D space of the coefficient matrix, followed by a variable-bit BCSE algorithm applied horizontally within each coefficient. This technique is capable of reducing the average probability of use, or the switching activity, of the multiplier block adders relative to the two existing 2-bit and 3-bit BCSE algorithms respectively. ASIC implementation results of FIR filters using this multiplier show that the proposed VHBCSE algorithm is also successful in reducing the average power consumption. Regarding the implementation of the FIR filter, improvements in area delay product (ADP) of $15.51 \%$ and in power delay product (PDP) of $25.39 \%$ have been achieved for the proposed VHBCSE algorithm over those of the earlier multiple constant multiplication (MCM) algorithms.


Keywords: VHBCSE algorithm, MCM, FIR filter, VLSI design.

Date of Submission: 20-01-2018

Date of Acceptance: 17-02-2018

## I. Introduction

The FIR filter has wide application as the key component in digital signal processing, image and video processing, wireless communication, and biomedical signal processing systems. Moreover, systems like Software Defined Radio (SDR) [1] and multi-standard video codecs [2] need a reconfigurable FIR filter with dynamically programmable filter coefficients, interpolation factors and lengths, which may vary according to the specification of different standards in a portable computing platform. The significant applicability of an efficient reconfigurable FIR filter motivates the system designer to develop the chip with low cost, power, and area along with the capability to operate at high speed.

In any FIR filter, the multiplier is the major constraint which defines the performance of the desired filter. Therefore, over the past decades, the design of an efficient hardware architecture for fixed-point FIR filters has been a major research focus, as reported in the published literature [3]-[10]. In an FIR filter, the multiplication operation is performed between one particular variable (the input) and many constants (the coefficients), and is known as multiple constant multiplication (MCM).

The algorithms proposed earlier to implement this MCM for an efficient FIR filter design can be categorized into two main groups: 1) graph-based algorithms and 2) common sub-expression elimination (CSE) algorithms [8]-[10]. Most of these graph-based or CSE algorithms obtain an efficient FIR filter hardware architecture by running on a particular (fixed) set of coefficients for a long time (a couple of hours to days) on a highly efficient computing platform (such as 1-20 3.2 GHz computers in parallel mode, as mentioned in [7]). However, FIR filter implementation using an effective MCM design obtained by running these algorithms on a fixed set of coefficients is not suitable for applications like SDR systems, for the following two reasons: 1) the coefficients of the filters in an SDR system are dynamically programmable based on the requirements of different standards, and 2) the highly computationally efficient platform required by those algorithms is unreasonably expensive in an SDR system.

Several techniques have been introduced for efficient reconfigurable constant multiplier design for applications where the filter coefficients change dynamically, e.g. a multi-standard digital up/down converter (DUC). The binary common sub-expression elimination (BCSE) algorithm is one of those techniques. It introduces the concept of eliminating the common sub-expressions in binary form for designing an efficient constant multiplier, and is thus applicable to reconfigurable FIR filters with low complexity [12]. However, the choice of the length of the binary common sub-expressions (BCSs) in [12] makes the design inefficient by increasing the adder step and the hardware cost. The efficiency in terms of speed, power, and area of the constant multiplier has been increased in the work presented in [10], which designs a reconfigurable FIR filter for a multi-standard DUC by choosing a 2-bit long BCS judiciously.

The choice of a BCS of fixed length (3-bit or 2-bit) in the earlier BCSE algorithm based reconfigurable FIR filter designs leaves scope to optimize the designed filter by considering the BCS both across adjacent coefficients and within a single coefficient. The convention of representing the input and the coefficient of the earlier FIR filters in signed magnitude format also leaves scope to change the data representation to signed decimal numbers, for wider applicability of the proposed FIR filter. On studying the above-mentioned literature, it has been realized that the development of an efficient reconfigurable constant multiplier is very much needed for its applicability in any reconfigurable system.

## II. Architecture Of The Proposed VHBCSE Algorithm Based Constant Multiplier

The data flow diagram of the proposed vertical-horizontal BCSE algorithm based constant multiplier (CM) design is shown in Fig. 1. The designed multiplier takes the lengths of the input (Xin) and the coefficient $(\mathrm{H})$ as 16-bit and 17-bit respectively, while the output is assumed to be 16-bit long. The sampled inputs are first stored in the register, and the coefficients are stored directly in the LUTs. The functionality and hardware architecture of the different blocks of the designed VHBCSE based multiplier are explained below in detail.


Fig1: Data flow diagram of the CM using VHBCSE algorithm.

1) Sign Conversion Block: The sign conversion block is needed to support the signed decimal format data representation for both the input and the coefficient. The architecture of the sign conversion block is shown in Fig. 2. One 1's complementer circuit generates the inverted version of the 16-bit (excluding MSB) coefficient. One 16-bit 2:1 multiplexer produces the multiplexed coefficient depending on the value of the most significant bit (MSB) of the coefficient. For a negative value of the original coefficient, the multiplexed coefficient will be in the inverted form; otherwise it is passed through unchanged.
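
The behaviour described for the sign conversion block can be sketched in a few lines of Python. This is only a behavioural model under stated assumptions, not the authors' Verilog: the function name `sign_convert` and the packing of the 17-bit coefficient into one integer (sign in bit 16) are chosen here for illustration.

```python
def sign_convert(h17):
    """Behavioural sketch of the sign conversion block (assumed bit layout).

    h17 is a 17-bit coefficient whose bit 16 is taken as the sign (MSB).
    Returns (sign, hm), where hm is the 16-bit multiplexed coefficient.
    """
    assert 0 <= h17 < (1 << 17)
    sign = (h17 >> 16) & 1            # MSB of the coefficient
    low16 = h17 & 0xFFFF              # the 16 bits excluding the MSB
    inverted = ~low16 & 0xFFFF        # output of the 1's complementer
    hm = inverted if sign else low16  # 16-bit 2:1 multiplexer, select = MSB
    return sign, hm
```

For a coefficient with MSB 0 the lower 16 bits pass through unchanged, while for MSB 1 they are inverted bit by bit.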


Fig2: Hardware architecture of the Sign Conversion Block.

2) Partial Product Generator (PPG): In the BCSE technique, a shift and add based method is used to generate the partial products, which are summed up in the following steps/layers to produce the final multiplication result. The choice of the size of the BCS defines the number of partial products. In layer-1 of the proposed algorithm, 2-bit binary common sub-expressions (BCSs) ranging from "00" to "11" are considered, which produce 4 partial products. Among these four BCSs, a single adder (A0) is needed to generate the partial product only for the pattern "11"; the rest are generated by hardwired shifting. For a coefficient of 16-bit length, 8 partial products of $17,15,13,11,9,7,5$, and 3 bits (P8-P1) are generated by right shifting the first partial product (P8) by $0,2,4,6,8,10,12$, and 14 bits respectively. This technique helps in reducing the size of the multiplexers used to select the proper partial product depending on the coefficient's binary value.
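
The 2-bit partial product idea can be illustrated with a short Python sketch. It is a simplification under stated assumptions: it uses exact integer arithmetic with left shifts, so the right-shifted, width-reduced partial products (17 down to 3 bits) of the described datapath are not modelled, and the names `partial_products` and `combine` are hypothetical.

```python
def partial_products(x, hm):
    """One partial product per 2-bit group of the 16-bit coefficient hm.

    Only the pattern "11" needs an adder (A0 in the text); "00", "01" and
    "10" are obtained by wiring and shifting alone.
    """
    p11 = x + (x << 1)                       # adder A0: 3x = x + 2x
    table = {0b00: 0, 0b01: x, 0b10: x << 1, 0b11: p11}
    groups = [(hm >> (2 * i)) & 0b11 for i in range(8)]  # LS group first
    return [table[g] for g in groups]        # models the eight 4:1 multiplexers


def combine(pps):
    """Weight each partial product by its 2-bit group position and add."""
    return sum(pp << (2 * i) for i, pp in enumerate(pps))
```

Summing the eight weighted partial products reproduces the ordinary product `x * hm`, which is a convenient sanity check for the decomposition.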


Fig3: Block diagram of the Partial Product Generator Unit.


Fig4: Block diagram of the control logic generator unit.

3) Control Logic (CL) Generator: The control logic generator block takes the multiplexed coefficient ( $\mathrm{Hm}[15: 0]$ ) as its input and groups it into one set of 4-bit groups ( $\mathrm{Hm}[15: 12], \mathrm{Hm}[11: 8], \mathrm{Hm}[7: 4]$, and $\mathrm{Hm}[3: 0]$ ) and another set of 8-bit groups ( $\mathrm{Hm}[15: 8], \mathrm{Hm}[7: 0]$ ). According to the proposed VHBCSE algorithm, the CL generator block generates 7 control signals depending on the equality check for 7 different cases. The architecture of the control signal generator block is shown in Fig. 4. The control signal for the 8-bit equality check is produced from the control signals generated by the 4-bit equality checks.
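
A possible reading of the control logic generator is sketched below in Python. The text does not list the seven cases explicitly, so the mapping used here is an assumption: C1-C6 are taken to be the six pairwise equality checks among the four 4-bit groups, and C7 the 8-bit equality check derived from two of them. The function name `control_signals` is hypothetical.

```python
from itertools import combinations


def control_signals(hm):
    """Return [C1, ..., C7] for a 16-bit multiplexed coefficient hm.

    Assumed mapping: C1..C6 compare the 4-bit groups pairwise, and C7
    (Hm[15:8] == Hm[7:0]) is derived from the 4-bit checks.
    """
    # nib[0] = Hm[15:12], nib[1] = Hm[11:8], nib[2] = Hm[7:4], nib[3] = Hm[3:0]
    nib = [(hm >> s) & 0xF for s in (12, 8, 4, 0)]
    pairs = list(combinations(range(4), 2))          # the six possible pairs
    c = [nib[i] == nib[j] for i, j in pairs]         # C1..C6
    # The two bytes are equal exactly when Hm[15:12] == Hm[7:4]
    # and Hm[11:8] == Hm[3:0], so no separate 8-bit comparator is needed.
    c7 = c[pairs.index((0, 2))] and c[pairs.index((1, 3))]
    return c + [c7]
```

With this mapping, a coefficient such as `0xABAB` asserts C7 because its upper and lower bytes coincide, while `0x1234` asserts none of the seven signals.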


Fig5: Architectural details of the controlled addition at layer-2 block.


Fig6: Hardware architecture of the controlled addition at layer-3.

4) Multiplexers Unit: The multiplexer unit is used to select the appropriate data generated by the PPG unit depending on the coefficient's binary value. At layer-1, eight $4: 1$ multiplexers are needed to produce the partial products according to the 2-bit BCSE algorithm applied vertically on the MAT. The widths of these 8 multiplexers are $17,15,13,11,9,7,5$, and 3 bits rather than 16 bits for all, which reduces the hardware and power consumption.

5) Controlled Addition at Layer-2: The partial products (PPs) generated from the eight groups of 2-bit BCSs are added to obtain the final multiplication result, and this addition is performed in three layers. According to the BCSE algorithm [12] proposed earlier, layer-2 requires four addition operations (A1-A4) to sum up the eight PPs. Instead of direct addition of these PPs, controlled addition operations are performed at layer-2 according to the proposed VHBCSE algorithm. These adders (A1-A4) are controlled by the control signals (C1-C6), which are generated by the control signal generator block based on the 4-bit BCSE. The architecture of this block is shown in Fig. 5, which reveals that the propagation delay is largest along the paths used to generate AS2, AS3, and AS4.


Fig7: Constant Multiplier architecture

6) Controlled Addition at Layer-3: The four multiplexed sums (AS1, AS2, AS3 and AS4) generated at layer-2 are now summed up in layer-3. In our algorithm, controlled additions are performed instead of direct addition of these four sums, as shown in Fig. 6. Hence, this addition (A6) is controlled by the control signal (C7), which is generated by the control signal generator block based on the 8-bit BCSE. It is concluded that the propagation delay will be ...

7) Final Addition at Layer-4: This block performs the addition between the two sums (AS5-AS6) generated by layer-3 to finally produce the multiplication result between the input and the coefficient. The block diagram of the overall constant multiplier is shown in Fig. 7.
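
The layered, controlled addition described in items 5) to 7) can be illustrated with a rough Python model. Several things here are assumptions made for the sketch rather than details taken from the text: equal 4-bit groups are taken to share one layer-2 sum, an asserted 8-bit equality is taken to let layer-3 reuse one sum, the arithmetic is exact (left shifts, no truncation), and `controlled_multiply` and its adder counter are hypothetical names.

```python
def controlled_multiply(x, hm):
    """Multiply x by a 16-bit hm in layers, counting the additions performed.

    Returns (product, adders_used). Equal groups reuse an earlier sum, which
    stands in for the controlled (gated) adders of layers 2 and 3.
    """
    adders = 0
    nib = [(hm >> (4 * i)) & 0xF for i in range(4)]   # least significant first

    # Layer-2: add the two 2-bit partial products of each distinct 4-bit group.
    layer2 = {}
    for n in nib:
        if n not in layer2:
            low_pp = (n & 0b11) * x            # stands for a layer-1 mux output
            high_pp = ((n >> 2) * x) << 2      # upper 2-bit group, weighted by 4
            layer2[n] = high_pp + low_pp
            adders += 1
    sums = [layer2[n] for n in nib]

    # Layer-3: combine the 4-bit sums into byte sums; reuse when bytes match.
    low_byte = (sums[1] << 4) + sums[0]
    adders += 1
    if nib[3] == nib[1] and nib[2] == nib[0]:  # 8-bit equality (C7 in the text)
        high_byte = low_byte
    else:
        high_byte = (sums[3] << 4) + sums[2]
        adders += 1

    # Layer-4: final addition of the two byte-level sums.
    product = (high_byte << 8) + low_byte
    adders += 1
    return product, adders
```

In this model a coefficient with four distinct 4-bit groups, such as `0x1234`, uses all seven additions, while `0xABAB` needs only four because both its 4-bit groups and its bytes repeat.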

## III. Proposed System

In the proposed system, a 16-bit adder is designed using common Boolean logic (CBL). Within this CBL adder, a modified full adder is used to reduce the delay.

The proposed modified full adder circuit, shown in Fig. 8, consists of two $2: 1$ MUXes and an XOR gate. In the proposed structure, one XOR gate of the conventional full adder is replaced by a multiplexer block so that the critical path delay is minimized. As can be seen from Fig. 8, the critical path delay of the full adder can be reduced. The proposed full adder is applied to the reduction stage of an array multiplier to validate its effectiveness. In the array structure, the partial products are divided into levels. In each level, whenever there are three bits, a full adder must be used. Out of the three inputs, one input and its complement are given as inputs to the first multiplexer. The other two inputs are given to the XOR gate, whose output acts as the select line for both multiplexers. The inputs of the second multiplexer are the bits other than the carry bit. This way of designing reduces the switching activity, which in turn reduces the power. The critical path delay is also reduced compared to the existing designs discussed in the literature, which lowers the delay and hence increases the speed. The operation of the proposed full adder can be explained as follows:

a) When B and C are both zero or both one, sum $=\mathrm{A}$;

b) When one of B or C is one and the other is zero, sum $=\overline{\mathrm{A}}$ (the complement of A);

c) When B and C are both zero or both one, carry $=\mathrm{B}$;

d) When one of B or C is one and the other is zero, carry $=\mathrm{A}$.
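
The four rules above can be checked against the ordinary full adder truth table with a small Python model. This is a behavioural sketch only; the helper names `mux2` and `modified_full_adder` are chosen here for illustration, and gate-level timing is not modelled.

```python
def mux2(sel, d0, d1):
    """2:1 multiplexer: returns d0 when sel is 0 and d1 when sel is 1."""
    return d1 if sel else d0


def modified_full_adder(a, b, c):
    """Full adder built from one XOR gate and two 2:1 multiplexers."""
    sel = b ^ c                  # XOR of two inputs drives both select lines
    s = mux2(sel, a, a ^ 1)      # first MUX: A when B == C, else complement of A
    carry = mux2(sel, b, a)      # second MUX: B when B == C, else A
    return s, carry


# Exhaustive comparison with the arithmetic definition of a full adder.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, carry = modified_full_adder(a, b, c)
            assert s + 2 * carry == a + b + c
```

The exhaustive loop covers all eight input combinations, so the multiplexer formulation is equivalent to the conventional sum and carry equations.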


Fig8: Proposed Full Adder

In the proposed architecture, an area-efficient carry select adder is obtained by sharing the common Boolean logic term to remove the duplicated adder cells of the conventional carry select adder, as shown in Fig. 9. In this way, it saves many transistors and achieves low power. Analyzing the truth table of a single-bit full adder shows that the summation output when the carry-in is logic '0' is the inverse of the summation output when the carry-in is logic '1'. The proposed carry select adder design exploits this by sharing the common Boolean logic term in the summation generation. To share the common Boolean logic term, only one OR gate and one INV gate are needed to generate the carry signal and summation signal pair. Once the carry-in signal is ready, the correct sum and carry-out are selected according to the logic state of the carry-in signal.
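
The common Boolean logic idea can be sketched per bit in Python. This is a behavioural model under assumptions: it evaluates the bits one after another and does not model the SQRT grouping of the 16-bit adder in Fig. 9, and the names `cbl_bit` and `cbl_add16` are hypothetical.

```python
def cbl_bit(a, b, cin):
    """One CBL adder cell: both candidate results share the term a XOR b."""
    x = a ^ b
    sum0, sum1 = x, x ^ 1            # sum for cin = 1 is the inverse (INV gate)
    carry0, carry1 = a & b, a | b    # AND for cin = 0, OR for cin = 1
    # The carry-in only selects between the precomputed pairs.
    return (sum1, carry1) if cin else (sum0, carry0)


def cbl_add16(a, b, cin=0):
    """Add two 16-bit values bit by bit with CBL cells; returns (sum, carry_out)."""
    result, carry = 0, cin
    for i in range(16):
        s, carry = cbl_bit((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry
```

Because each cell computes both candidate outputs from the same XOR term, no second ripple-carry adder is needed for the carry-in of 1 case, which is where the area saving described above comes from.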


Fig9: Proposed 16-bit SQRT CSLA

## IV. Results And Discussions

The modified VHBCSE algorithm based constant multiplier architecture has been implemented in the Verilog hardware description language and simulated using the Xilinx ISE simulator. The synthesis results are targeted for the Spartan-3E XC3S250E FPGA. Table 1 shows a $15.51 \%$ reduction in ADP and a $25.39 \%$ reduction in PDP.


Fig10: RTL Schematic for the proposed design


Fig11: Simulation output for the proposed design

Table 1: Comparison of delay, ADP and PDP of the existing and proposed designs

| Design | Delay | ADP | PDP |
| :--- | :--- | :--- | :--- |
| Existing | 22.290 | 3.655 | 17.386 |
| Proposed | 17.776 | 3.164 | 13.865 |
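
The percentage figures can be recomputed from the Table 1 values with a few lines of Python. The sketch below evaluates the difference under two conventions, relative to the existing design and relative to the proposed design; the dictionary layout and function names are chosen here for illustration, and the units of the table entries are not stated in the source.

```python
TABLE1 = {
    "existing": {"delay": 22.290, "adp": 3.655, "pdp": 17.386},
    "proposed": {"delay": 17.776, "adp": 3.164, "pdp": 13.865},
}


def pct_vs_existing(metric):
    """Difference as a percentage of the existing design's value."""
    e, p = TABLE1["existing"][metric], TABLE1["proposed"][metric]
    return 100.0 * (e - p) / e


def pct_vs_proposed(metric):
    """Difference as a percentage of the proposed design's value."""
    e, p = TABLE1["existing"][metric], TABLE1["proposed"][metric]
    return 100.0 * (e - p) / p


for m in ("delay", "adp", "pdp"):
    print(m, round(pct_vs_existing(m), 2), round(pct_vs_proposed(m), 2))
```

With these numbers, the quoted $15.51 \%$ (ADP) and $25.39 \%$ (PDP) appear to correspond to the difference taken relative to the proposed design; taken relative to the existing design, the same data give roughly $13.4 \%$ and $20.3 \%$.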


## Area Delay Product



Fig 12: ADP for proposed and existing system

## Power Delay Product



Fig 13: Power Delay Product for proposed and existing system

## V. Conclusions

With a view to implementing an efficient fixed-point reconfigurable FIR filter, this paper presents a new vertical-horizontal BCSE algorithm which removes the initial common sub-expressions (CSs) by applying 2-bit BCSE vertically. Further elimination of the CSs is performed by finding the CSs present within the coefficients, applying BCSEs of different lengths horizontally at different layers of the shift and add based constant multiplier architecture. It has been shown that the proposed algorithm successfully reduces the average switching activity of the multiplier block adders compared to those of the 2-bit and 3-bit BCSEs (fixed-bit vertical BCSE) respectively. The reduction of switching activity during hardware implementation of different FIR filters lowers the average power consumption by $32 \%$ and $52 \%$ relative to these two algorithms respectively. Implementation results reveal that there is a considerable amount of power saving for higher order filters, as a larger number of matches can be found among a greater number of coefficients. The proposed modified VHBCSE algorithm achieves efficiency improvements of $15.51 \%$ in area delay product (ADP) and $25.39 \%$ in power delay product (PDP) when compared to the earlier proposed VHBCSE algorithm based FIR filter. Maximizing the efficiency and supporting the signed decimal data representation for both the input and the coefficient make the proposed VHBCSE based constant multiplier more suitable for next generation efficient systems like software defined radio.

## References

[1]. S. J. Darak, S. K. P. Gopi, V. A. Prasad, and E. Lai, "Low-complexity reconfigurable fast filter bank for multi-standard wireless receivers," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 5, pp. 1202-1206, May 2014.
[2]. J. L. Nunez-Yanez, T. Spiteri, and G. Vafiadis, "Multi-standard reconfigurable motion estimation processor for hybrid video codecs," IET Comput. Digit. Tech., vol. 5, no. 2, pp. 73-85, Mar. 2011.
[3]. H. Samueli, "An improved search algorithm for the design of multiplierless FIR filters with power-of-two coefficients," IEEE Trans. Circuits Syst., vol. 36, no. 7, pp. 1044-1047, Jul. 1989.
[4]. A. G. Dempster and M. D. Macleod, "Use of minimum-adder multiplier blocks in FIR digital filters," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 42, no. 9, pp. 569-577, Sep. 1995.
[5]. C. Y. Yao, H. H. Chen, T. F. Lin, C. J. Chien, and C. T. Hsu, "A novel common subexpression elimination method for synthesizing fixed-point FIR filters," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 11, pp. 2215-2221, Nov. 2004.
[6]. M. Aktan, A. Yurdakul, and G. Dundar, "An algorithm for the design of low-power hardware-efficient FIR filters," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, pp. 1536-1545, Jul. 2008.
[7]. C. Y. Yao, W. C. Hsia, and Y. H. Ho, "Designing hardware-efficient fixed-point FIR filters in an expanding subexpression space," IEEE Trans. Circuits and Systems I, Reg. Papers, vol. 61, no. 1, pp. 202-212, Jan. 2014.
[8]. B. Rashidi, "High performance and low-power finite impulse response filter based on ring topology with modified retiming serial multiplier on FPGA," IET Signal Process., vol. 7, no. 8, pp. 743-753, Oct. 2013.
[9]. H. Choo, K. Muhammad, and K. Roy, "Complexity reduction of digital filter using shift inclusive differential coefficients," IEEE Trans. Signal Process., vol. 52, no. 6, pp. 1760-1772, Jun. 2004.
[10]. J. H. Choi, N. Banerjee, and K. Roy, "Variation-aware low-power synthesis methodology for fixed point FIR filters," IEEE Trans. Comput. Aided Design Integr. Circuits Syst., vol. 28, no. 1, pp. 87-97, Jan. 2009.
[11]. P. K. Meher, "New approach to look-up-table design and memory based realization of FIR digital filter," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 592-603, Mar. 2010.
[12]. P. K. Meher, "New approach to look-up-table design and memory based realization of FIR digital filter," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 592-603, Mar. 2010.

