# Reconfigurable Distributed Arithmetic Based Adaptive Noise Canceller Using Modified NLMS Algorithm

Dr. Rajesh Mehra<sup>1</sup>, Lalita Sharma<sup>2</sup>

<sup>1, 2</sup> Department of Electronics & Communication Engineering, NITTTR Chandigarh, Punjab University, India.

**Abstract:** This paper presents an efficient design and implementation of a low-area, high-speed adaptive filter based on the Distributed Arithmetic (DA) scheme. An enhanced NLMS algorithm is proposed for the adaptive noise cancellation filter design. The computation speed of the proposed NLMS system is relatively high because memory for the variables of the enhanced Normalized LMS algorithm is preallocated. The proposed design has been implemented using Matlab code and the Xilinx ISE Design Suite on a Spartan 3 based XC3S400 and a Spartan 3E based XC3S500E FPGA device. The synthesis report shows a considerable decrease in device utilization and an increase in overall speed compared with the existing design. For the 20-tap proposed filter there is a 43% reduction in the number of slices, a 59% reduction in the number of flip-flops and a 24% reduction in the number of LUTs, while the maximum frequency improves by 54% and the minimum period by 35.14%. For the 10-coefficient filter, the maximum frequency increases by 21% and the minimum period decreases by 16.46%.

Keywords: Adaptive Filters, Distributed Arithmetic, FPGA, NLMS Algorithm, Noise Cancellation.

# I. Introduction

In this era of extensive telecommunication systems, efficient signal processing is one of the biggest challenges, because signals suffer interference and noise from the various transmission media. To improve the quality of communication, an effective noise cancellation method is required [1]. Noise cancellation is a form of optimal filtering in which the noise is estimated by filtering a reference signal and this estimate is subtracted from the primary input, which contains both signal and noise. In adaptive filtering, when an input signal containing noise is applied to the filter, a negative feedback that depends on the noise in the input signal adjusts the weight values so that the noise is cancelled from the input signal [2].

Over the past few decades, digital signal processors (DSPs) have evolved rapidly to improve the speed and efficiency of communication systems, with many advances in speed, area and power consumption. Researchers have put great effort into crafting efficient architectures for Digital Signal Processing (DSP) functions such as FIR filters, which are widely used in telecommunication applications. An adaptive filter is one effective way to obtain noise-free signals in a communication system: it changes its filter coefficients over time to adapt to a dynamic input-signal environment [3].

Fig. 1 represents a standard Adaptive Noise Cancellation process with two inputs: the primary signal and the reference signal. The primary signal $x_n$ is corrupted by noise $n_n$ added by the communication medium or the external environment. The reference signal $nr_n$ is a second input that is similar to, or correlated with, the noise signal $n_n$. The reference noise passes through an adaptive filter to produce an output $nf_n$ that closely approximates the noise $n_n$ present in the primary input [4]. This estimated noise $(nf_n)$ is subtracted from the primary input signal $(x_n + n_n)$ to produce the estimated error $e_n$ and the output $y_n$, which is similar to the signal $x_n$.
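The signal flow of Fig. 1 can be written out as a short worked equation. It assumes that the adaptive filter is an FIR filter with $L$ taps and weights $w_k$; these symbols are introduced here for illustration. The noise estimate and the error are

$$nf_n = \sum_{k=0}^{L-1} w_k\, nr_{n-k}, \qquad e_n = (x_n + n_n) - nf_n = x_n + (n_n - nf_n).$$

Assume further that $x_n$ is zero-mean and uncorrelated with $n_n$ and with $nr_n$ at every lag, and that the weights are held fixed. Then the cross term drops out of the mean square error:

$$E[e_n^2] = E[x_n^2] + E[(n_n - nf_n)^2].$$

The term $E[x_n^2]$ does not depend on the weights, so minimizing the mean square error drives $nf_n$ toward $n_n$. This is why the error $e_n$ approximates the clean signal $x_n$.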



Fig. 1: Standard Adaptive Noise Canceller Organization

Over the last few years there has been a trend toward implementing DSP functions in Field Programmable Gate Arrays (FPGAs), which provide a balanced solution in terms of area and speed compared with traditional devices [5]. FPGAs also offer an attractive balance between flexibility and device cost. Earlier design methods focused mainly on multiplier-based architectures, namely the multiply-and-accumulate (MAC) blocks found in many DSP functions. These require an appreciable number of multipliers and hence a considerable amount of hardware. Nowadays the multiplier-less Distributed Arithmetic (DA) technique is considered a reliable approach because of its high throughput and regularity, which make cost-effective and time-efficient devices possible [6].

# II. Background Concepts

This section presents DA-based adaptive filters and the adaptive filtering algorithms best suited to hardware implementation. The DA algorithm becomes quite fast when the number of elements in a vector is the same as the word size. The strength of the technique is that DA replaces multiplications with ROM lookup tables [7], which makes it an efficient fit for FPGA implementation.

## 2.1 Distributed Arithmetic

Distributed Arithmetic was first introduced by Croisier et al. and further developed by Peled and Liu [8]. It realizes FIR filters without multipliers through bit-serial computation that uses all possible combination sums of the filter coefficients [9].
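A short derivation shows why no multiplier is needed. Write each of the $K$ input samples in $B$-bit two's-complement form scaled to $[-1, 1)$, with bits $b_{k,j}$, where $j = 0$ is the sign bit. The symbols $K$, $B$ and $b_{k,j}$ are introduced here for illustration.

$$x_{n-k} = -b_{k,0} + \sum_{j=1}^{B-1} b_{k,j}\, 2^{-j}$$

$$y_n = \sum_{k=0}^{K-1} w_k\, x_{n-k} = -\sum_{k=0}^{K-1} w_k\, b_{k,0} + \sum_{j=1}^{B-1} 2^{-j} \left( \sum_{k=0}^{K-1} w_k\, b_{k,j} \right)$$

The bracketed sum depends only on the $K$ bits $(b_{0,j}, \dots, b_{K-1,j})$. Its $2^K$ possible values can therefore be precomputed and stored in a lookup table. The output is then built from $B$ table lookups, shifts and additions, with the sign-bit term subtracted. Note that the table size grows as $2^K$ with the number of taps.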



Fig. 2. Basic DA based FIR filter

Fig. 2 presents the Distributed Arithmetic (DA) implementation of a four-tap FIR filter, where $x_n$ denotes the samples of the incoming input signal. These samples are stored in shift registers so that the latest sample is in the topmost register and the oldest is in the last register [10]. The least significant bits (LSBs) taken from each of the shift registers form the address lines of the Lookup Table (LUT), which holds the corresponding partial products. The DA architecture is thus based on storing all possible combinations of the coefficients $w_n$ in the lookup table [11].
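A small integer model of the bit-serial scheme in Fig. 2 may make the addressing concrete. This is a software sketch, not the authors' VHDL. It assumes integer coefficients and $B$-bit two's-complement integer samples processed LSB first, and the function names `build_da_lut` and `da_inner_product` are hypothetical. Bit $k$ of each LUT address selects coefficient $k$, so the table holds all $2^K$ coefficient sums. On cycle $j$, the $j$-th bits of the $K$ shift-register words form the address. The MSB cycle is subtracted because the MSB carries negative weight in two's complement.

```python
def build_da_lut(coeffs):
    """Return all 2**K partial sums of the K coefficients.

    Bit k of the LUT address selects coefficient k, so lut[addr] is the sum
    of the coefficients whose address bit is 1 (lut[0] == 0)."""
    num_taps = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << num_taps)
    ]


def da_inner_product(lut, samples, bits):
    """Compute sum(coeffs[k] * samples[k]) using only LUT lookups, shifts and adds.

    Each samples[k] must fit in `bits`-bit two's complement. Python's >> on a
    negative int acts as an arithmetic shift, so (x >> j) & 1 yields bit j of
    the two's-complement representation."""
    acc = 0
    for j in range(bits):                  # one "clock cycle" per bit, LSB first
        addr = 0
        for k, x in enumerate(samples):    # bit j of every shift register
            addr |= ((x >> j) & 1) << k
        partial = lut[addr] << j           # weight the LUT output by 2**j
        if j == bits - 1:
            acc -= partial                 # sign bit has weight -2**(bits-1)
        else:
            acc += partial
    return acc
```

Take `coeffs = [3, -1, 2, 5]` and the 4-bit samples `[7, -8, 0, -3]`. Then `da_inner_product(build_da_lut(coeffs), [7, -8, 0, -3], 4)` returns 14, the same value as the direct sum `3*7 + (-1)*(-8) + 2*0 + 5*(-3)`.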

## 2.2 Least Mean Square Algorithm

The LMS algorithm was first proposed in 1960 by Widrow and Hoff. It minimizes the Mean Square Error (MSE) by adjusting the weight coefficients for each incoming input sample, and most noise cancellation applications use it. The algorithm can be understood in two phases: the filtering phase and the weight update phase [12]. In the filtering phase, the unwanted signal is filtered using a close estimate of the unwanted signal and the initial weight coefficients. The output of this phase closely resembles the desired signal. In the weight update phase, the weight coefficients are updated on the basis of the error fed back from the previous filtering phase [13]. The updated weights are then used in the next filtering step. Equation (1) gives the weight update equation of the LMS algorithm:

$$w_n = w_{n-1} + \mu\, e_n\, x_n \tag{1}$$

In the above equation, $w_n$ denotes the updated weight coefficients and $w_{n-1}$ the previous weights, with one entry per filter tap. The step size is denoted by $\mu$, $x_n$ is the input signal sample vector and $e_n$ is the estimated error signal. A very small $\mu$ makes convergence slow, whereas too large a $\mu$ can make the filter unstable. The output of the adaptive process is given by:

$$y_{n+1} = w_{n+1}^{T}\, x_{n+1} \tag{2}$$
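The two phases and Eqs. (1) and (2) can be summarized in a short floating-point sketch of an LMS noise canceller. This illustrates the idea under stated assumptions and is not the authors' implementation. The weights start at zero, and the delay line keeps the newest reference sample first. The error is computed with the weights from the previous step. The function name `lms_filter` is hypothetical.

```python
def lms_filter(primary, reference, num_taps, mu):
    """Adaptive noise canceller using the LMS update of Eq. (1).

    primary   : samples of x_n + n_n (signal plus noise)
    reference : samples of nr_n (noise correlated with n_n)
    Returns (errors, weights); after convergence, errors approximates x_n.
    """
    w = [0.0] * num_taps                 # initial weight coefficients
    taps = [0.0] * num_taps              # delay line, newest reference sample first
    errors = []
    for d, r in zip(primary, reference):
        taps = [r] + taps[:-1]           # shift in the new reference sample
        # Filtering phase: FIR output, i.e. the noise estimate (cf. Eq. (2)).
        y = sum(wk * xk for wk, xk in zip(w, taps))
        e = d - y                        # error = primary input - noise estimate
        # Weight update phase: w_n = w_{n-1} + mu * e_n * x_n (Eq. (1)).
        w = [wk + mu * e * xk for wk, xk in zip(w, taps)]
        errors.append(e)
    return errors, w
```

The correction term in Eq. (1) scales with the input samples. As a result, a step size that converges for one reference level can diverge when the reference signal is much stronger, which is the motivation for the NLMS normalization.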

## 2.3 Normalized Least Mean Square Algorithm

The Least Mean Square algorithm has the limitation that it may become unstable as the signal power changes. The Normalized Least Mean Square (NLMS) algorithm was introduced to overcome this problem. The instability of the LMS algorithm is related to the choice of the step size $(\mu)$ relative to the input power [14]. In NLMS, the step size is normalized by the input power, so that changes in input power have very little effect on the weight update process. The weight update equation for the NLMS algorithm is given as Equation (4). In it, $C_n$ denotes the normalization constant, where $\|x_n\|^2$ is the energy of the input vector, calculated as in Equation (3):

$$C_n = \|x_n\|^2 + 0.0001 \tag{3}$$

$$w_n = w_{n-1} + \frac{\mu}{C_n}\, e_n\, x_n \tag{4}$$

The normalized form of LMS algorithm provides more stability as well as high rate of convergence for the adaptive filtering process [15].
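A matching sketch of the NLMS update is shown below. It interprets $C_n$ in Eq. (3) as the energy of the current tap vector plus 0.0001, which is consistent with the $h[n,:]\,h[n,:]'$ form used in Section III. The function name `nlms_filter` and the `eps` parameter are assumptions for illustration. This is a floating-point model, not the hardware design.

```python
def nlms_filter(primary, reference, num_taps, mu, eps=0.0001):
    """Adaptive noise canceller using the NLMS update of Eqs. (3) and (4).

    primary   : samples of x_n + n_n (signal plus noise)
    reference : samples of nr_n (noise correlated with n_n)
    eps       : the 0.0001 constant of Eq. (3)
    Returns (errors, weights).
    """
    w = [0.0] * num_taps
    taps = [0.0] * num_taps              # newest reference sample first
    errors = []
    for d, r in zip(primary, reference):
        taps = [r] + taps[:-1]           # shift the delay line
        y = sum(wk * xk for wk, xk in zip(w, taps))     # noise estimate
        e = d - y                                       # canceller output
        c = sum(xk * xk for xk in taps) + eps           # Eq. (3): tap energy + 0.0001
        step = mu / c                                   # normalized step size
        w = [wk + step * e * xk for wk, xk in zip(w, taps)]   # Eq. (4)
        errors.append(e)
    return errors, w
```

The step is divided by $C_n$, so the same `mu` (for example 0.5) converges whether the reference amplitude is 0.01 or 100. This illustrates the reduced sensitivity to input power described above.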

# III. Proposed Design

This section presents the proposed methodology and design for a DA-based adaptive filter that uses an enhanced NLMS algorithm. For this purpose, the traditional NLMS algorithm is replaced by the proposed NLMS algorithm for adaptive filtering. The proposed algorithm introduces memory preallocation for variables, which increases the computational speed of the filtering system. The following pseudocode explains the proposed adaptive methodology. The notation used in the algorithm is as follows:

- $x_n$  : Input Signal
- h[p,n] : Convolution Matrix generated using  $x_n$
- $d_n$  : Desired Output
- w[p,n] : Weight Matrix
- $e_n$  : Estimated Error
- $y_n$  : Filter Output
- $\mu$  : Step Size
- *z* : Impulse Response
- $C_n$  : Normalization Constant

$$\begin{aligned}
h[p,n] &= \mathrm{conv}(x_n, z) \\
C_n &= h[n,:]\,h[n,:]' + 0.0001 && \text{(normalization constant from the convolution matrix)} \\
e_n &= d_n - w[n,:]\,h[n,:]' && \text{(error calculation)} \\
w[n,:] &= w[n-1,:] + \frac{\mu}{C_n}\, e_n\, \mathrm{conj}(h[n,:]) && \text{(weight update)}
\end{aligned}$$
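The pseudocode above admits one possible reading, given here as a Python sketch. It is not the authors' Matlab code. The sketch assumes that $h[n,:]$ is the $n$-th row of a Toeplitz data (convolution) matrix holding the $L$ most recent input samples. The role of $z$ is not spelled out, so the matrix is built directly from $x_n$. It also assumes that "memory preallocation" means allocating the full convolution matrix, weight matrix and error vector once, before the loop. The helper names `convolution_matrix` and `proposed_nlms` are hypothetical, and `conj()` is omitted because the signals are real.

```python
def convolution_matrix(x, num_taps):
    """Preallocate and fill the data matrix: row n = [x[n], x[n-1], ..., x[n-L+1]]."""
    h = [[0.0] * num_taps for _ in range(len(x))]   # allocated once; zeros before start
    for n in range(len(x)):
        for k in range(min(num_taps, n + 1)):
            h[n][k] = x[n - k]
    return h


def proposed_nlms(x, d, num_taps, mu, eps=0.0001):
    """NLMS loop over a preallocated convolution matrix, weight matrix and error vector."""
    n_samples = len(x)
    h = convolution_matrix(x, num_taps)
    # Weight history: row n+1 is computed from row n (the pseudocode's w[n-1,:] -> w[n,:]).
    w = [[0.0] * num_taps for _ in range(n_samples + 1)]
    e = [0.0] * n_samples
    for n in range(n_samples):
        row = h[n]
        c_n = sum(v * v for v in row) + eps                     # C_n = h[n,:] h[n,:]' + 0.0001
        e[n] = d[n] - sum(wk * v for wk, v in zip(w[n], row))   # e_n = d_n - w h'
        step = mu / c_n
        for k in range(num_taps):                               # weight update
            w[n + 1][k] = w[n][k] + step * e[n] * row[k]
    return e, w
```

Preallocating with `zeros()` is a common Matlab speed-up because it avoids growing arrays inside the loop. In Python the gain is smaller, since list appends are amortized constant time.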

The final output of the adaptive system is obtained by subtracting $y_n$ from the noisy input signal. We have implemented the proposed algorithm on an FPGA using the Distributed Arithmetic algorithm. The simulation results of the algorithm and the synthesis report of its hardware implementation are discussed in the next section.

# IV. Results And Discussion

The proposed NLMS algorithm is first simulated in Matlab. Its Distributed Arithmetic based implementation on a target FPGA is then obtained by converting the M code to VHDL code. The design was tested with variable step size and filter order to check the adaptability and stability of the proposed technique. To observe the behavior of the proposed design, an input signal $x_n$ with sampling frequency $f_s = 48000$ Hz is used. We first observed the magnitude response and impulse response of the LMS and NLMS algorithms, which are presented in Figs. 3 to 6.







Fig. 5: Impulse response of LMS and NLMS for 10 coefficients.

Similarly, the magnitude response and impulse response of the LMS and NLMS algorithms were observed for 20 coefficients.



After the Matlab simulations, the VHDL code of the proposed system is simulated in the Xilinx ISE Simulator 12.2 with 16-bit input and output streams. Figures 7 and 8 present the simulated waveforms for 10 and 20 coefficients.

Waveform signals: clk, clk_enable, reset, filter_in[15:0], filter_out[15:0]; time axis 0 to 700 ns; testbench clock: clk_high = clk_low = 5000 ps, clk_period = 10000 ps, clk_hold = 2000 ps.

Fig. 7 Simulation waveform of the NLMS adaptive filter with 10 coefficients.

Waveform signals: clk, clk_enable, reset, filter_in[15:0], filter_out[15:0]; time axis 0 to 700 ns; testbench clock: clk_high = clk_low = 5000 ps, clk_period = 10000 ps, clk_hold = 2000 ps.

Fig. 8 Simulation waveform of the NLMS adaptive filter with 20 coefficients.

The VHDL description of the proposed algorithm is simulated and implemented on Xilinx Spartan 3 XC3S400 and Spartan 3E XC3S500E FPGA devices using the DA algorithm, which makes efficient use of the LUTs of the target device. We have tabulated the resources used by the design and compared them with a previous design, referred to as design [3].

Table 1 Resource Utilization and Speed by using Spartan 3E based XC3S500E FPGA.

| Parameter               | Utilization for 10 coefficient | Utilization for 20 coefficient | Available |
|-------------------------|--------------------------------|--------------------------------|-----------|
| No. of Slices           | 221                            | 346                            | 4656      |
| No. of slice Flip Flops | 165                            | 220                            | 9312      |
| No. of LUTs             | 413                            | 653                            | 9312      |
| Number of bonded IOBs   | 35                             | 35                             | 232       |
| Maximum Freq. (MHz)     | 110.738                        | 127.240                        |           |
| Minimum Period (ns)     | 9.030                          | 7.859                          |           |

Table 2 Resource Utilization and Speed by using Spartan 3 based XC3S400 FPGA.

| Parameter               | Utilization for 10 coefficient | Utilization for 20 coefficient | Available |
|-------------------------|--------------------------------|--------------------------------|-----------|
| No. of Slices           | 222                            | 341                            | 3584      |
| No. of slice Flip Flops | 155                            | 212                            | 7168      |
| No. of LUTs             | 414                            | 642                            | 7168      |
| Number of bonded IOBs   | 35                             | 35                             | 141       |
| Maximum Freq. (MHz)     | 107.616                        | 97.207                         |           |
| Minimum Period (ns)     | 9.292                          | 10.287                         |           |

Tables 1 and 2 show that the proposed design is area efficient for both filter orders, occupying less than 10% of the available slices on either device, and that it runs at 97 MHz or more in every configuration. Table 3 compares the performance of the proposed adaptive filter with the previous design presented in [3]. On the Spartan 3 based XC3S400, the proposed 10- and 20-coefficient filters can operate at estimated frequencies of 107.616 MHz and 97.207 MHz, compared with 88.89 MHz and 63.04 MHz for the existing design [3]. Their minimum periods are 9.292 ns and 10.287 ns, compared with 11.124 ns and 15.862 ns respectively.

Table 3 Resource Comparison for Existing and Proposed Designs (Spartan 3 based XC3S400).

| Logic utilization & speed | Design [3], 10 coefficients | Proposed, 10 coefficients | Design [3], 20 coefficients | Proposed, 20 coefficients | Available |
|---------------------------|-----------------------------|---------------------------|-----------------------------|---------------------------|-----------|
| No. of Slices             | 276                         | 222                       | 603                         | 341                       | 3584      |
| No. of Flip Flops         | 275                         | 155                       | 519                         | 212                       | 7168      |
| No. of LUTs               | 370                         | 414                       | 854                         | 642                       | 7168      |
| No. of Multipliers        | 9                           | 0                         | 16                          | 0                         |           |
| Maximum Freq. (MHz)       | 88.89                       | 107.616                   | 63.04                       | 97.207                    |           |
| Minimum Period (ns)       | 11.124                      | 9.292                     | 15.862                      | 10.287                    |           |
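The percentages quoted in the abstract can be reproduced from Table 3 with a few lines of Python. The helper name and the dictionary layout are illustrative only. A negative value denotes a reduction relative to design [3].

```python
def percent_change(existing, proposed):
    """Change of the proposed value relative to design [3], in percent."""
    return 100.0 * (proposed - existing) / existing


# (design [3], proposed) pairs copied from Table 3 (Spartan 3 XC3S400).
TABLE3 = {
    10: {"slices": (276, 222), "flip_flops": (275, 155), "luts": (370, 414),
         "fmax_mhz": (88.89, 107.616), "min_period_ns": (11.124, 9.292)},
    20: {"slices": (603, 341), "flip_flops": (519, 212), "luts": (854, 642),
         "fmax_mhz": (63.04, 97.207), "min_period_ns": (15.862, 10.287)},
}

for taps, rows in TABLE3.items():
    for name, (existing, proposed) in rows.items():
        print(f"{taps} coefficients, {name}: {percent_change(existing, proposed):+.2f}%")
```

For 20 coefficients, the slice, flip-flop and LUT changes come out at about -43.4%, -59.2% and -24.8%. The maximum-frequency gains are about +54.2% for 20 coefficients and +21.1% for 10 coefficients. These agree with the abstract to within truncation. The same run also shows that the 10-coefficient LUT count rises by about 11.9%.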





**Fig. 9 Comparative Analysis** 

The developed multiplier-less adaptive filter consumes fewer slices and flip-flops than the existing design for both filter orders. It also uses fewer LUTs for the 20-coefficient filter (642 versus 854), although the 10-coefficient filter uses slightly more LUTs (414 versus 370). In addition, the proposed design uses no multipliers, whereas the existing design uses 9 and 16 multipliers for the 10- and 20-coefficient filters respectively.

# V. Conclusion

A Distributed Arithmetic based adaptive filter is implemented with a modified NLMS algorithm that uses memory preallocation for variables such as the input and output signals. Matlab was used for simulation and testing of the proposed system. The design was then implemented on target FPGAs and analyzed in terms of the number of slices, flip-flops, LUTs and IOBs, the maximum frequency and the minimum period. The proposed filter has been implemented on a Spartan 3 based XC3S400 and a Spartan 3E based XC3S500E FPGA. A comparative analysis with the existing design shows that the proposed DA-based design consumes less area and provides higher speed. On the Spartan 3 device, the DA-based adaptive filter with 10 and 20 coefficients consumes only 222 and 341 slices, 155 and 212 flip-flops, and 414 and 642 LUTs respectively. The proposed 10- and 20-coefficient filters can operate at estimated frequencies of 107.616 MHz and 97.207 MHz, with minimum periods of 9.292 ns and 10.287 ns respectively.

# References

- [1] Rui Guo and Linda S. DeBrunner, "Two High-Performance Adaptive Filter Implementation Schemes Using Distributed Arithmetic", IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 58, No. 9, pp. 600-604, September 2011.
- [2] S. Haykin, "Adaptive Filter Theory", Pearson Education Asia, 3rd Edition, pp. 324-414.
- [3] Srishtee Chaudhary and Rajesh Mehra, "FPGA Based Adaptive Filter Design Using Least PTH-Norm Technique", International Journal of Soft Computing and Engineering (IJSCE), ISSN: 2231-2307, Vol. 3, No. 2, pp. 208-211, May 2013.

- [4] Rajesh Mehra and Rashmi Arora, "FPGA-Based Design of High-Speed CIC Decimator for Wireless Applications", International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 2, No. 5, pp. 59-62, 2011.
- [5] M. Surya Prakash and Rafi Ahamed Shaik, "Low-Area and High-Throughput Architecture for an Adaptive Filter Using Distributed Arithmetic", IEEE Transactions on Circuits and Systems, Vol. 60, No. 11, pp. 781-785, November 2013.
- [6] D. J. Allred, H. Yoo, V. Krishnan, W. Huang and D. V. Anderson, "LMS adaptive filters using distributed arithmetic for high throughput", IEEE Transaction on Signal Process, Vol. 52, No. 7, pp. 1327 – 1337, July 2005.
- [7] Rajesh Mehra and Swapna Devi, "Efficient Hardware Co-Simulation of Down Convertor for Wireless Communication Systems", International Journal of VLSI Design & Communication Systems (VLSICS), Vol. 1, No. 2, pp. 13-21, June 2010.
- [8] A. Peled and B. Liu, "A new hardware realization of digital filters", IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 22, No. 6, pp. 456-462, Dec. 1974.
- [9] Rajesh Mehra, Garima Saini and Sukhbir Singh, "FPGA Based High Speed BCH Encoder for Wireless Communication Applications", IEEE Conference on Communication Systems and Network Technologies, pp. 576-579, 2011.
- [10] Raj Kumari, Dr. Rajesh Mehra and Lalita Sharma, "Effective Adaptive Noise Canceller Design Using Normalized LMS", IEEE Conference on Next Generation Computing Technologies, Dehradun, India, pp. 571-575, September 2015.
- [11] Sudhanshu Bhagel and Rafi Ahamed Shaik, "FPGA Implementation of Fast Block LMS Adaptive Filter Using Distributed Arithmetic for High Throughput", IEEE Conference on Acoustics, Speech and Signal Processing, pp. 443-447, 2011.
- [12] Sang Yoon Park and Pramod Kumar Meher, "Efficient FPGA and ASIC Realizations of a DA-Based Reconfigurable FIR Digital Filter", IEEE Transactions On Circuits And Systems, Vol. 61, No. 7, pp. 511-515, July 2014.
- [13] Bhawna Tiwari and Rajesh Mehra, "FPGA Implementation of FPGA Codec for Digital Video Broadcasting", International Journal of Electrical, Electronics & Communication Engineering (IJEECE), Vol. 2, No. 2, pp. 68-77, 2012.
- [14] S. K. Mendhe, Dr. S. D. Chede and Prof. S. M. Sakhare, "Design and Implementation of Adaptive Echo Canceller Based LMS & NLMS Algorithm", International Journal of Application or Innovation in Engineering & Management, Vol. 3, No. 6, pp. 348-355, 2014.
- [15] Sang Yoon Park and Pramod Kumar Meher, "Low-Power, High-Throughput, and Low-Area Adaptive FIR Filter Based on Distributed Arithmetic", IEEE Transactions on Circuits and Systems, Vol. 60, No. 6, pp. 346-350, June 2013.



**Dr. Rajesh Mehra:** Dr. Mehra has been associated with the Electronics and Communication Engineering Department of the National Institute of Technical Teachers' Training & Research, Chandigarh, India since 1996. He received his Doctor of Philosophy in Engineering and Technology from Punjab University, Chandigarh, India in 2015. Dr. Mehra received his Master of Engineering from Punjab University, Chandigarh, India in 2008 and his Bachelor of Technology from NIT, Jalandhar, India in 1994. Dr. Mehra has 20 years of academic and industry experience. He has more than 300 papers to his credit, published in refereed international journals and conferences, and 75 M.E. theses to his credit. He has also authored one book on PLC & SCADA. His research areas are Advanced Digital Signal Processing, VLSI Design, FPGA System Design, Embedded System Design, and Wireless & Mobile Communication. Dr. Mehra is a member of IEEE and ISTE.



**Er. Lalita Sharma**: Er. Lalita Sharma has been associated with the School of Engineering & Technology of Shoolini University, Solan, Himachal Pradesh, India since 2011. She is currently pursuing an M.E. at the National Institute of Technical Teachers Training and Research, Chandigarh, India. She completed her B.Tech. at H.P. University, Shimla, India. She has six years of teaching and industry experience and seven papers to her credit, published in refereed international journals and conferences. Her areas of interest include Advanced Digital Signal Processing, VLSI Design and Artificial Neural Networks.