Typically, commercial sensor nodes are equipped with MCUs clocked at a low frequency (i.e., within the 4–12 MHz range). Consequently, executing cryptographic algorithms on those MCUs generally requires a large amount of time. In this respect, the required energy consumption can be higher than that of a separate accelerator based on a Field-Programmable Gate Array (FPGA) that is switched on only when needed. In this manuscript, we present the design of a cryptographic accelerator suitable for an FPGA-based sensor node and compliant with the IEEE 802.15.4 standard. All the embedded resources of the target platform (Xilinx Artix-7) have been exploited as fully as possible in order to provide a cost-effective solution. Moreover, we have added key negotiation capabilities to the IEEE 802.15.4 security suite based on Elliptic Curve Cryptography (ECC). Our results suggest that tailored FPGA-based accelerators can behave better in terms of energy than contemporary software solutions for motes, such as the TinyECC and NanoECC libraries. In this regard, a point multiplication (PM) can be performed between 8.58 and 15.4 times faster, an Elliptic Curve Diffie-Hellman (ECDH) exchange between 3.40 and 23.59 times faster, and an Elliptic Curve Integrated Encryption Scheme (ECIES) operation between 5.45 and 34.26 times faster. Moreover, the energy consumption of the PM improved by a factor of 8.96.
1. Introduction
Wireless Medical Sensor Networks (WMSNs) have several benefits.
New medical infrastructure can replace wired telemetry applications. This is important in fields related to ambulatory monitoring or rehabilitation, where WMSNs can provide additional flexibility []. Moreover, the same technology can be used in several situations: once an in-home network has been deployed, the same connectivity can be used for emergency situations and be adapted to monitor the patient's evolution. Consequently, the deployment of WMSNs alters the spatio-temporal dimensions of the traditional medical infrastructure. Patients do not have to go regularly to the hospital, since doctors can receive information about them without their physical presence, and homes are reshaped into monitoring centers. Further, WMSNs can be used for faster detection of diseases, as well as for detecting minimal changes in the parameters being monitored []. Furthermore, vulnerable patients, such as infants and senior citizens, can be monitored in order to detect falls via physical activity monitoring systems [].
Generally, medical applications utilize commercial sensor nodes based on low-power MCUs. These nodes generally utilize a 2.4 GHz transmitter based on the IEEE 802.15.4 communication protocol []. However, due to the low frequency of the MCUs utilized therein, several practitioners have proposed the utilization of Field-Programmable Gate Arrays (FPGAs) in node construction for accelerating a myriad of algorithms, ranging from image processing techniques to cryptographic primitives []. These nodes can be either based on the combination of a low-power MCU and an FPGA, e.g., [,], or purely based upon FPGAs []. The former have several advantages over the latter, since the MCU can set the FPGA in suspend or sleep mode while acceleration is not required, thus saving power.
In this manuscript, we investigate the role of FPGAs in the development of infrastructure for sensor networks. In this respect, we explore a variety of topics:
• How cryptographic accelerators can be implemented in FPGA-based nodes, or in nodes based on the combination of an MCU and an FPGA, for extending the IEEE 802.15.4 security suite with key establishment schemes (Section 4.4).
Finally, we present the design of a cryptographic core, implemented in VHDL and utilizing the described components. All the resources of the FPGA are used for the implementation of the different cryptographic algorithms, based on known designs, with a good trade-off between speed and area. The proposed design can be used to accelerate and perform massive encryption and authentication primitives in applications with a large number of nodes, such as a patient monitoring application, based on either a Wireless Sensor Network (WSN) or a Wireless Body Area Network (WBAN).
This manuscript is structured as follows. First, in Section 2, we describe other implementations of the IEEE 802.15.4 security suite that have been proposed in the literature and summarize our contributions. Then, in Section 3, we outline our implementation. In Section 4, we detail the proposed implementation of the NIST P-192 and B-163 curves. In Section 5, we arrange the designs sketched out in Sections 3 and 4 together. This results in a cryptographic accelerator compliant with the IEEE 802.15.4 standard and extended with Elliptic Curve Cryptography (ECC) capabilities that can be compared with other implementations in the literature. Finally, we describe our future work in Section 6 and end in Section 7 with some conclusions.
3. The IEEE 802.15.4 Security Suite
The IEEE 802.15.4 standard utilizes cryptographic techniques based on symmetric-key cryptography for ensuring data confidentiality, authenticity, integrity and replay protection [].
All the security suites utilize a symmetric block cipher mode based on the AES using 128-bit keys []. The AES is utilized for performing both encryption and authentication through the CCM mode []. This mode relies on the Counter (CTR) mode for ensuring confidentiality, whereas the Cipher Block Chaining (CBC) mode is utilized for generating an authentication tag.
AES
The AES-128 requires 10 rounds for each encryption process. In each round, four different operations manipulate an internal state of 16 bytes. These operations are based on the GF(2^8) extension field. The elements of this field are expressed as polynomials of the form A(x) = a_7 x^7 + … + a_1 x + a_0. The coefficients of each polynomial are elements of GF(2) and form an eight-bit vector. Consequently, all the AES arithmetic is performed on both the GF(2^8) and GF(2) fields. The internal state of the AES is represented by a 4 × 4 matrix, where each element is an eight-bit vector. Only the encryption part of the AES is reviewed here, since its decryption part is not utilized in the CCM mode.
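The GF(2^8) arithmetic described above can be illustrated with a minimal Python sketch. It assumes the AES reduction polynomial x^8 + x^4 + x^3 + x + 1 from FIPS-197, which the text above does not state explicitly; the function names `gf256_add` and `gf256_mul` are hypothetical and serve only this illustration. It models the arithmetic in software and is not a description of the hardware data-path.

```python
# Assumption: AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (FIPS-197),
# written as the bit vector 1_0001_1011.
AES_POLY = 0x11B


def gf256_add(a: int, b: int) -> int:
    """Add two GF(2^8) elements: coefficient-wise addition in GF(2), i.e. XOR."""
    return a ^ b


def gf256_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements by shift-and-XOR, reducing modulo AES_POLY."""
    result = 0
    for _ in range(8):
        if b & 1:          # current coefficient of b is 1: add the shifted a
            result ^= a
        b >>= 1
        a <<= 1            # multiply a by x
        if a & 0x100:      # degree reached 8: subtract (XOR) the polynomial
            a ^= AES_POLY
    return result


if __name__ == "__main__":
    # Worked examples taken from FIPS-197.
    print(hex(gf256_add(0x57, 0x83)))  # 0xd4
    print(hex(gf256_mul(0x57, 0x83)))  # 0xc1
```

Each byte is treated as the coefficient vector (a_7, …, a_0), so addition needs no carries and multiplication reduces to shifts and XORs, which is why these operations map well onto simple logic.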
The four inner operations of each round of the AES encryption are the following. The AddRoundKey operation mixes the plain-text with the subkey derived from the key schedule. Then, the SubBytes operation adds non-linearity to the block cipher by replacing each byte of the state with a unique element. This substitution is generally implemented using substitution boxes of 256 × 8 bits. However, it is based on two arithmetic operations: a GF(2^8) inversion in tandem with an affine mapping. This affine mapping requires a GF(2^8) multiplication and the addition of an eight-bit constant (cf. []). Finally, the ShiftRows operation together with the MixColumns operation adds diffusion to the AES internal state. The ShiftRows operation is based on a circular shift of the state, whereas the MixColumns operation modifies each four-byte column of the state via GF(2^8) multiplications by a 4 × 4 matrix of constants.
The KeySchedule operation generates the 11 subkeys that are used in the ten rounds of AES-128. The generation is recursive, and each subkey is generated as four words of 32 bits. A function (namely g) adds non-linearity to the process using four substitution boxes from the SubBytes operation together with the addition of a variable coefficient (RCON). Finally, the generated subkeys are XORed with the internal state in each round.
By using the AES folded architecture, it is possible to reduce the implementation area by a factor of four. Generally, 16 S-boxes are required to implement the SubBytes operation in one cycle. However, it is possible to implement only four substitution boxes and generate 32 bits of the state per cycle. Likewise, it is possible to reduce the number of MixColumns operations to only one. Moreover, the AddRoundKey operation is reduced from a 128-bit XOR operation to a 32-bit XOR gate.
Finally, the ShiftRows operation is performed by a special arrangement of the AES internal state at the beginning of each round. Hence, the encryption of a single 128-bit block requires 60 cycles, i.e., 10 × (4 + 2) = 60: four cycles per round for the 32-bit data-path, together with two extra cycles per round due to the latencies of both the substitution boxes and the input/output memories of the folded register. Besides, we have optimized the AES data-path via DSP blocks in two ways. First, we have replaced the AddRoundKey operation by one DSP block in XOR mode. Second, we have extended the utilization of the DSP blocks to the computations of the MixColumns operation.
[Figure: Organization of the proposed AES-CCM architecture.]
The architecture of the KeySchedule operation can also be implemented following an iterative approach by computing a quarter of the subkey in each clock cycle. This implementation, based on [], computes 32 bits of key material per cycle, thus requiring 55 clock cycles to derive the complete set of subkeys ((4 + 1) × 11 = 55). This architecture requires a shift register that processes each 32-bit word before an XOR operation is performed. In order to reduce the area, we have implemented a shift register based entirely on BRAM. As in the folded register, we have replaced the 32-bit XOR operation of the key schedule with a DSP block.
4. Implementation of Finite Field Arithmetic for ECC
In this section, we describe how the finite field arithmetic of two standardized curves (particularly the B-163 and the P-192 curves []) can be implemented mainly based on DSP blocks. ECC was independently proposed by Victor Miller in 1985 and by Neal Koblitz in 1987 [,]. It provides the same level of security as RSA with smaller key lengths and a reduced set of operations. Hence, the utilization of ECC in area- and power-constrained systems, such as RFID and sensor nodes, is commonplace.
Elliptic Curves (ECs) are generally represented over prime fields (i.e., GF(p) or F_p, where p is prime) and binary extension fields (GF(2^m) or F_{2^m}). The latter is generally preferred for hardware implementations, since the main operations are based on logic functions and shifts. A prime field GF(p) consists of the set of integers {0, …, p − 1}, where p is prime. Both the addition and multiplication operations are performed modulo p. For instance, all the operations in the P-192 curve are performed modulo p_192 = 2^192 − 2^64 − 1 []. On the other hand, in binary extension fields of the form GF(2^m), the elements of the field are represented as polynomials, and modular reductions are replaced by a reduction through an irreducible polynomial. In the case of the B-163 curve, with m = 163, the irreducible polynomial is x^163 + x^7 + x^6 + x^3 + 1 [].
In order to optimize the implementation of ECC arithmetic and avoid implementing the division operation, a number of inverse-free coordinate systems have been proposed in the literature. The importance of selecting a coordinate system stems from the fact that a reduced number of either additions or multiplications is preferred in an energy-constrained design. Therefore, in order to reduce the number of cycles required for performing a point operation in a cryptographic implementation, it is important to choose the coordinate system carefully. In the next section, we describe a number of coordinate systems generally utilized in the literature. We utilize [] as a reference. Over a prime field, an elliptic curve is given by the short Weierstrass equation
E: y^2 = x^3 + a_4 x + a_6 (1)
where p is prime, p > 3 is satisfied and a_4, a_6 ∈ F_p. Standard projective coordinates utilize triples represented by (x_1, y_1, z_1). They correspond to the affine point (x_1/z_1, y_1/z_1) for z_1 ≠ 0.
In this system of coordinates, a point addition (PA) requires 12 multiplications (12M) and two squarings (2S), whereas a point doubling (PD) requires seven multiplications (7M) and five squarings (5S). Besides, Jacobian coordinates utilize triples (x_1, y_1, z_1) that correspond to the affine point (x_1/z_1^2, y_1/z_1^3), where z_1 ≠ 0. The PA and PD require 12M + 4S and 8M + 3S operations, respectively. Finally, Chudnovsky-Jacobian coordinates utilize points represented with five coordinates, i.e., (x_1, y_1, z_1, z_1^2, z_1^3). The PA operation is performed via 11M + 3S operations, whereas a PD is performed through 5M + 6S operations. The number of operations of the coordinate systems described in this section is summarized in the corresponding tables.
[Table: Performance of coordinate systems in binary extension fields.]
According to these tables, we have selected a pair of coordinate systems suitable for the implementation of the P-192 and the B-163 curves. In the case of the P-192 curve, we have chosen projective coordinates. The Jacobian system of coordinates requires a large number of operations, whereas the Chudnovsky-Jacobian system, despite the reduction in the number of multiplications, requires five coordinates per point, which greatly increases the area needed for storing them. In the case of the B-163 curve, we have selected the LD coordinates, since they require a reduced number of multiplications in comparison with the standard projective and Jacobian coordinates.
Modular Addition and Subtraction
Integer modular addition and subtraction are performed mod p_192 = 2^192 − 2^64 − 1 in the P-192 curve. Algorithms 1 and 2 represent modular addition and subtraction mod p_192, respectively.
Algorithm 1 Integer modular addition.
Input: Integers (a, b), represented as binary vectors a = (a_191, …, a_0) and b = (b_191, …, b_0); modulus p_192 = 2^192 − 2^64 − 1.
Output: c = a + b mod p_192.
1: c_1 = a + b
2: c_2 = c_1 − p_192
3: if c_2 ≥ 0 then
4:   return c_2
5: else
6:   return c_1
7: end if
Algorithm 2 Integer modular subtraction.
Input: Integers (a, b), represented as binary vectors a = (a_191, …, a_0) and b = (b_191, …, b_0); modulus p_192 = 2^192 − 2^64 − 1.
Output: c = a − b mod p_192.
1: c_1 = a − b
2: c_2 = c_1 + p_192
3: if c_1 < 0 then
4:   return c_2
5: else
6:   return c_1
7: end if
Modular Reduction
The NIST curves utilize pseudo-Mersenne primes for performing fast reductions using only additions and subtractions []. The NIST algorithm for performing reductions in the P-192 curve is depicted in Algorithm 3. The reduction consists of four additions that can be executed in the adder/subtractor. Consequently, a modular reduction can be achieved in 16 cycles.
Algorithm 3 Modular reduction mod p_192.
Input: An integer a = (a_5, …, a_0), where each a_i is a 64-bit word.
Output: a mod p_192.
1: s_0 = (a_2, a_1, a_0)
2: s_1 = (0, a_3, a_3)
3: s_2 = (a_4, a_4, 0)
4: s_3 = (a_5, a_5, a_5)
5: return s_0 + s_1 + s_2 + s_3 mod p_192
Modular Multiplication Operation
The DSP48E1 block supports 25 × 18-bit multiplications, which can optionally be coupled with a 48-bit accumulator. Generally, the multiplication operation is based on two main steps. First, a group of partial products is computed. Then, they are shifted and accumulated to generate the final result. In the literature, multiplication techniques are generally categorized into parallel and sequential multipliers []. Sequential multipliers process one bit of one of the operands in each cycle, i.e., this bit is multiplied by the second operand, shifted and accumulated. Other algorithms, such as Booth's multiplier, process two bits per cycle by applying a transformation to certain bit patterns in the operands []. Moreover, other variants, such as the radix-4 and radix-8 Booth's multipliers, extend the number of bits being processed at a time [].
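The sequential scheme described above (one multiplier bit per cycle, multiplied by the other operand, shifted and accumulated) can be sketched in Python as follows. This is an illustrative software model under the assumption of unsigned operands of a fixed bit width; the function name `sequential_multiply` and the cycle counter are hypothetical and do not correspond to any component of the proposed design.

```python
def sequential_multiply(a: int, b: int, width: int):
    """Shift-and-add multiplication of two unsigned `width`-bit integers.

    One bit of the multiplier b is consumed per simulated cycle.
    Returns the pair (product, cycles).
    """
    if a < 0 or b < 0 or (a >> width) or (b >> width):
        raise ValueError("operands must be unsigned and fit in `width` bits")
    accumulator = 0
    cycles = 0
    for i in range(width):
        bit = (b >> i) & 1
        # Partial product: a multiplied by one bit of b, shifted to position i.
        accumulator += (a * bit) << i
        cycles += 1
    return accumulator, cycles


if __name__ == "__main__":
    product, cycles = sequential_multiply(0xBEEF, 0x1234, 16)
    print(hex(product), cycles)
```

In this model, the number of cycles grows linearly with the operand width, which is the cost that a multiplier producing a full partial product per cycle avoids.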
Since the DSP block can compute the complete multiplication of two 18-bit operands in one cycle, implementing a sequential multiplication algorithm would not take advantage of the full features of the DSP block. On the other hand, parallel multipliers generate all the partial products in parallel and accumulate them. Given that we can process a 25 × 18-bit product at a time, we can use several DSPs for generating and accumulating the partial products in parallel. In this case, since we work with 192-bit operands, they can be decomposed into 12 segments of 16 bits and be processed using 16 × 16-bit multiplications. This decomposition is based on the addition of 12 segments shifted by multiples of k bits, according to their position in the operand:
A = Σ_{i=0}^{11} A_i · 2^{ik}, with k = 16 (8)
Finally, 23 accumulated partial products can be added together to obtain the final result. This is done using one DSP block in addition mode. This operation is based on multiplying each accumulated partial product MACC_i by 2^{ik} (i.e., a left shift by ik bits) for k = 16, e.g., A × B = (MACC_22 ≪ 22k) + … + (MACC_1 ≪ k) + MACC_0. Each MACC operation requires an initial delay (one cycle) to fill the pipeline of the DSP block and an extra cycle for each subsequent multiplication and addition. At the same time, the results of each MACC are accumulated in another DSP block, selected by a multiplexer coupled to a counter. However, given that the first half of the partial accumulations (MACC_0 to MACC_11) and the second half (MACC_12 to MACC_22) are generated at the same time, the second half is stored in a BRAM while the first one is processed. Then, this BRAM is read through a counter and its contents are added. Finally, two shift registers are utilized to route the 16-bit segments of each operand (A, B) to the MACCs.
[Figure: Organization of the proposed B-163 adder.]
Algorithm 4 GF(2^m) addition.
Input: X, Y ∈ GF(2^163).
Output: Z = X + Y ∈ GF(2^163).
1: Z ← X ⊕ Y
2: return Z
Algorithm 5 Bit-serial GF(2^163) multiplication, with A, B, C ∈ GF(2^163) and irreducible polynomial M = x^163 + x^7 + x^6 + x^3 + 1.
Input: Two 163-bit vectors A = (a_0, …, a_162), B = (b_0, …, b_162) ∈ GF(2^163).
Output: One 163-bit vector C = (c_0, …, c_162) ∈ GF(2^163).
1: C ← 0
2: for i = 0 → 162 do
3:   if b_i = 1 then
4:     C ← C ⊕ A
5:   end if
6:   if a_162 = 1 then
7:     A ← (A ≪ 1) ⊕ M
8:   else
9:     A ← A ≪ 1
10:  end if
11: end for
12: return C
Proposed EC Schemes
The IEEE 802.15.4 standard does not describe how keys are generated. Those operations are supposed to be provided by the upper layers of the protocol. Since shared keys need to be renegotiated by the intended parties before the message counter overflows (i.e., for ensuring key freshness), an efficient key agreement protocol must be implemented. In the proposed design, the ECDH, ECIES and Elliptic Curve Menezes-Qu-Vanstone (ECMQV) schemes can be implemented. We describe ECDH and ECIES, since they have already been implemented in commercial sensor nodes, and their capabilities are compared in Section 5.
ECDH
ECDH is a key agreement protocol that establishes a shared secret between two non-authenticated parties. It follows a similar approach to the Diffie-Hellman (DH) key exchange []. In ECDH, each party randomly selects a secret value (x and y, respectively). Then, they compute xG and yG, where G is the base point of the curve (i.e., a generator of the cyclic group of curve points used by the scheme). Both values, xG and yG, are exchanged, and a shared secret, k, is computed as x(yG) and y(xG), which coincide because x(yG) = y(xG) = (xy)G. Both x and y values are considered private keys. The strength of ECDH resides in the Elliptic Curve Discrete Logarithm Problem (ECDLP), i.e., given G and another point C = zG on the curve, finding the integer z, which plays the role of the discrete logarithm z = log_G(zG) (a summary of several methods for solving discrete logarithms can be found in []).
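The ECDH exchange described above can be sketched in Python on a toy curve. The sketch assumes the small textbook curve y^2 = x^3 + 2x + 2 over GF(17) with base point G = (5, 1) and uses affine coordinates with modular inversion; neither the curve nor the coordinate system is taken from the proposed design, which targets P-192 and B-163 with inverse-free coordinates. The private keys are fixed here only to make the example reproducible; in practice they are random. All names (`point_add`, `scalar_mult`, `P_MOD`, `CURVE_A`) are hypothetical.

```python
# Toy parameters (assumption, for illustration only): y^2 = x^3 + 2x + 2 over GF(17).
P_MOD = 17
CURVE_A = 2
G = (5, 1)        # base point on the toy curve
INFINITY = None   # point at infinity (group identity)


def point_add(p1, p2):
    """Add two points in affine coordinates (requires Python 3.8+ for pow(x, -1, m))."""
    if p1 is INFINITY:
        return p2
    if p2 is INFINITY:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INFINITY                      # P + (-P) = O
    if p1 == p2:                             # point doubling
        lam = (3 * x1 * x1 + CURVE_A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:                                    # point addition
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, point):
    """Left-to-right double-and-add point multiplication, k >= 1."""
    result = INFINITY
    for bit in bin(k)[2:]:
        result = point_add(result, result)
        if bit == "1":
            result = point_add(result, point)
    return result


if __name__ == "__main__":
    x_priv, y_priv = 3, 7                    # fixed private keys (random in practice)
    x_pub = scalar_mult(x_priv, G)           # xG, sent to the other party
    y_pub = scalar_mult(y_priv, G)           # yG, sent to the other party
    shared_x = scalar_mult(x_priv, y_pub)    # x(yG)
    shared_y = scalar_mult(y_priv, x_pub)    # y(xG)
    print(shared_x == shared_y)
```

Both parties arrive at the same point because scalar multiplications by x and y commute; an eavesdropper who sees only xG and yG would have to solve the ECDLP to recover x or y.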
Message Digest Generation
As noted before, SHA-256 has been implemented in the proposed accelerator to perform the KDF during the key establishment process. The secure hash algorithm SHA-256 is part of the SHA-2 family, standardized by NIST []. A hash algorithm provides a fixed-length representation of a message, called a digest, which is unique for practical purposes. SHA-256 processes blocks of 512 bits and generates a digest of 256 bits. The hash function consists of padding the message into blocks of 512 bits and generating the message digest during 64 iterations per block. A predefined 32-bit constant (K_i) is applied in each iteration in the main pipeline. Moreover, a message scheduler generates a 32-bit word, W_j, in each iteration, which is then applied in order to generate the hash.
5. Results
We have constructed two accelerators based on the NIST curves B-163 and P-192. They are compliant with the IEEE 802.15.4 security suite. Consequently, the AES-CCM mode has been implemented according to the design presented in Section 3.1. Moreover, the designs of the arithmetic described in Sections 4.2 and 4.3 for GF(p_192) and GF(2^163) have also been utilized. Finally, a Finite State Machine (FSM) orchestrates the execution of the PA, PD and PM primitives between the different components of the core.
Software Power Analysis in Xilinx Platforms
We have performed software power analysis of the designs described in this manuscript through the Xilinx Power Analyzer (XPA) []. Given that dynamic power is not a stable value, the user must provide a simulation file (VCD) containing the values of the signals over an interval of time. We have obtained our VCD files through the Mentor ModelSim simulator. However, a VCD file derived from a standard simulation does not contain all the internal connections and elements that are mapped during the Place and Route (PAR) phase.
Hence, it is mandatory to generate a post-PAR simulation model for each operation performed in the core in order to increase the accuracy of the power figures.
PAR Results of the P-192 and B-163 Operations
The PAR results of each implemented arithmetic circuit for performing operations on the P-192 and B-163 curves are reported in the corresponding tables, including the area figures for the circuits implemented using only LUTs. We have also reported the number of BRAMs that we have utilized. In this respect, we have stored the p_192 modulus in three blocks of BRAM in the P-192 adder/subtractor. Moreover, the P-192 multiplier utilizes one block of BRAM for storing the second half of the partial products, while the first half is being accumulated. Finally, the B-163 multiplier stores the GF(2^163) irreducible polynomial in two blocks of BRAM.
[Table: Place and Route (PAR) results of the cryptographic algorithms implemented only using LUTs (XC7A100TL).]
According to these results, we obtained different reductions in area, ranging from 56.08% in the P-192 multiplier to 13.14% in the B-163 multiplier. The reduction achieved in the P-192 multiplier stems from the amount of FPGA resources that LUT-based MACCs require. Moreover, this suggests that larger reductions in area can be achieved by implementing larger multipliers together with B-163 adders. However, we must take into account that the P-192 multiplier relies on two shift registers for the input operands, which can affect the area requirements. Besides, by using larger operands, a larger register file in the bus slave interface is required. However, if there are available BRAMs, they can be used for implementing both the shift registers (as we do in the AES key schedule, Section 3.1) and the register file.
Power and Performance
We have generated a post-PAR simulation model of the P-192 and B-163 accelerators. First, we have simulated the execution of several operations for generating the corresponding signal activity file at 10 MHz.
The selection of this frequency stems from the fact that this accelerator will run at the typical frequency of motes []. Second, the VCD file has been fed into XPA for extracting the required power during the execution of each operation. The execution time of each operation includes the writing of the operands (coordinates) into the register file. The corresponding tables report the power consumption and energy per operation in both accelerators. The performance of the PM operation was measured using the double-and-add algorithm (Algorithm 8). We have assumed an average number of PD and PA operations for a scalar of t bits, i.e., t PDs and 0.5t PAs.
[Table: Performance summary of the B-163 accelerator at 10 MHz.]
Given the area utilization of the SHA-256 implementation, this is the component of the accelerator that requires the most power (53 mW in the P-192 accelerator and 49 mW in the B-163 accelerator). The rest of the operations are executed in the B-163 accelerator with a reduction of 2–8 mW in comparison with the P-192 implementation, in line with the achieved reduction in area (Section 5.3). Moreover, although the B-163 operations are performed on smaller operands, the fact that the GF(2^163) multiplication requires 19.25 μs per operation prevents an improvement in the energy consumption of the PM, ECDH and ECIES operations (which require three times more energy in the B-163 accelerator). Nonetheless, the utilization of a parallel or hybrid multiplier for performing the GF(2^163) multiplications can improve both the time and the energy consumption.
Algorithm 8 Double-and-add algorithm for point multiplication.
Input: An integer k = (k_{n−1}, …, k_0) of length n with k_{n−1} = 1, and a point P on a curve over GF(p) or GF(2^m).
Output: The point Z = kP on the same curve.
1: Z ← P
2: for i = n − 2 down to 0 do
3:   Z ← Z + Z
4:   if k_i = 1 then
5:     Z ← Z + P
6:   end if
7: end for
8: return Z
Finally, the comparison tables cover the main operations (PM, ECDH, ECIES and AES-128 encryption) for the proposed design and the software implementations tested by [–].
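The operation count assumed above for the double-and-add algorithm (about t PDs and 0.5t PAs for a t-bit scalar) can be checked with a small Python sketch. As a simplifying assumption, the additive group of integers stands in for the elliptic curve group, so that "doubling" is z + z and "addition" is z + p and the result can be verified as k · p; the function name `double_and_add_count` is hypothetical. The sketch scans the scalar from its most significant bit, which is consumed by the initialization Z ← P.

```python
def double_and_add_count(k: int, p: int):
    """Left-to-right double-and-add over the integers as a stand-in group.

    Returns (k * p, number of doublings, number of additions).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    bits = bin(k)[2:]          # most significant bit first; bits[0] == "1"
    z = p                      # Z <- P consumes the leading 1 bit
    doublings = 0
    additions = 0
    for bit in bits[1:]:
        z = z + z              # point doubling (PD)
        doublings += 1
        if bit == "1":
            z = z + p          # point addition (PA)
            additions += 1
    return z, doublings, additions


if __name__ == "__main__":
    print(double_and_add_count(0b101101, 7))   # (315, 5, 3)

    # Average number of additions over all 8-bit scalars with the top bit set.
    t = 8
    scalars = range(1 << (t - 1), 1 << t)
    mean_additions = sum(double_and_add_count(k, 1)[2] for k in scalars) / len(scalars)
    print(mean_additions)                      # 3.5, i.e. (t - 1) / 2
```

For a t-bit scalar, this variant performs t − 1 doublings and, on average, (t − 1)/2 additions, which is consistent with the approximation of t PDs and 0.5t PAs used for the performance figures.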
[Table: Comparison of energy consumption (mJ) with other ECC and AES-128 implementations in commercial sensor nodes (B-163).]
As shown in this comparison, the operations executed in our implementation are between 8.58 and 15.4 times faster (PM), between 3.40 and 23.59 times faster (ECDH), between 5.45 and 34.26 times faster (ECIES) and between 64.60 and 404 times faster in the case of AES. Furthermore, a considerable reduction in energy consumption is also shown. Finally, it is worth noting that we are using the XC7A100TL FPGA, which is one of the largest platforms of the Artix-7 series. A smaller device, such as the XC7A20S (2,500 slices, 60 DSP48E1), would be better suited, since lower power consumption and a lower price are expected. Nevertheless, this platform was not available at the time of writing.
6. Future Work
The utilization of FPGAs for sensor node construction adopts the typical threat model of FPGA-based systems. That means that an attacker can generally have two main interests in the platform: recovering the secret keys and disrupting the system. Consequently, the unused I/O pins of the FPGA must be protected against leakage, and they must reject any request. Moreover, the programming interface of the FPGA must be locked against non-authorized readings and updates. In this respect, since we are using an SRAM-based FPGA, an external non-volatile memory is required to store the FPGA configuration, and bitstream encryption must be activated to avoid tampering. Alternatively, anti-fuse and FLASH-based FPGAs can be used to avoid this problem, as well as to mitigate the impact of side-channel attacks. Moreover, a number of authors have proposed different techniques to avoid these attacks on FPGAs based on masking, hiding and utilizing random-based arithmetic [–].
Another issue not discussed here has to do with the generation of keys from random data. In this respect, a number of authors have proposed several designs.
First, Pseudo-Random Number Generators (PRNGs) based on Linear Feedback Shift Registers (LFSRs) can be used if the entropy of the seed is large enough. For instance, seed extraction from different natural phenomena, such as nuclear decay or thermal noise, has been proposed []. FPGA-based designs of LFSRs are numerous in the literature; see, for instance, [–]. Second, True Random Number Generators (TRNGs) utilize a physical process for generating random data. In particular, those based on FPGAs focus on exploiting the imperfections of components and logic implementations, such as the jitter of PLLs and ring oscillators [–]. Finally, TRNG designs based on Physical Unclonable Functions (PUFs) have also been proposed, as well as those based on write collisions in BRAMs [–].
7. Conclusions
In this manuscript, we have presented the design of two cryptographic accelerators suitable for FPGA-based nodes, extended with key negotiation capabilities. The proposed platform is based on the low-power Xilinx Artix-7 FPGA. Moreover, we have taken advantage of the DSP48E1 slice for reducing the area figures of our design. In this respect, we have replaced the logic functions in the AES folded architecture described by Chodowiec et al. [], making the implementation of the encryption operation even more compact. Besides, a similar approach was followed for implementing the arithmetic of the NIST P-192 and B-163 curves. Finally, by clocking the FPGA at 10 MHz, the energy required for performing a number of cryptographic operations was smaller than that of several software alternatives for motes, such as the NanoECC and TinyECC libraries.