rfc9696.original | rfc9696.txt | |||
---|---|---|---|---|
RIFT WG Y. Wei, Ed. | Internet Engineering Task Force (IETF) Y. Wei, Ed. | |||
Internet-Draft Z. Zhang | Request for Comments: 9696 Z. Zhang | |||
Intended status: Informational ZTE Corporation | Category: Informational ZTE Corporation | |||
Expires: 19 December 2024 D. Afanasiev | ISSN: 2070-1721 D. Afanasiev | |||
Yandex | Yandex | |||
P. Thubert | P. Thubert | |||
Cisco Systems | Individual | |||
T. Przygienda | T. Przygienda | |||
Juniper Networks | Juniper Networks | |||
17 June 2024 | December 2024 | |||
RIFT Applicability and Operational Considerations | Routing in Fat Trees (RIFT) Applicability and Operational Considerations | |||
draft-ietf-rift-applicability-17 | ||||
Abstract | Abstract | |||
This document discusses the properties, applicability and operational | This document discusses the properties, applicability, and | |||
considerations of RIFT in different network scenarios. It intends to | operational considerations of Routing in Fat Trees (RIFT) in | |||
provide a rough guide how RIFT can be deployed to simplify routing | different network scenarios with the intention of providing a rough | |||
operations in Clos topologies and their variations. | guide on how RIFT can be deployed to simplify routing operations in | |||
Clos topologies and their variations. | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This document is not an Internet Standards Track specification; it is | |||
provisions of BCP 78 and BCP 79. | published for informational purposes. | |||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Not all documents | |||
approved by the IESG are candidates for any level of Internet | ||||
Standard; see Section 2 of RFC 7841. | ||||
This Internet-Draft will expire on 19 December 2024. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc9696. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2024 IETF Trust and the persons identified as the | Copyright (c) 2024 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
in the Revised BSD License. | ||||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction | |||
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 2. Terminology | |||
3. Problem Statement of Routing in Modern IP Fabric Fat Tree | 3. Problem Statement of Routing in Modern IP Fabric Fat Tree | |||
Networks . . . . . . . . . . . . . . . . . . . . . . . . 4 | Networks | |||
4. Applicability of RIFT to Clos IP Fabrics . . . . . . . . . . 5 | 4. Applicability of RIFT to Clos IP Fabrics | |||
4.1. Overview of RIFT . . . . . . . . . . . . . . . . . . . . 5 | 4.1. Overview of RIFT | |||
4.2. Applicable Topologies . . . . . . . . . . . . . . . . . . 8 | 4.2. Applicable Topologies | |||
4.2.1. Horizontal Links . . . . . . . . . . . . . . . . . . 8 | 4.2.1. Horizontal Links | |||
4.2.2. Vertical Shortcuts . . . . . . . . . . . . . . . . . 8 | 4.2.2. Vertical Shortcuts | |||
4.2.3. Generalizing to any Directed Acyclic Graph . . . . . 9 | 4.2.3. Generalizing to Any Directed Acyclic Graph | |||
4.2.4. Reachability of Internal Nodes in the Fabric . . . . 10 | 4.2.4. Reachability of Internal Nodes in the Fabric | |||
4.3. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 10 | 4.3. Use Cases | |||
4.3.1. Data Center Topologies . . . . . . . . . . . . . . . 10 | 4.3.1. Data Center Topologies | |||
4.3.2. Metro Networks . . . . . . . . . . . . . . . . . . . 11 | 4.3.2. Metro Networks | |||
4.3.3. Building Cabling . . . . . . . . . . . . . . . . . . 12 | 4.3.3. Building Cabling | |||
4.3.4. Internal Router Switching Fabrics . . . . . . . . . . 12 | 4.3.4. Internal Router Switching Fabrics | |||
4.3.5. CloudCO . . . . . . . . . . . . . . . . . . . . . . . 12 | 4.3.5. CloudCO | |||
5. Operational Considerations . . . . . . . . . . . . . . . . . 14 | 5. Operational Considerations | |||
5.1. South Reflection . . . . . . . . . . . . . . . . . . . . 15 | 5.1. South Reflection | |||
5.2. Suboptimal Routing on Link Failures . . . . . . . . . . . 15 | 5.2. Suboptimal Routing on Link Failures | |||
5.3. Black-Holing on Link Failures . . . . . . . . . . . . . . 17 | 5.3. Black-Holing on Link Failures | |||
5.4. Zero Touch Provisioning (ZTP) . . . . . . . . . . . . . . 18 | 5.4. Zero Touch Provisioning (ZTP) | |||
5.5. Miscabling . . . . . . . . . . . . . . . . . . . . . . . 19 | 5.5. Miscabling | |||
5.5.1. Miscabling Examples . . . . . . . . . . . . . . . . . 19 | 5.5.1. Miscabling Examples | |||
5.5.2. Miscabling considerations . . . . . . . . . . . . . . 21 | 5.5.2. Miscabling Considerations | |||
5.6. Multicast and Broadcast Implementations . . . . . . . . . 22 | 5.6. Multicast and Broadcast Implementations | |||
5.7. Positive vs. Negative Disaggregation . . . . . . . . . . 23 | 5.7. Positive vs. Negative Disaggregation | |||
5.8. Mobile Edge and Anycast . . . . . . . . . . . . . . . . . 24 | 5.8. Mobile Edge and Anycast | |||
5.9. IPv4 over IPv6 . . . . . . . . . . . . . . . . . . . . . 26 | 5.9. IPv4 over IPv6 | |||
5.10. In-Band Reachability of Nodes . . . . . . . . . . . . . . 27 | 5.10. In-Band Reachability of Nodes | |||
5.11. Dual Homing Servers . . . . . . . . . . . . . . . . . . . 28 | 5.11. Dual-Homing Servers | |||
5.12. Fabric with A Controller . . . . . . . . . . . . . . . . 28 | 5.12. Fabric with a Controller | |||
5.12.1. Controller Attached to ToFs . . . . . . . . . . . . 29 | 5.12.1. Controller Attached to ToFs | |||
5.12.2. Controller Attached to Leaf . . . . . . . . . . . . 29 | 5.12.2. Controller Attached to Leaf | |||
5.13. Internet Connectivity Within Underlay . . . . . . . . . . 29 | 5.13. Internet Connectivity Within Underlay | |||
5.13.1. Internet Default on the Leaf . . . . . . . . . . . . 30 | 5.13.1. Internet Default on the Leaf | |||
5.13.2. Internet Default on the ToFs . . . . . . . . . . . . 30 | 5.13.2. Internet Default on the ToFs | |||
5.14. Subnet Mismatch and Address Families . . . . . . . . . . 30 | 5.14. Subnet Mismatch and Address Families | |||
5.15. Anycast Considerations . . . . . . . . . . . . . . . . . 30 | 5.15. Anycast Considerations | |||
5.16. IoT Applicability . . . . . . . . . . . . . . . . . . . . 31 | 5.16. IoT Applicability | |||
5.17. Key Management . . . . . . . . . . . . . . . . . . . . . 32 | 5.17. Key Management | |||
5.18. TTL/HopLimit of 1 vs. 255 on LIEs/TIEs . . . . . . . . . 33 | 5.18. TTL/Hop Limit of 1 vs. 255 on LIEs/TIEs | |||
6. Security Considerations . . . . . . . . . . . . . . . . . . . 33 | 6. Security Considerations | |||
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 33 | 7. IANA Considerations | |||
8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 33 | 8. References | |||
9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 33 | 8.1. Normative References | |||
10. Normative References . . . . . . . . . . . . . . . . . . . . 34 | 8.2. Informative References | |||
11. Informative References . . . . . . . . . . . . . . . . . . . 35 | Acknowledgments | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 36 | Contributors | |||
Authors' Addresses | ||||
1. Introduction | 1. Introduction | |||
This document discusses the properties and applicability of "Routing | This document discusses the properties and applicability of "RIFT: | |||
in Fat Trees" [RIFT] in different deployment scenarios and highlights | Routing in Fat Trees" [RFC9692] in different deployment scenarios and | |||
the operational simplicity of the technology compared to traditional | highlights the operational simplicity of the technology compared to | |||
routing solutions. It also documents special considerations when | classical routing solutions. It also documents special | |||
RIFT is used with or without overlays and/or controllers, and how | considerations when RIFT is used with or without overlays and/or | |||
RIFT identifies miscablings and reroutes around node and link | controllers and how RIFT identifies miscablings and reroutes around | |||
failures. | node and link failures. | |||
2. Terminology | 2. Terminology | |||
This document uses the terminology of RIFT [RIFT]. The most | This document uses the terminology defined in [RFC9692]. The most | |||
frequently used terminologies defined in RIFT are listed here. These | frequently used terms and their definitions from that document are | |||
terms are consistent with definition in RIFT [RIFT] | listed here. | |||
Clos/Fat Tree: | Clos / Fat Tree: | |||
This document uses the terms Clos and Fat Tree interchangeably | This document uses the terms "Clos" and "Fat Tree" interchangeably | |||
where it always refers to a folded spine-and-leaf topology with | where it always refers to a folded spine-and-leaf topology with | |||
possibly multiple Points of Delivery (PoDs) and one or multiple | possibly multiple Points of Delivery (PoDs) and one or multiple | |||
Top of Fabric (ToF) planes. Several modifications such as leaf- | Top of Fabric (ToF) planes. Several modifications such as leaf- | |||
2-leaf shortcuts and multiple level shortcuts are possible and | 2-leaf shortcuts and multiple level shortcuts are possible and | |||
described further in the document. | described further in the document. | |||
Crossbar: | Crossbar: | |||
Physical arrangement of ports in a switching matrix without | Physical arrangement of ports in a switching matrix without | |||
implying any further scheduling or buffering disciplines. | implying any further scheduling or buffering disciplines. | |||
Directed Acyclic Graph (DAG): | Directed Acyclic Graph (DAG): | |||
A finite directed graph with no directed cycles (loops). If links | A finite directed graph with no directed cycles (loops). If links | |||
in a Clos are considered as either being all directed towards the | in a Clos are considered as either being all directed towards the | |||
top or vice versa, each of such two graphs is a DAG. | top or bottom, each of such two graphs is a DAG. | |||
Disaggregation: | Disaggregation: | |||
Process in which a node decides to advertise more specific | The process in which a node decides to advertise more specific | |||
prefixes Southwards, either positively to attract the | prefixes southwards, either positively to attract the | |||
corresponding traffic, or negatively to repel it. Disaggregation | corresponding traffic or negatively to repel it. Disaggregation | |||
is performed to prevent traffic loss and suboptimal routing to the | is performed to prevent traffic loss and suboptimal routing to the | |||
more specific prefixes. | more specific prefixes. | |||
Leaf: | Leaf: | |||
A node without southbound adjacencies. Level 0 implies a leaf in | A node without southbound adjacencies. Level 0 implies a leaf in | |||
RIFT but a leaf does not have to be level 0. | RIFT, but a leaf does not have to be level 0. | |||
LIE: | LIE: | |||
This is an acronym for a "Link Information Element" exchanged on | This is an acronym for "Link Information Element" exchanged on all | |||
all the system's links running RIFT to form _ThreeWay_ adjacencies | the system's links running RIFT to form _ThreeWay_ adjacencies and | |||
and carry information used to perform RIFT Zero Touch Provisioning | carry information used to perform RIFT Zero Touch Provisioning | |||
(ZTP) of levels. | (ZTP) of levels. | |||
South Reflection: | South Reflection: | |||
Often abbreviated just as "reflection", it defines a mechanism | Often abbreviated just as "reflection", South Reflection defines a | |||
where South Node TIEs are "reflected" from the level south back up | mechanism where South Node TIEs are "reflected" from the level | |||
north to allow nodes in the same level without E-W links to be | south back up north to allow nodes in the same level without East- | |||
aware of each other's node Topology Information Elements (TIEs). | West links to be aware of each other's node Topology Information | |||
Elements (TIEs). | ||||
Spine: | Spine: | |||
Any nodes north of leaves and south of ToF nodes. Multiple layers | Any nodes north of leaves and south of ToF nodes. Multiple layers | |||
of spines in a PoD are possible. | of spines in a PoD are possible. | |||
TIE: | TIE: | |||
This is an acronym for a "Topology Information Element". TIEs are | This is an acronym for "Topology Information Element". TIEs are | |||
exchanged between RIFT nodes to describe parts of a network such | exchanged between RIFT nodes to describe parts of a network such | |||
as links and address prefixes. A TIE has always a direction and a | as links and address prefixes. A TIE always has a direction and a | |||
type. North TIEs (sometimes abbreviated as N-TIEs) are used when | type. North TIEs (sometimes abbreviated as N-TIEs) are used when | |||
dealing with TIEs in the northbound representation and South-TIEs | dealing with TIEs in the northbound representation, and South-TIEs | |||
(sometimes abbreviated as S-TIEs) for the southbound equivalent. | (sometimes abbreviated as S-TIEs) are used for the southbound | |||
TIEs have different types such as node and prefix TIEs. | equivalent. TIEs have different types, such as node and prefix | |||
TIEs. | ||||
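The Disaggregation entry above relies on ordinary longest-prefix-match forwarding: a more specific prefix advertised southwards attracts traffic away from the fabric default. The following is a minimal sketch of that effect for positive disaggregation only. The node names ("north1", "north2"), the prefixes, and the lookup helper are illustrative assumptions and are not taken from the RIFT specification.

```python
import ipaddress


def lookup(routes, destination):
    """Return the next hops of the longest matching prefix, or None.

    routes: dict mapping a prefix string to a list of next-hop names
            (a hypothetical, simplified forwarding table).
    """
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hops in routes.items():
        network = ipaddress.ip_network(prefix)
        if dest in network:
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, next_hops)
    return None if best is None else best[1]


# A leaf that only holds the fabric default via two northbound neighbors.
BEFORE = {"0.0.0.0/0": ["north1", "north2"]}

# Assumed failure: north1 can no longer reach 10.0.3.0/24, so north2
# advertises that prefix southwards (positive disaggregation).
AFTER = dict(BEFORE)
AFTER["10.0.3.0/24"] = ["north2"]

HOPS_BEFORE = lookup(BEFORE, "10.0.3.7")
HOPS_AFTER = lookup(AFTER, "10.0.3.7")
HOPS_OTHER = lookup(AFTER, "10.0.4.7")
```

In this sketch, traffic to 10.0.3.7 is spread over both neighbors before the failure and is attracted to north2 alone afterwards, while unrelated destinations keep following the default. Negative disaggregation, which repels traffic instead, is not modeled here.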
3. Problem Statement of Routing in Modern IP Fabric Fat Tree Networks | 3. Problem Statement of Routing in Modern IP Fabric Fat Tree Networks | |||
Clos [CLOS] topologies (called commonly a fat tree/network in modern | Clos [CLOS] topologies (commonly called a Fat Tree/network in modern | |||
IP fabric considerations as homonym to the original definition of the | IP fabric considerations as a similar term for the original | |||
term Fat Tree [FATTREE]) have gained prominence in today's | definition of the term Fat Tree [FATTREE]) have gained prominence in | |||
networking, primarily as a result of the paradigm shift towards a | today's networking, primarily as a result of the paradigm shift | |||
centralized data-center based architecture that deliver a majority of | towards a centralized data-center-based architecture that delivers a | |||
computation and storage services. | majority of computation and storage services. | |||
Current routing protocols were geared towards a network with an | Current routing protocols were geared towards a network with an | |||
irregular topology with isotropic properties, and low degree of | irregular topology with isotropic properties and a low degree of | |||
connectivity. When applied to Fat Tree topologies: | connectivity. When applied to Fat Tree topologies: | |||
* They tend to need extensive configuration or provisioning during | * They tend to need extensive configuration or provisioning during | |||
initialization and adding or removing nodes from the fabric. | initialization and adding or removing nodes from the fabric. | |||
* For link state routing protocols, all nodes including spine and | * For link-state routing protocols, all nodes including spine-and- | |||
leaf nodes learn the entire network topology and routing | leaf nodes learn the entire network topology and routing | |||
information, which is in fact, not needed on the leaf nodes during | information, which is actually not needed on the leaf nodes during | |||
normal operation. They flood significant amounts of duplicate | normal operation. They flood significant amounts of duplicate | |||
link state information between spine and leaf nodes during | link-state information between spine-and-leaf nodes during | |||
topology updates and convergence events, requiring that additional | topology updates and convergence events, requiring that additional | |||
CPU and link bandwidth be consumed. This may impact the stability | CPU and link bandwidth be consumed. This may impact the stability | |||
and scalability of the fabric, make the fabric less reactive to | and scalability of the fabric, make the fabric less reactive to | |||
failures, and prevent the use of cheaper hardware at the lower | failures, and prevent the use of cheaper hardware at the lower | |||
levels (i.e. spine and leaf nodes). | levels (i.e., spine-and-leaf nodes). | |||
4. Applicability of RIFT to Clos IP Fabrics | 4. Applicability of RIFT to Clos IP Fabrics | |||
Further content of this document assumes that the reader is familiar | Further content of this document assumes that the reader is familiar | |||
with the terms and concepts used in OSPF (Open Shortest Path First) | with the terms and concepts used in the Open Shortest Path First | |||
[RFC2328], OSPF for IPv6 [RFC5340] and IS-IS (Intermediate System to | (OSPF) [RFC2328], OSPF for IPv6 [RFC5340], and Intermediate System to | |||
Intermediate System) [ISO10589-Second-Edition] link-state protocols. | Intermediate System (IS-IS) [ISO10589-Second-Edition] link-state | |||
The sections of RIFT [RIFT] outline the requirements of routing in IP | protocols. [RFC9692] outlines the requirements of routing in IP | |||
fabrics and RIFT protocol concepts. | fabrics and RIFT protocol concepts. | |||
4.1. Overview of RIFT | 4.1. Overview of RIFT | |||
RIFT is a dynamic routing protocol that is tailored for use in Clos, | RIFT is a dynamic routing protocol that is tailored for use in Clos, | |||
Fat-Tree, and other anisotropic topologies. A core property | Fat Tree, and other anisotropic topologies. Therefore, a core | |||
therefore of RIFT is that its operation is sensitive to the structure | property of RIFT is that its operation is sensitive to the structure | |||
of the fabric - it is anisotropic. RIFT acts as a link-state | of the fabric -- it is anisotropic. RIFT acts as a link-state | |||
protocol when "pointing north", advertising southwards routes to | protocol when "pointing north", advertising southward routes to | |||
northwards peers (parents) through flooding and database | northward peers (parents) through flooding and database | |||
synchronization. When "pointing south", RIFT operates hop-by-hop | synchronization. When "pointing south", RIFT operates hop-by-hop | |||
like a distance- vector protocol, typically advertising a fabric | like a distance-vector protocol, typically advertising a fabric | |||
default route towards the Top of Fabric (ToF, aka superspine) to | default route towards the ToF, aka superspine, to southward peers | |||
southwards peers (children). | (children). | |||
The fabric default is typically the default route, as described in | The fabric default is typically the default route as described in | |||
Section 6.3.8 "Southbound Default Route Origination" of RIFT [RIFT]. | Section 6.3.8 ("Southbound Default Route Origination") of [RFC9692]. | |||
The ToF nodes may alternatively originate more specific prefixes (P') | The ToF nodes may alternatively originate more specific prefixes (P') | |||
southbound instead of the default route. In such a scenario, all | southbound instead of the default route. In such a scenario, all | |||
addresses carried within the RIFT domain must be contained within P', | addresses carried within the RIFT domain must be contained within P', | |||
and it is possible for a leaf that acts as gateway to the Internet to | and it is possible for a leaf that acts as gateway to the Internet to | |||
advertise the default route instead. | advertise the default route instead. | |||
RIFT floods flat link-state information northbound only so that each | RIFT floods flat link-state information northbound only so that each | |||
level obtains the full topology of levels south of it. That | level obtains the full topology of the levels that are south of it. | |||
information is never flooded east-west or back south again. So a top | That information is never flooded East-West or back south again, so a | |||
tier node has full set of prefixes from the Shortest Path First (SPF) | top tier node has a full set of prefixes from the Shortest Path First | |||
calculation. | (SPF) calculation. | |||
In the southbound direction, the protocol operates like a "fully | In the southbound direction, the protocol operates like a "fully | |||
summarizing, unidirectional" path-vector protocol or rather a | summarizing, unidirectional" path-vector protocol or, rather, a | |||
distance-vector with implicit split horizon. Routing information, | distance-vector with implicit split horizon. Routing information, | |||
normally just the default route, propagates one hop south and is "re- | normally just the default route, propagates one hop south and is "re- | |||
advertised" by nodes at next lower level. | advertised" by nodes at next lower level. | |||
+---------------+ +----------------+ | +---------------+ +----------------+ | |||
| ToF | | ToF | LEVEL 2 | | ToF | | ToF | LEVEL 2 | |||
+ ++------+--+--+-+ ++-+--+----+-----+ | + ++------+--+--+-+ ++-+--+----+-----+ | |||
| | | | | | | | | ^ | | | | | | | | | | ^ | |||
+ | | | +-------------------------+ | | + | | | +-------------------------+ | | |||
Distance | +-------------------+ | | | | | | Distance- | +-------------------+ | | | | | | |||
Vector | | | | | | | | + | Vector | | | | | | | | + | |||
South | | | | +--------+ | | | Link-State | South | | | | +--------+ | | | Link-State | |||
+ | | | | | | | | Flooding | + | | | | | | | | Flooding | |||
| | | +----------------+ | | | North | | | | +----------------+ | | | North | |||
v | | | | | | | | + | v | | | | | | | | + | |||
++---+-+ +------+ +-+----+ ++----++ | | ++---+-+ +------+ +-+----+ ++----++ | | |||
|SPINE | |SPINE | | SPINE| | SPINE| | LEVEL 1 | |SPINE | |SPINE | | SPINE| | SPINE| | LEVEL 1 | |||
+ ++----++ ++---+-+ +-+--+-+ ++----++ | | + ++----++ ++---+-+ +-+--+-+ ++----++ | | |||
+ | | | | | | | | | ^ N | + | | | | | | | | | ^ N | |||
Distance | +-------+ | | +--------+ | | | E | Distance- | +-------+ | | +--------+ | | | E | |||
Vector | | | | | | | | | +------> | Vector | | | | | | | | | +------> | |||
South | +-------+ | | | +------+ | | | | | South | +-------+ | | | +------+ | | | | | |||
+ | | | | | | | | | + | + | | | | | | | | | + | |||
v ++--++ +-+-++ ++--++ ++--++ + | v ++--++ +-+-++ ++--++ ++--++ + | |||
|LEAF| |LEAF| |LEAF| |LEAF| LEVEL 0 | |LEAF| |LEAF| |LEAF| |LEAF| LEVEL 0 | |||
+----+ +----+ +----+ +----+ | +----+ +----+ +----+ +----+ | |||
Figure 1: RIFT overview | Figure 1: RIFT Overview | |||
A spine node has only information necessary for its level, which is | A spine node only has information necessary for its level, which is | |||
all destinations south of the node based on SPF calculation, default | all destinations south of the node based on SPF calculation, the | |||
route, and potentially disaggregated routes. | default route, and potentially disaggregated routes. | |||
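The asymmetry described above (link-state flooding northbound, a default route southbound) can be illustrated with a small model. This is a sketch under simplifying assumptions, not the RIFT algorithm: the topology, the prefixes, and the function name are hypothetical, the default route is assumed to reach every node that has at least one northbound neighbor, and disaggregated routes are left out.

```python
# Hypothetical three-level fabric, loosely following Figure 1.
# Maps each node to its northbound neighbors (parents).
PARENTS = {
    "leaf1": ["spine1", "spine2"],
    "leaf2": ["spine1", "spine2"],
    "leaf3": ["spine3", "spine4"],
    "leaf4": ["spine3", "spine4"],
    "spine1": ["tof1", "tof2"],
    "spine2": ["tof1", "tof2"],
    "spine3": ["tof1", "tof2"],
    "spine4": ["tof1", "tof2"],
    "tof1": [],
    "tof2": [],
}

# Hypothetical prefixes attached to the leaves.
LEAF_PREFIXES = {
    "leaf1": "10.0.1.0/24",
    "leaf2": "10.0.2.0/24",
    "leaf3": "10.0.3.0/24",
    "leaf4": "10.0.4.0/24",
}

DEFAULT = "0.0.0.0/0"


def routing_view(parents, leaf_prefixes):
    """Return node -> set of prefixes known to that node in this model."""
    view = {node: set() for node in parents}

    def flood_north(node, prefix):
        # Northbound: every ancestor learns the prefix.
        for parent in parents[node]:
            if prefix not in view[parent]:
                view[parent].add(prefix)
                flood_north(parent, prefix)

    for leaf, prefix in leaf_prefixes.items():
        view[leaf].add(prefix)
        flood_north(leaf, prefix)

    # Southbound: nodes with a northbound neighbor only receive the default.
    for node, node_parents in parents.items():
        if node_parents:
            view[node].add(DEFAULT)
    return view


VIEW = routing_view(PARENTS, LEAF_PREFIXES)
```

In this model, a leaf holds its own prefix plus the default, a spine holds the prefixes south of it plus the default, and only the ToF nodes hold every prefix in the fabric.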
RIFT combines the advantage of both link-state and distance-vector: | RIFT combines the advantages of both link-state and distance-vector | |||
protocols: | ||||
* Fastest possible convergence | * Fastest possible convergence | |||
* Automatic detection of topology | * Automatic detection of topology | |||
* Minimal routes/information on Top-of-Rack (ToR) switches, aka leaf | * Minimal routes/information on Top-of-Rack (ToR) switches, aka leaf | |||
nodes | nodes | |||
* High degree of ECMP | * High degree of ECMP | |||
* Fast de-commissioning of nodes | * Fast decommissioning of nodes | |||
* Maximum propagation speed with flexible prefixes in an update | * Maximum propagation speed with flexible prefixes in an update | |||
So there are two types of link-state database which are "north | There are two types of link-state databases that are "north | |||
representation" North Topology Information Elements (N-TIEs) and | representation" North Topology Information Elements (N-TIEs) and | |||
"south representation" South Topology Information Elements (S-TIEs). | "south representation" South Topology Information Elements (S-TIEs). | |||
The N-TIEs contain a link-state topology description of lower levels | The N-TIEs contain a link-state topology description of lower levels, | |||
and S-TIEs carry simply default and disaggregated routes for the | and the S-TIEs simply carry default and disaggregated routes for the | |||
lower levels. | lower levels. | |||
RIFT also eliminates major disadvantages of link-state and distance- | RIFT also eliminates major disadvantages of link-state and distance- | |||
vector with: | vector protocols with the following: | |||
* Reduced and balanced flooding | * Reduced and balanced flooding | |||
* Level constrained automatic neighbor discovery | * Level-constrained automatic neighbor discovery | |||
To achieve this, RIFT builds on the art of IGPs, not only OSPF and | To achieve this, RIFT builds on the art of IGPs, such as OSPF, IS-IS, | |||
IS-IS but also MANET and IoT (Internet of Things), to provide unique | Mobile Ad Hoc Network (MANET), and Internet of Things (IoT) to | |||
features: | provide unique features: | |||
* Automatic (positive or negative) route disaggregation of | * Automatic (positive or negative) route disaggregation of northward | |||
northwards routes upon fallen leaves | routes upon fallen leaves | |||
* Recursive operation in the case of negative route disaggregation | * Recursive operation in the case of negative route disaggregation | |||
* Anisotropic routing that extends a principle seen in RPL [RFC6550] | * Anisotropic routing that extends a principle seen in the Routing | |||
to wide superspines | Protocol for Low-Power and Lossy Networks (RPL) [RFC6550] to wide | |||
superspines | ||||
* Optimal flooding reduction that derives from the concept of a | * Optimal flooding reduction that derives from the concept of a | |||
"multipoint relay" (MPR) found in OLSR [RFC3626] and balances the | "multipoint relay" (MPR) found in Optimized Link State Routing | |||
flooding load over northbound links and nodes. | (OLSR) [RFC3626] and balances the flooding load over northbound | |||
links and nodes | ||||
Additional advantages that are unique to RIFT are listed below, the | Additional advantages that are unique to RIFT are listed below. The | |||
details of which can be found in RIFT [RIFT]. | details of these advantages can be found in RIFT [RFC9692]. | |||
* True ZTP (Zero Touch Provisioning) | * True ZTP | |||
* Minimal blast radius on failures | * Minimal blast radius on failures | |||
* Can utilize all paths through fabric without looping | * Can utilize all paths through fabric without looping | |||
* Simple leaf implementation that can scale down to servers | * Simple leaf implementation that can scale down to servers | |||
* Key-Value store | * Key-value store | |||
* Horizontal links used for protection only | * Horizontal links used for protection only | |||
4.2. Applicable Topologies | 4.2. Applicable Topologies | |||
Albeit RIFT is specified primarily for "proper" Clos or Fat Tree | Albeit RIFT is specified primarily for "proper" Clos or Fat Tree | |||
topologies, the protocol natively supports Points of Delivery (PoD) | topologies, the protocol natively supports Points of Delivery (PoD) | |||
concepts, which, strictly speaking, are not found in the original | concepts, which, strictly speaking, are not found in the original | |||
Clos concept. | Clos concept. | |||
Further, the specification explains and supports operations of multi- | Further, the specification explains and supports operations of multi- | |||
plane Clos variants where the protocol recommends the use of inter- | plane Clos variants where the protocol recommends the use of inter- | |||
plane rings at the Top-of-Fabric level to allow the reconciliation of | plane rings at the ToF level to allow the reconciliation of topology | |||
topology view of different planes to make the negative disaggregation | view of different planes to make the Negative Disaggregation viable | |||
viable in case of failures within a plane. These observations hold | in case of failures within a plane. These observations hold not only | |||
not only in case of RIFT but also in the generic case of dynamic | in case of RIFT but also in the generic case of dynamic routing on | |||
routing on Clos variants with multiple planes and failures in bi- | Clos variants with multiple planes and failures in bisectional | |||
sectional bandwidth, especially on the leafs. | bandwidth, especially on the leaves. | |||
4.2.1. Horizontal Links | 4.2.1. Horizontal Links | |||
RIFT is not limited to pure Clos divided into PoD and multi-planes | RIFT is not limited to pure Clos divided into PoD and multi-planes | |||
but supports horizontal (East-West) links below the top of fabric | but supports horizontal (East-West) links below the ToF level. Those | |||
level. Those links are used only for last resort northbound | links are used only for last resort northbound forwarding when a | |||
forwarding when a spine loses all its northbound links or cannot | spine loses all its northbound links or cannot compute a default | |||
compute a default route through them. | route through them. | |||
A full-mesh connectivity between nodes on the same level can be | A full-mesh connectivity between nodes on the same level can be | |||
employed and that allows N-SPF to provide for any node losing all its | deployed, which allows North SPF (N-SPF) to provide for any node | |||
northbound adjacencies (as long as any of the other nodes in the | losing all its northbound adjacencies (as long as any of the other | |||
level are northbound connected) to still participate in northbound | nodes in the level are northbound connected) and still participate in | |||
forwarding. | northbound forwarding. | |||
Note that a "ring" of horizontal links at any level below ToF does | Note that a "ring" of horizontal links at any level below ToF does | |||
not provide a "ring-based protection" scheme since the SPF | not provide a "ring-based protection" scheme since the SPF | |||
computation would have to deal necessarily with breaking of "loops", | computation would have to deal with breaking of "loops", an | |||
an application for which RIFT is not intended. | application for which RIFT is not intended. | |||
4.2.2. Vertical Shortcuts | 4.2.2. Vertical Shortcuts | |||
Through relaxations of the specified adjacency forming rules, RIFT | Through relaxations of the specified adjacency forming rules, RIFT | |||
implementations can be extended to support vertical "shortcuts". The | implementations can be extended to support vertical "shortcuts". The | |||
RIFT specification itself does not provide the exact details since | RIFT specification itself does not provide the exact details since | |||
the resulting solution suffers from either much larger blast radius | the resulting solution suffers from either a much larger blast radius | |||
with increased flooding volumes or in case of maximum aggregation | with increased flooding volumes or bow tie problems in the case of | |||
routing, bow-tie problems. | maximum aggregation routing. | |||
4.2.3. Generalizing to any Directed Acyclic Graph | 4.2.3. Generalizing to Any Directed Acyclic Graph | |||
RIFT is an anisotropic routing protocol, meaning that it has a sense | RIFT is an anisotropic routing protocol, meaning that it has a sense | |||
of direction (northbound, southbound, east-west) and that it operates | of direction (northbound, southbound, and East-West) and operates | |||
differently depending on the direction. | differently depending on the direction. | |||
Since a DAG provides a sense of north (the direction of the DAG) and | Since a DAG provides a sense of north (the direction of the DAG) and | |||
of south (the reverse), it can be used to apply RIFT--a node in the | south (the reverse), it can be used to apply RIFT -- a node in the | |||
DAG that has only incoming edges is a ToF node. | DAG that has only incoming edges is a ToF node. | |||
There are a number of caveats though: | There are a number of caveats though: | |||
* The DAG structure must exist before RIFT starts, so there is a | * The DAG structure must exist before RIFT starts, so there is a | |||
need for a companion protocol to establish the logical DAG | need for a companion protocol to establish the logical DAG | |||
structure. | structure. | |||
* A generic DAG does not have a sense of east and west. The | * A generic DAG does not have a sense of East and West. The | |||
operation specified for east-west links and the southbound | operation specified for East-West links and the southbound | |||
reflection between nodes are not applicable. Also ZTP will derive | reflection between nodes are not applicable. Also, ZTP will | |||
a sense of depth that will eliminate some links. Variations of | derive a sense of depth that will eliminate some links. | |||
ZTP could be derived to meet specific objectives, e.g., make it so | Variations of ZTP could be derived to meet specific objectives, | |||
that most routers have at least 2 parents to reach the ToF. | e.g., make it so that most routers have at least two parents to | |||
reach the ToF. | ||||
* RIFT applies to any Destination-Oriented DAG (DODAG) where there's | * RIFT applies to any Destination-Oriented DAG (DODAG) where there's | |||
only one ToF node and the problem of disaggregation does not | only one ToF node and the problem of disaggregation does not | |||
exist. In that case, RIFT operates very much like RPL [RFC6550], | exist. In that case, RIFT operates very much like RPL [RFC6550], | |||
but using Link State for southbound routes (downwards in RPL's | but uses link-state information for southbound routes (downwards | |||
terms). For an arbitrary DAG with multiple destinations (ToFs) | in RPL's terms). For an arbitrary DAG with multiple destinations | |||
the way disaggregation happens has to be considered. | (ToFs), the way disaggregation happens has to be considered. | |||
* Positive disaggregation expects that most of the ToF nodes reach | * Positive Disaggregation expects that most of the ToF nodes reach | |||
most of the leaves, so disaggregation is the exception as opposed | most of the leaves, so disaggregation is the exception as opposed | |||
to the rule. When this is no longer true, it makes sense to turn | to the rule. When this is no longer true, it makes sense to turn | |||
off disaggregation and route between the ToF nodes over a ring, a | off disaggregation and route between the ToF nodes over a ring, a | |||
full mesh, transit network, or a form of area zero. There again, | full mesh, a transit network, or a form of area zero. Then again, | |||
this operation is similar to RPL operating as a single DODAG with | this operation is similar to RPL operating as a single DODAG with | |||
a virtual root. | a virtual root. | |||
* In order to aggregate and disaggregate routes, RIFT requires that | * In order to aggregate and disaggregate routes, RIFT requires that | |||
all the ToF nodes share the full knowledge of the prefixes in the | all the ToF nodes share the full knowledge of the prefixes in the | |||
fabric. This can be achieved with a ring as suggested by "RIFT" | fabric. This can be achieved with a ring as suggested by RIFT | |||
[RIFT], by some preconfiguration, or using a synchronization with | [RFC9692], by some preconfiguration, or by using a synchronization | |||
a common repository where all the active prefixes are registered. | with a common repository where all the active prefixes are | |||
registered. | ||||
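
The DAG reading described in this section can be sketched in a few lines of Python. This is only an illustration and not part of RIFT: it assumes the DAG is supplied as hypothetical (south, north) edge pairs, treats any node without a northbound edge as a ToF node, and reports nodes that have fewer than two parents. All function names and the sample topology are assumptions made for the example, and the input is assumed to be acyclic already.

```python
def parents_of(edges):
    """Map every node to the set of its northbound neighbors (parents).

    `edges` is an iterable of (south, north) pairs; this encoding is an
    assumption of the sketch, not something RIFT defines.
    """
    parents = {}
    for south, north in edges:
        parents.setdefault(south, set()).add(north)
        parents.setdefault(north, set())
    return parents


def tof_nodes(edges):
    """Nodes with no northbound edge, i.e. only incoming DAG edges."""
    return sorted(n for n, p in parents_of(edges).items() if not p)


def is_dodag(edges):
    """True when there is exactly one ToF node (a single destination)."""
    return len(tof_nodes(edges)) == 1


def nodes_with_few_parents(edges, minimum=2):
    """Non-ToF nodes that have fewer than `minimum` parents."""
    return sorted(
        n for n, p in parents_of(edges).items() if p and len(p) < minimum
    )
```

With the hypothetical edges L0-S0, L0-S1, S0-T0, and S1-T0, the only ToF node is T0, so the graph is a DODAG and disaggregation is not a concern. Adding an edge S0-T1 yields two ToF nodes, which is the case where the way disaggregation happens has to be considered.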
4.2.4. Reachability of Internal Nodes in the Fabric | 4.2.4. Reachability of Internal Nodes in the Fabric | |||
RIFT does not require that nodes have reachable addresses in the | RIFT does not require that nodes have reachable addresses in the | |||
fabric, though it is clearly desirable for operational purposes. | fabric, though it is clearly desirable for operational purposes. | |||
Under normal operating conditions this can be easily achieved by | Under normal operating conditions, this can be easily achieved by | |||
injecting the node's loopback address into North and South Prefix | injecting the node's loopback address into Prefix North TIEs and | |||
TIEs or other implementation specific mechanisms. | Prefix South TIEs or other implementation-specific mechanisms. | |||
Special considerations arise when a node loses all northbound | Special considerations arise when a node loses all northbound | |||
adjacencies, but is not at the top of the fabric. If a spine node | adjacencies but is not at the top of the fabric. If a spine node | |||
loses all northbound links, the spine node doesn't advertise default | loses all northbound links, the spine node doesn't advertise a | |||
route. But if the level of the spine node is auto-determined by ZTP, | default route. But if the level of the spine node is auto-determined | |||
it will "fall down" as depicted in Figure 8. | by ZTP, it will "fall down" as depicted in Figure 8. | |||
4.3. Use Cases | 4.3. Use Cases | |||
4.3.1. Data Center Topologies | 4.3.1. Data Center Topologies | |||
4.3.1.1. Data Center Fabrics | 4.3.1.1. Data Center Fabrics | |||
RIFT is suited for applying in data center (DC) IP fabrics underlay | RIFT is suited for applying underlay routing in data center (DC) IP | |||
routing, vast majority of which seem to be currently (and for the | fabrics, with the vast majority of these IP fabrics being Clos | |||
foreseeable future) Clos architectures. It significantly simplifies | architectures (and will be for the foreseeable future). It | |||
operation and deployment of such fabrics as described in Section 5 | significantly simplifies operation and deployment of such fabrics as | |||
for environments compared to extensive proprietary provisioning and | described in Section 5 for environments compared to extensive | |||
operational solutions. | proprietary provisioning and operational solutions. | |||
4.3.1.2. Adaptations to Other Proposed Data Center Topologies | 4.3.1.2. Adaptations to Other Proposed Data Center Topologies | |||
. +-----+ +-----+ | . +-----+ +-----+ | |||
. | | | | | . | | | | | |||
.+-+ S0 | | S1 | | .+-+ S0 | | S1 | | |||
.| ++---++ ++---++ | .| ++---++ ++---++ | |||
.| | | | | | .| | | | | | |||
.| | +------------+ | | .| | +------------+ | | |||
.| | | +------------+ | | .| | | +------------+ | | |||
.| | | | | | .| | | | | | |||
.| ++-+--+ +--+-++ | .| ++-+--+ +--+-++ | |||
.| | | | | | .| | | | | | |||
skipping to change at page 11, line 29 ¶ | skipping to change at line 483 ¶ | |||
.| | | | | | .| | | | | | |||
.| +-+-+-+ +--+-++ | .| +-+-+-+ +--+-++ | |||
.+-+ | | | | .+-+ | | | | |||
. | L0 | | L1 | | . | L0 | | L1 | | |||
. +-----+ +-----+ | . +-----+ +-----+ | |||
Figure 2: Level Shortcut | Figure 2: Level Shortcut | |||
RIFT is not strictly limited to Clos topologies. The protocol only | RIFT is not strictly limited to Clos topologies. The protocol only | |||
requires a sense of "compass rose directionality" either achieved | requires a sense of "compass rose directionality" either achieved | |||
through configuration or derivation of levels. So, conceptually, | through configuration or derivation of levels. So conceptually, | |||
shortcuts between levels could be included. Figure 2 depicts an | shortcuts between levels could be included. Figure 2 depicts an | |||
example of a shortcut between levels. In this example, sub-optimal | example of a shortcut between levels. In this example, suboptimal | |||
routing will occur when traffic is sent from L0 to L1 via S0's | routing will occur when traffic is sent from L0 to L1 via S0's | |||
default route and back down through A0 or A1. In order to avoid | default route and back down through A0 or A1. In order to avoid | |||
that, only default routes from A0 or A1 are used, all leaves would be | that, only default routes from A0 or A1 are used. All leaves would | |||
required to install each other's routes. | be required to install each other's routes. | |||
While various technical and operational challenges may require the | While various technical and operational challenges may require the | |||
use of such modifications, discussion of those topics are outside the | use of such modifications, discussion of those topics is outside the | |||
scope of this document. | scope of this document. | |||
4.3.2. Metro Networks | 4.3.2. Metro Networks | |||
The demand for bandwidth is increasing steadily, driven primarily by | The demand for bandwidth is increasing steadily, driven primarily by | |||
environments close to content producers (server farms connection via | environments close to content producers (server farms connection via | |||
DC fabrics) but in proximity to content consumers as well. Consumers | DC fabrics) but in proximity to content consumers as well. Consumers | |||
are often clustered in metro areas with their own network | are often clustered in metro areas with their own network | |||
architectures that can benefit from simplified, regular Clos | architectures that can benefit from simplified, regular Clos | |||
structures and hence from RIFT. | structures. Thus, they can also benefit from RIFT. | |||
4.3.3. Building Cabling | 4.3.3. Building Cabling | |||
Commercial edifices are often cabled in topologies that are either | Commercial edifices are often cabled in topologies that are either | |||
Clos or its isomorphic equivalents. The Clos can grow rather high | Clos or its isomorphic equivalents. The Clos can grow rather high | |||
with many levels. That presents a challenge for traditional routing | with many levels. That presents a challenge for classical routing | |||
protocols (except BGP[RFC4271] and by now largely phased-out | protocols (except BGP [RFC4271] and Private Network-Network Interface | |||
PNNI[PNNI]) which do not support an arbitrary number of levels which | (PNNI) [PNNI], which is largely phased-out by now) that do not | |||
RIFT does naturally. Moreover, due to the limited sizes of | support an arbitrary number of levels, which RIFT does naturally. | |||
forwarding tables in network elements of building cabling, the | Moreover, due to the limited sizes of forwarding tables in network | |||
minimum FIB size RIFT maintains under normal conditions is cost- | elements of building cabling, the minimum FIB size RIFT maintains | |||
effective in terms of hardware and operational costs. | under normal conditions is cost-effective in terms of hardware and | |||
operational costs. | ||||
4.3.4. Internal Router Switching Fabrics | 4.3.4. Internal Router Switching Fabrics | |||
It is common in high-speed communications switching and routing | It is common in high-speed communications switching and routing | |||
devices to use switch fabrics which are interconnection networks | devices to use switch fabrics that are interconnection networks | |||
inside the devices connecting the input ports to their output ports. | inside the devices connecting the input ports to their output ports. | |||
For example, crossbar is one of the switch fabric techniques while a | For example, a crossbar is one of the switch fabric techniques, even | |||
crossbar is not feasible due to cost, head-of-line blocking or size | though it is not feasible due to cost, head-of-line blocking, or size | |||
trade-offs. And normally such fabrics are not self-healing or rely | trade-offs. Normally, such fabrics are not self-healing or rely on | |||
on 1:1 or 1+1 protection schemes but it is conceivable to use RIFT to | 1:1 or 1+1 protection schemes, but it is conceivable to use RIFT to | |||
operate Clos fabrics that can deal effectively with interconnections | operate Clos fabrics that can deal effectively with interconnections | |||
or subsystem failures in such module. RIFT is not IP specific and | or subsystem failures in such a module. RIFT is not IP specific and | |||
hence any link addressing connecting internal device subnets is | hence any link addressing connecting internal device subnets is | |||
conceivable. | conceivable. | |||
4.3.5. CloudCO | 4.3.5. CloudCO | |||
The Cloud Central Office (CloudCO) is a new stage of telecom Central | The Cloud Central Office (CloudCO) is a new stage of the telecom | |||
Office. It takes advantage of Software Defined Networking (SDN) | Central Office. It takes advantage of Software-Defined | |||
and Network Function Virtualization (NFV) in conjunction with general | Networking (SDN) and Network Function Virtualization (NFV) in | |||
purpose hardware to optimize current networks. The following figure | conjunction with general purpose hardware to optimize current | |||
illustrates this architecture at a high level. It describes a single | networks. The following figure illustrates this architecture at a | |||
instance or macro-node of cloud CO that provides a number of Value | high level. It describes a single instance or macro-node of CloudCO | |||
Added Services (VAS), a Broadband Access Abstraction (BAA), and | that provides a number of value-added services (VASes), a Broadband | |||
virtualized network services. An Access I/O module faces a Cloud CO | Access Abstraction (BAA), and virtualized network services. An | |||
access node, and the Customer Premises Equipments (CPEs) behind it. | Access I/O module faces a CloudCO access node and the Customer | |||
A Network I/O module is facing the core network. The two I/O modules | Premises Equipment (CPE) behind it. A Network I/O module is facing | |||
are interconnected by a leaf and spine fabric [TR-384]. | the core network. The two I/O modules are interconnected by a spine- | |||
and-leaf fabric [TR-384]. | ||||
+---------------------+ +----------------------+ | +---------------------+ +----------------------+ | |||
| Spine | | Spine | | | Spine | | Spine | | |||
| Switch | | Switch | | | Switch | | Switch | | |||
+------+---+------+-+-+ +--+-+-+-+-----+-------+ | +------+---+------+-+-+ +--+-+-+-+-----+-------+ | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | |||
| | | | | +-------------------------------+ | | | | | | | +-------------------------------+ | | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | |||
| | | | +-------------------------+ | | | | | | | | +-------------------------+ | | | | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | |||
skipping to change at page 13, line 45 ¶ | skipping to change at line 586 ¶ | |||
| |--------| |--------| |----------| |-------| | | | |--------| |--------| |----------| |-------| | | |||
| |--------| |--------| |----------| |-------| | | | |--------| |--------| |----------| |-------| | | |||
| || VAS7 || || VAS4 || || vIGMP || ||BAA || | | | || VAS7 || || VAS4 || || vIGMP || ||BAA || | | |||
| |--------| |--------| |----------| |-------| | | | |--------| |--------| |----------| |-------| | | |||
| +--------+ +--------+ +----------+ +-------+ | | | +--------+ +--------+ +----------+ +-------+ | | |||
| | | | | | |||
++-----------+ +---------++ | ++-----------+ +---------++ | |||
|Network I/O | |Access I/O| | |Network I/O | |Access I/O| | |||
+------------+ +----------+ | +------------+ +----------+ | |||
Figure 3: An example of CloudCO architecture | Figure 3: CloudCO Architecture Example | |||
The Spine-Leaf architecture deployed inside CloudCO meets the network | The Spine-Leaf architecture deployed inside CloudCO meets the network | |||
requirements of adaptable, agile, scalable and dynamic. | requirements of being adaptable, agile, scalable, and dynamic. | |||
5. Operational Considerations | 5. Operational Considerations | |||
RIFT presents the features for organizations building and operating | RIFT presents the features for organizations building and operating | |||
IP fabrics to simplify the operation and deployments while achieving | IP fabrics to simplify the operation and deployments while achieving | |||
many desirable properties of a dynamic routing protocol on such a | many desirable properties of a dynamic routing protocol on such a | |||
substrate: | substrate: | |||
* RIFT only floods routing information to the devices that need it. | * RIFT only floods routing information to the devices that need it. | |||
* RIFT allows for Zero Touch Provisioning within the protocol. In | * RIFT allows for ZTP within the protocol. In its most extreme | |||
its most extreme version, RIFT does not rely on any specific | version, RIFT does not rely on any specific addressing and can | |||
addressing and for IP fabric can operate using IPv6 ND [RFC4861] | operate using IPv6 Neighbor Discovery (ND) [RFC4861] only for IP | |||
only. | fabric. | |||
* RIFT has provisions to detect common IP fabric miscabling | * RIFT has provisions to detect common IP fabric miscabling | |||
scenarios. | scenarios. | |||
* RIFT negotiates automatically BFD per link. This allows for IP | * RIFT automatically negotiates Bidirectional Forwarding Detection | |||
and micro-BFD [RFC7130] to replace Link Aggregation Groups (LAGs) | (BFD) per link. This allows for IP and micro-BFD [RFC7130] to | |||
which do hide bandwidth imbalances in case of constituent | replace Link Aggregation Groups (LAGs) that hide bandwidth | |||
failures. Further automatic link validation techniques similar to | imbalances in case of constituent failures. Further automatic | |||
[RFC5357] could be supported as well. | link validation techniques similar to those in [RFC5357] could be | |||
supported as well. | ||||
* RIFT inherently solves many problems associated with the use of | * RIFT inherently solves many problems associated with the use of | |||
traditional routing topologies with dense meshes and high degrees | classical routing topologies with dense meshes and high degrees of | |||
of ECMP by including automatic bandwidth balancing, flood | ECMP by including automatic bandwidth balancing, flood reduction, | |||
reduction and automatic disaggregation on failures while providing | and automatic disaggregation on failures while providing maximum | |||
maximum aggregation of prefixes in default scenarios. ECMP in | aggregation of prefixes in default scenarios. ECMP in RIFT | |||
RIFT eliminates the need for more Loop-Free Alternates procedures. | eliminates the need for more Loop-Free Alternate (LFA) procedures. | |||
* RIFT reduces FIB size towards the bottom of the IP fabric where | * RIFT reduces FIB size towards the bottom of the IP fabric where | |||
most nodes reside and allows with that for cheaper hardware on the | most nodes reside. This allows for cheaper hardware on the edges | |||
edges and introduction of modern IP fabric architectures that | and introduction of modern IP fabric architectures that encompass | |||
encompass e.g. server multi-homing. | server multihoming and other mechanisms. | |||
* RIFT provides valley-free routing and with that is loop free. A | * RIFT provides valley-free routing that is loop free. A valley- | |||
valley-free path allows reversal of direction at most once from a | free path allows for reversal of direction at most once from a | |||
packet heading northbound to southbound while permitting traversal | packet heading northbound to southbound while permitting traversal | |||
of horizontal links in the northbound phase. This allows the use | of horizontal links in the northbound phase. This allows for the | |||
of any such valley-free path in bi-sectional fabric bandwidth | use of any such valley-free path in bisectional fabric bandwidth | |||
between two destinations irrespective of their metrics which can | between two destinations irrespective of their metrics that can be | |||
be used to balance load on the fabric in different ways. Valley- | used to balance load on the fabric in different ways. Valley-free | |||
free routing eliminates the need for any specific micro-loop | routing eliminates the need for any specific micro-loop avoidance | |||
avoidance procedures for RIFT. | procedures for RIFT. | |||
* RIFT includes a key-value distribution mechanism which allows for | * RIFT includes a key-value distribution mechanism that allows for | |||
future applications such as automatic provisioning of basic | future applications such as automatic provisioning of basic | |||
overlay services or automatic key roll-overs over whole fabrics. | overlay services or automatic key rollovers over whole fabrics. | |||
* RIFT is designed for minimum delay in case of prefix mobility on | * RIFT is designed for minimum delay in case of prefix mobility on | |||
the fabric. In conjunction with [RFC8505], RIFT can differentiate | the fabric. In conjunction with [RFC8505], RIFT can differentiate | |||
anycast advertisements from mobility events and retain only the | anycast advertisements from mobility events and retain only the | |||
most recent advertisement in the latter case. | most recent advertisement in the latter case. | |||
* Many further operational and design points collected over many | * Many further operational and design points collected over many | |||
years of routing protocol deployments have been incorporated in | years of routing protocol deployments have been incorporated in | |||
RIFT such as fast flooding rates, protection of information | RIFT such as fast flooding rates, protection of information | |||
lifetimes and operationally recognizable remote ends of links and | lifetimes, and operationally recognizable remote ends of links and | |||
node names. | node names. | |||
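
The valley-free property listed above can be illustrated with a small check. This is a sketch under stated assumptions: a path is encoded as a list of hop directions using the hypothetical labels "N" (northbound), "S" (southbound), and "EW" (horizontal), and the rule applied is the one given in the text, namely that horizontal links may be traversed only in the northbound phase and that the path may turn from northbound to southbound at most once.

```python
def is_valley_free(hops):
    """Return True if the hop sequence is valley-free.

    `hops` is a list of "N", "S", or "EW" labels (an encoding assumed
    for this sketch). Once the path has turned south, any further
    northbound or horizontal hop would form a valley.
    """
    descending = False
    for hop in hops:
        if hop not in ("N", "S", "EW"):
            raise ValueError(f"unknown hop direction: {hop!r}")
        if hop == "S":
            descending = True
        elif descending:
            # A northbound or horizontal hop after a southbound one.
            return False
    return True
```

For instance, the sequence N, EW, N, S is accepted, while N, S, N, S is rejected because it reverses direction a second time.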
5.1. South Reflection | 5.1. South Reflection | |||
South reflection is a mechanism that South Node TIEs are "reflected" | South reflection is a mechanism where South Node TIEs are "reflected" | |||
back up north to allow nodes in same level without east-west links to | back up north to allow nodes in the same level without East-West | |||
"see" each other. | links to "see" each other. | |||
For example, in Figure 4, Spine111\Spine112\Spine121\Spine122 | For example, in Figure 4, Spine111\Spine112\Spine121\Spine122 | |||
reflects Node S-TIEs from ToF21 to ToF22 separately. Respectively, | reflects Node S-TIEs from ToF21 to ToF22 separately. Respectively, | |||
Spine111\Spine112\Spine121\Spine122 reflects Node S-TIEs from ToF22 | Spine111\Spine112\Spine121\Spine122 reflects Node S-TIEs from ToF22 | |||
to ToF21 separately. So ToF22 and ToF21 see each other's node | to ToF21 separately, so ToF22 and ToF21 see each other's node | |||
information as level 2 nodes. | information as level 2 nodes. | |||
In an equivalent fashion, as the result of the south reflection | In an equivalent fashion, as the result of the south reflection | |||
between Spine121-Leaf121-Spine122 and Spine121-Leaf122-Spine122, | between Spine121-Leaf121-Spine122 and Spine121-Leaf122-Spine122, | |||
Spine121 and Spine122 knows each other at level 1. | Spine121 and Spine122 know each other at level 1. | |||
5.2. Suboptimal Routing on Link Failures | 5.2. Suboptimal Routing on Link Failures | |||
+--------+ +--------+ | +--------+ +--------+ | |||
| ToF21 | | ToF22 | LEVEL 2 | | ToF21 | | ToF22 | LEVEL 2 | |||
++--+-+-++ ++-+--+-++ | ++--+-+-++ ++-+--+-++ | |||
| | | | | | | + | | | | | | | | + | |||
| | | | | | | linkTS8 | | | | | | | | linkTS8 | |||
+------------+ | +-+linkTS3+-+ | | | +-------------+ | +------------+ | +-+linkTS3+-+ | | | +-------------+ | |||
| | | | | | + | | | | | | | | + | | |||
| +---------------------------+ | linkTS7 | | | +---------------------------+ | linkTS7 | | |||
| | | | + + + | | | | | | + + + | | |||
| | | +-------+linkTS4+------------+ | | | | | +-------+linkTS4+------------+ | | |||
skipping to change at page 16, line 31 ¶ | skipping to change at line 697 ¶ | |||
| +-------------+ | + ++XX+linkSL6+---+ + | | +-------------+ | + ++XX+linkSL6+---+ + | |||
| | | | linkSL5 | | linkSL8 | | | | | linkSL5 | | linkSL8 | |||
| +-----------+ | | + +---+linkSL7+-+ | + | | +-----------+ | | + +---+linkSL7+-+ | + | |||
| | | | | | | | | | | | | | | | | | |||
+-+---+-+ +--+--+-+ +-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ +-+---+-+ +--+--+-+ | |||
|Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |||
+-+-----+ +-+-----+ +-----+-+ +-+-----+ | +-+-----+ +-+-----+ +-----+-+ +-+-----+ | |||
+ + + + | + + + + | |||
Prefix111 Prefix112 Prefix121 Prefix122 | Prefix111 Prefix112 Prefix121 Prefix122 | |||
Figure 4: Suboptimal routing upon link failure use case | Figure 4: Suboptimal Routing Upon Link Failure Use Case | |||
As shown in Figure 4, as the result of the south reflection between | As shown in Figure 4, as the result of the south reflection, Spine121 | |||
Spine121-Leaf121-Spine122 and Spine121-Leaf122-Spine122, Spine121 and | and Spine122 know each other via Leaf121 or Leaf122 at level 1. | |||
Spine122 knows each other at level 1. | |||
Without disaggregation mechanism, when linkSL6 fails, the packet from | Without disaggregation mechanisms, the packet from leaf121 to | |||
leaf121 to prefix122 will probably go up through linkSL5 to linkTS3 | prefix122 will probably go up through linkSL5 to linkTS3 when linkSL6 | |||
then go down through linkTS4 to linkSL8 to Leaf122 or go up through | fails. Then, the packet will go down through linkTS4 to linkSL8 to | |||
linkSL5 to linkTS6 then go down through linkTS8 and linkSL8 to | Leaf122 or go up through linkSL5 to linkTS6, then go down through | |||
Leaf122 based on pure default route. It's the case of suboptimal | linkTS8 and linkSL8 to Leaf122 based on the pure default route. This | |||
routing or bow-tieing. | is the case of suboptimal routing or bow tying. | |||
With disaggregation mechanism, when linkSL6 fails, Spine122 will | With disaggregation mechanisms, Spine122 will detect the failure | |||
detect the failure according to the reflected node S-TIE from | according to the reflected node S-TIE from Spine121 when linkSL6 | |||
Spine121. Based on the disaggregation algorithm provided by RIFT, | fails. Based on the disaggregation algorithm provided by RIFT, | |||
Spine122 will explicitly advertise prefix122 in Disaggregated Prefix | Spine122 will explicitly advertise prefix122 in Disaggregated Prefix | |||
S-TIE PrefixTIEElement(prefix122, cost 1). The packet from leaf121 | S-TIE PrefixTIEElement(prefix122, cost 1). The packet from leaf121 | |||
to prefix122 will only be sent to linkSL7 following a longest-prefix | to prefix122 will only be sent to linkSL7 following a longest-prefix | |||
match to prefix122 directly then go down through linkSL8 to Leaf122 | match to prefix122 directly, then it will go down through linkSL8 to | |||
. | Leaf122. | |||
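
The effect of the disaggregated prefix on Leaf121 can be sketched as a longest-prefix-match lookup. The sketch below is illustrative only: the address 10.1.22.0/24 standing in for prefix122, the FIB layout, and the use of link names as next hops are assumptions of this example, not values taken from RIFT or from Figure 4.

```python
import ipaddress


def lookup(fib, destination):
    """Return the next hops of the longest prefix covering `destination`.

    `fib` maps ipaddress network objects to lists of next hops; an empty
    list is returned when nothing matches.
    """
    addr = ipaddress.ip_address(destination)
    matches = [net for net in fib if addr in net]
    if not matches:
        return []
    best = max(matches, key=lambda net: net.prefixlen)
    return fib[best]


DEFAULT = ipaddress.ip_network("0.0.0.0/0")
PREFIX122 = ipaddress.ip_network("10.1.22.0/24")  # hypothetical value

# Leaf121 before disaggregation: only the default route over both uplinks.
fib_before = {DEFAULT: ["linkSL5", "linkSL7"]}

# After Spine122 advertises prefix122 explicitly, a more specific entry
# that points only at linkSL7 is installed next to the default route.
fib_after = dict(fib_before)
fib_after[PREFIX122] = ["linkSL7"]
```

Under these assumptions, a lookup for 10.1.22.9 in `fib_before` returns both uplinks, while the same lookup in `fib_after` returns only linkSL7. Destinations outside prefix122 keep using the default route.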
5.3. Black-Holing on Link Failures | 5.3. Black-Holing on Link Failures | |||
+--------+ +--------+ | +--------+ +--------+ | |||
| ToF 21 | | ToF 22 | LEVEL 2 | | ToF 21 | | ToF 22 | LEVEL 2 | |||
++-+--+-++ ++-+--+-++ | ++-+--+-++ ++-+--+-++ | |||
| | | | | | | + | | | | | | | | + | |||
| | | | | | | linkTS8 | | | | | | | | linkTS8 | |||
+--------------+ | +-+linkTS3+X+ | | | +--------------+ | +--------------+ | +-+linkTS3+X+ | | | +--------------+ | |||
linkTS1 | | | | | + | | linkTS1 | | | | | + | | |||
skipping to change at page 17, line 34 ¶ | skipping to change at line 747 ¶ | |||
+ +---------------+ | + +---+linkSL6+---+ + | + +---------------+ | + +---+linkSL6+---+ + | |||
linkSL1 | | | linkSL5 | | linkSL8 | linkSL1 | | | linkSL5 | | linkSL8 | |||
+ +--+linkSL3+--+ | | + +---+linkSL7+-+ | + | + +--+linkSL3+--+ | | + +---+linkSL7+-+ | + | |||
| | | | | | | | | | | | | | | | | | |||
+-+---+-+ +--+--+-+ +-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ +-+---+-+ +--+--+-+ | |||
|Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |||
+-+-----+ +-+-----+ +-----+-+ +-----+-+ | +-+-----+ +-+-----+ +-----+-+ +-----+-+ | |||
+ + + + | + + + + | |||
Prefix111 Prefix112 Prefix121 Prefix122 | Prefix111 Prefix112 Prefix121 Prefix122 | |||
Figure 5: Black-holing upon link failure use case | Figure 5: Black-Holing Upon Link Failure Use Case | |||
This scenario illustrates a case when double link failure occurs and | This scenario illustrates a case where double link failure occurs and | |||
with that black-holing can happen. | black-holing can happen. | |||
Without disaggregation mechanism, when linkTS3 and linkTS4 both fail, | Without disaggregation mechanisms, the packet from leaf111 to | |||
the packet from leaf111 to prefix122 would suffer 50% black-holing | prefix122 would suffer 50% black-holing based on pure default route | |||
based on pure default route. The packet supposed to go up through | when linkTS3 and linkTS4 both fail. A packet that is supposed to go up | |||
linkSL1 to linkTS1 then go down through linkTS3 or linkTS4 will be | through linkSL1 to linkTS1 and then go down through linkTS3 or | |||
dropped. The packet supposed to go up through linkSL3 to linkTS2 | linkTS4 will be dropped. A packet that is supposed to go up through | |||
then go down through linkTS3 or linkTS4 will be dropped as well. | linkSL3 to linkTS2 and then go down through linkTS3 or linkTS4 will be | |||
It's the case of black-holing. | dropped as well. This is the case of black-holing. | |||
With disaggregation mechanism, when linkTS3 and linkTS4 both fail, | With disaggregation mechanisms, ToF22 will detect the failure | |||
ToF22 will detect the failure according to the reflected node S-TIE | according to the reflected node S-TIE of ToF21 from Spine111\Spine112 | |||
of ToF21 from Spine111\Spine112. Based on the disaggregation | when linkTS3 and linkTS4 both fail. Based on the disaggregation | |||
algorithm provided by RIFT, ToF22 will explicitly originate an S-TIE | algorithm provided by RIFT, ToF22 will explicitly originate an S-TIE | |||
with prefix121 and prefix122, that is flooded to spines 111, 112, | with prefix121 and prefix122 that is flooded to spines 111, 112, | |||
121 and 122. | 121, and 122. | |||
The packet from leaf111 to prefix122 will not be routed to linkTS1 or | The packet from leaf111 to prefix122 will not be routed to linkTS1 or | |||
linkTS2. The packet from leaf111 to prefix122 will only be routed to | linkTS2. The packet from leaf111 to prefix122 will only be routed to | |||
linkTS5 or linkTS7 following a longest-prefix match to prefix122. | linkTS5 or linkTS7 following a longest-prefix match to prefix122. | |||
5.4. Zero Touch Provisioning (ZTP) | 5.4. Zero Touch Provisioning (ZTP) | |||
RIFT is designed to require a very minimal configuration to simplify | RIFT is designed to require a very minimal configuration to simplify | |||
its operation and avoid human errors; based on that minimal | its operation and avoid human errors; based on that minimal | |||
information, Zero Touch Provisioning (ZTP) auto configures the key | information, ZTP auto configures the key operational parameters of | |||
operational parameters of all the RIFT nodes, including the SystemID | all the RIFT nodes, including the System ID of the node that must be | |||
of the node that must be unique in the RIFT network and the level of | unique in the RIFT network and the level of the node in the Fat Tree, | |||
the node in the Fat Tree, which determines which peers are northwards | which determines which peers are northward "parents" and which are | |||
"parents" and which are southwards "children". | southward "children". | |||
ZTP is always on, but its decisions can be overridden when a network | ZTP is always on, but its decisions can be overridden when a network | |||
administrator prefers to impose its own configuration. In that case, | administrator prefers to impose its own configuration. In that case, | |||
it is the responsibility of the administrator to ensure that the | it is the responsibility of the administrator to ensure that the | |||
configured parameters are correct, in other words that the SystemID | configured parameters are correct, i.e., ensure that the System ID of | |||
of each node is unique, and that the administratively set levels | each node is unique and that the administratively set levels truly | |||
truly reflect the relative position of the nodes in the fabric. It | reflect the relative position of the nodes in the fabric. It is | |||
is recommended to let ZTP configure the network, and when not, it is | recommended to let ZTP configure the network, and when ZTP does not | |||
recommended to configure the level of all the nodes to avoid an | configure the network, it is recommended to configure the level of | |||
undesirable interaction between ZTP and the manual configuration. | all the nodes to avoid an undesirable interaction between ZTP and the | |||
manual configuration. | ||||
ZTP requires that the administrator points out the Top-of-Fabric | ZTP requires that the administrator points out the ToF nodes to set | |||
(ToF) nodes to set the baseline from which the fabric topology is | the baseline from which the fabric topology is derived. The ToF | |||
derived. The Top-of-Fabric nodes are configured with TOP_OF_FABRIC | nodes are configured with the TOP_OF_FABRIC flag, which are initial | |||
flag which are initial 'seeds' needed for other ZTP nodes to derive | 'seeds' needed for other ZTP nodes to derive their level in the | |||
their level in the topology. ZTP computes the level of each node | topology. ZTP computes the level of each node based on the Highest | |||
based on the Highest Available Level (HAL) of the potential parent(s) | Available Level (HAL) of the potential parent closest to that | |||
nearest that baseline, which represents the superspine. In a | baseline, which represents the superspine. In a fashion, RIFT can be | |||
fashion, RIFT can be seen as a distance-vector protocol that computes | seen as a distance-vector protocol that computes a set of feasible | |||
a set of feasible successors towards the superspine and auto- | successors towards the superspine and autoconfigures the rest of the | |||
configures the rest of the topology. | topology. | |||
The auto configuration mechanism computes a global maximum of levels | The autoconfiguration mechanism computes a global maximum of levels | |||
by diffusion. The derivation of the level of each node happens then | by diffusion. The derivation of the level of each node happens then | |||
based on Link Information Elements (LIEs) received from its neighbors | based on LIEs received from its neighbors, whereas each node (with | |||
whereas each node (with possibly exceptions of configured leaves) | possible exceptions of configured leaves) tries to attach at the | |||
tries to attach at the highest possible point in the fabric. This | highest possible point in the fabric. This guarantees that even if | |||
guarantees that even if the diffusion front reaches a node from | the diffusion front reaches a node from "below" faster than from | |||
"below" faster than from "above", it will greedily abandon already | "above", it will greedily abandon already negotiated levels derived | |||
negotiated level derived from nodes topologically below it and | from nodes topologically below it and properly peer with nodes above. | |||
properly peer with nodes above. | ||||
The achieved equilibrium can be disturbed massively by all nodes with | The achieved equilibrium can be disturbed massively by all nodes with | |||
highest level either leaving or entering the domain (with some finer | the highest level either leaving or entering the domain (with some | |||
distinctions not explained further). It is therefore recommended | finer distinctions not explained further). It is therefore | |||
that each node is multi-homed towards nodes with respective HAL | recommended that each node is multihomed towards nodes with | |||
offerings. Fortunately, this is the natural state of things for the | respective HAL offerings. Fortunately, this is the natural state of | |||
topology variants considered in RIFT. | things for the topology variants considered in RIFT. | |||
A RIFT node may also be configured to confine it to the leaf role | A RIFT node may also be configured to confine it to the leaf role | |||
with the LEAF_ONLY flag. A leaf node can also be configured to | with the LEAF_ONLY flag. A leaf node can also be configured to | |||
support leaf-2-leaf procedures with the LEAF_2_LEAF flag. In either | support leaf-2-leaf procedures with the LEAF_2_LEAF flag. In both | |||
case the node cannot be TOP_OF_FABRIC and its level cannot be | cases, the node cannot be TOP_OF_FABRIC and its level cannot be | |||
configured. RIFT will fully determine the node's level after it is | configured. RIFT will fully determine the node's level after it is | |||
attached to the topology and ensure that the node is at the "bottom | attached to the topology and ensure that the node is at the "bottom | |||
of the hierarchy" (southernmost). | of the hierarchy" (southernmost). | |||
5.5. Miscabling | 5.5. Miscabling | |||
5.5.1. Miscabling Examples | 5.5.1. Miscabling Examples | |||
+----------------+ +-----------------+ | +----------------+ +-----------------+ | |||
| ToF21 | +------+ ToF22 | LEVEL 2 | | ToF21 | +------+ ToF22 | LEVEL 2 | |||
skipping to change at page 19, line 42 ¶ | skipping to change at line 853 ¶ | |||
+-+---+--+ ++----+--+ | +--+---+-+ +-+----+-+ | +-+---+--+ ++----+--+ | +--+---+-+ +-+----+-+ | |||
| | | | | | | | | | | | | | | | | | | | |||
| +---------+ | link-M | +---------+ | | | +---------+ | link-M | +---------+ | | |||
| | | | | | | | | | | | | | | | | | | | |||
| +-------+ | | | | +-------+ | | | | +-------+ | | | | +-------+ | | | |||
| | | | | | | | | | | | | | | | | | | | |||
+-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ | |||
|Leaf111| |Leaf112+-----+ |Leaf121| |Leaf122| LEVEL 0 | |Leaf111| |Leaf112+-----+ |Leaf121| |Leaf122| LEVEL 0 | |||
+-------+ +-------+ +-------+ +-------+ | +-------+ +-------+ +-------+ +-------+ | |||
Figure 6: A single plane miscabling example | Figure 6: A Single-Plane Miscabling Example | |||
Figure 6 shows a single plane miscabling example. It's a perfect Fat | Figure 6 shows a single-plane miscabling example. It's a perfect Fat | |||
Tree fabric except link-M connecting Leaf112 to ToF22. | Tree fabric except for link-M connecting Leaf112 to ToF22. | |||
The RIFT control protocol can discover the physical links | The RIFT control protocol can discover the physical links | |||
automatically and be able to detect cabling that violates Fat Tree | automatically and is able to detect cabling that violates Fat Tree | |||
topology constraints. It reacts accordingly to such miscabling | topology constraints. It reacts accordingly to such miscabling | |||
attempts, at a minimum preventing adjacencies between nodes from | attempts, preventing adjacencies between nodes from being formed and | |||
being formed and traffic from being forwarded on those miscabled | traffic from being forwarded on those miscabled links at a minimum. | |||
links. Leaf112 will in such scenario use link-M to derive its level | In such scenario, Leaf112 will use link-M to derive its level (unless | |||
(unless it is a leaf) and can report links to Spine111 and Spine112 as | it is a leaf) and can report links to Spine111 and Spine112 as | |||
miscabled unless the implementations allows horizontal links. | miscabled unless the implementations allow horizontal links. | |||
Figure 7 shows a multiple plane miscabling example. Since Leaf112 | Figure 7 shows a multi-plane miscabling example. Since Leaf112 and | |||
and Spine121 belong to two different PoDs, the adjacency between | Spine121 belong to two different PoDs, the adjacency between Leaf112 | |||
Leaf112 and Spine121 can not be formed. Link-W would be detected and | and Spine121 cannot be formed. Link-W would be detected and | |||
prevented. | prevented. | |||
+-------+ +-------+ +-------+ +-------+ | +-------+ +-------+ +-------+ +-------+ | |||
|ToF A1| |ToF A2| |ToF B1| |ToF B2| LEVEL 2 | |ToF A1| |ToF A2| |ToF B1| |ToF B2| LEVEL 2 | |||
+-------+ +-------+ +-------+ +-------+ | +-------+ +-------+ +-------+ +-------+ | |||
| | | | | | | | | | | | | | | | | | |||
| | | +-----------------+ | | | | | | | +-----------------+ | | | | |||
| +--------------------------+ | | | | | | +--------------------------+ | | | | | |||
| +------+ | | | +------+ | | | +------+ | | | +------+ | | |||
| | +-----------------+ | | | | | | | | +-----------------+ | | | | | | |||
skipping to change at page 20, line 36 ¶ | skipping to change at line 895 ¶ | |||
| | | | | | | | | | | | | | | | | | | | |||
| +---------+ | | | +---------+ | | | +---------+ | | | +---------+ | | |||
| | | | link-W | | | | | | | | | link-W | | | | | |||
| +-------+ | | | | +-------+ | | | | +-------+ | | | | +-------+ | | | |||
| | | | | | | | | | | | | | | | | | | | |||
+-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ | |||
|Leaf111| |Leaf112+------+ |Leaf121| |Leaf122| LEVEL 0 | |Leaf111| |Leaf112+------+ |Leaf121| |Leaf122| LEVEL 0 | |||
+-------+ +-------+ +-------+ +-------+ | +-------+ +-------+ +-------+ +-------+ | |||
+--------PoD#1----------+ +---------PoD#2---------+ | +--------PoD#1----------+ +---------PoD#2---------+ | |||
Figure 7: A multiple plane miscabling example | Figure 7: A Multiple Plane Miscabling Example | |||
RIFT provides an optional level determination procedure in its Zero | RIFT provides an optional level determination procedure in its ZTP | |||
Touch Provisioning mode. Nodes in the fabric without their level | mode. Nodes in the fabric without their level configured determine | |||
configured determine it automatically. This can have possibly | it automatically. However, this can have possible counter-intuitive | |||
counter-intuitive consequences however. One extreme failure scenario | consequences. One extreme failure scenario is depicted in Figure 8, | |||
is depicted in Figure 8 and it shows that if all northbound links of | and it shows that if all northbound links of Spine11 fail at the same | |||
spine11 fail at the same time, spine11 negotiates a lower level than | time, Spine11 negotiates a lower level than Leaf111 and Leaf112. | |||
Leaf111 and Leaf112. | |||
To prevent such scenario where leafs are expected to act as switches, | To prevent such scenario where leaves are expected to act as | |||
LEAF_ONLY flag can be set for Leaf111 and Leaf112. Since level -1 is | switches, the LEAF_ONLY flag can be set for Leaf111 and Leaf112. | |||
invalid, Spine11 would not derive a valid level from the topology in | Since level -1 is invalid, Spine11 would not derive a valid level | |||
Figure 8. It will be isolated from the whole fabric and it would be | from the topology in Figure 8. It will be isolated from the whole | |||
up to the leafs to declare the links towards such spine as miscabled. | fabric, and it would be up to the leaves to declare the links towards | |||
such spine as miscabled. | ||||
+-------+ +-------+ +-------+ +-------+ | +-------+ +-------+ +-------+ +-------+ | |||
|ToF A1| |ToF A2| |ToF A1| |ToF A2| | |ToF A1| |ToF A2| |ToF A1| |ToF A2| | |||
+-------+ +-------+ +-------+ +-------+ | +-------+ +-------+ +-------+ +-------+ | |||
| | | | | | | | | | | | | | |||
| +-------+ | | | | | +-------+ | | | | |||
+ + | | ====> | | | + + | | ====> | | | |||
X X +------+ | +------+ | | X X +------+ | +------+ | | |||
+ + | | | | | + + | | | | | |||
+----+--+ +-+-----+ +-+-----+ | +----+--+ +-+-----+ +-+-----+ | |||
skipping to change at page 21, line 30 ¶ | skipping to change at line 936 ¶ | |||
+-+---+-+ +--+--+-+ +-----+-+ +-----+-+ | +-+---+-+ +--+--+-+ +-----+-+ +-----+-+ | |||
|Leaf111| |Leaf112| |Leaf111| |Leaf112| | |Leaf111| |Leaf112| |Leaf111| |Leaf112| | |||
+-------+ +-------+ +-+-----+ +-+-----+ | +-------+ +-------+ +-+-----+ +-+-----+ | |||
| | | | | | |||
| +--------+ | | +--------+ | |||
| | | | | | |||
+-+---+-+ | +-+---+-+ | |||
|Spine11| | |Spine11| | |||
+-------+ | +-------+ | |||
Figure 8: Fallen spine | Figure 8: Fallen Spine | |||
5.5.2. Miscabling considerations | 5.5.2. Miscabling Considerations | |||
There are scenarios where operators may want to leverage ZTP and | There are scenarios where operators may want to leverage ZTP and | |||
implement additional cabling constraints that go beyond the | implement additional cabling constraints that go beyond the | |||
previously described topology violations. Enforcing cabling down to | previously described topology violations. Enforcing cabling down to | |||
specific level, node, and port combinations might make it simpler for | specific level, node, and port combinations might make it simpler for | |||
onsite staff to perform troubleshooting activities or replace optical | onsite staff to perform troubleshooting activities or replace optical | |||
transceivers and/or cabling as the physical layout will be consistent | transceivers and/or cabling as the physical layout will be consistent | |||
across the fabric. This is especially true for densely connected | across the fabric. This is especially true for densely connected | |||
fabrics where it is difficult to physically manipulate those | fabrics where it is difficult to physically manipulate those | |||
components. It is also easy to imagine other models, such as one | components. It is also easy to imagine other models, such as one | |||
where the strict port requirement is relaxed. | where the strict port requirement is relaxed. | |||
Figure 9 illustrates an example where the first port on Leaf1 must | Figure 9 illustrates an example where the first port on Leaf1 must | |||
connect to the first port on Spine1, the second port on Leaf1 must | connect to the first port on Spine1, the second port on Leaf1 must | |||
connect to the first port on Spine2, and so on. Consider a case | connect to the first port on Spine2, and so on. Consider a case | |||
where (Leaf1, Port1) and (Leaf1, Port2) were reversed. RIFT would | where (Leaf1, Port1) and (Leaf1, Port2) were reversed. RIFT would | |||
not consider this to be miscabled by default, however, an operator | not consider this to be miscabled by default; however, an operator | |||
might want to. | might want to. | |||
+--------+ +--------+ +--------+ +--------+ | +--------+ +--------+ +--------+ +--------+ | |||
| Spine1 | | Spine2 | | Spine3 | | Spine4 | | | Spine1 | | Spine2 | | Spine3 | | Spine4 | | |||
+-1------+ +-1------+ +-1------+ +-1------+ | +-1------+ +-1------+ +-1------+ +-1------+ | |||
+ + + + | + + + + | |||
| +----------+ | | | | +----------+ | | | |||
| | | | | | | | | | |||
| | +---------------------+ | | | | +---------------------+ | | |||
| | | | | | | | | | |||
| | | +--------------------------------+ | | | | +--------------------------------+ | |||
| | | | | | | | | | |||
| | | | | | | | | | |||
| | | | | | | | | | |||
| | | | | | | | | | |||
+ + + + | + + + + | |||
+-1--2--3--4--+ | +-1--2--3--4--+ | |||
| Leaf1 | ...... | | Leaf1 | ...... | |||
+-------------+ | +-------------+ | |||
Figure 9: Fallen spine | Figure 9: Additional Cabling Constraint Example | |||
RIFT allows implementations to provide programmable plugins that can | RIFT allows implementations to provide programmable plug-ins that can | |||
adjust ZTP operation or capture information during computation. | adjust ZTP operation or capture information during computation. | |||
While defining this is outside the scope of this document, such a | While defining this is outside the scope of this document, such a | |||
mechanism could be used to extend miscabling functionality. | mechanism could be used to extend the miscabling functionality. | |||
For other protocols to achieve this, it would require additional | For other protocols to achieve this, it would require additional | |||
operational overhead. Consider a fabric that is using unnumbered | operational overhead. Consider a fabric that is using unnumbered | |||
OSPF links, it is still very likely that a miscabled link will form | OSPF links; it is still very likely that a miscabled link will form | |||
an adjacency. Each attempts to move cables to the correct port may | an adjacency. Each attempt to move cables to the correct port may | |||
result in the need for additional troubleshooting as other links will | result in the need for additional troubleshooting as other links will | |||
become miscabled in the process. Without automation to explicitly | become miscabled in the process. Without automation to explicitly | |||
tell the operator which ports need to be moved where, the process | tell the operator which ports need to be moved where, the process | |||
becomes manually intensive and error-prone very quickly. Or if the | becomes manually intensive and error-prone very quickly. If the | |||
problem goes unnoticed, result in suboptimal performance in the | problem goes unnoticed, it will result in suboptimal performance in | |||
fabric. | the fabric. | |||
5.6. Multicast and Broadcast Implementations | 5.6. Multicast and Broadcast Implementations | |||
RIFT supports both multicast and broadcast implementations. While a | RIFT supports both multicast and broadcast implementations. While a | |||
multicast implementation is preferred, there might be cases where a | multicast implementation is preferred, there might be cases where a | |||
broadcast implementation is optimal or even required. For example, | broadcast implementation is optimal or even required. For example, | |||
operating systems on IoT devices and embedded devices may not have | operating systems on IoT devices and embedded devices may not have | |||
the required multicast support. Another example is containers, which | the required multicast support. Another example is containers, which | |||
in some cases do support multicast, but tend to be very CPU- | do support multicast in some cases but tend to be very CPU- | |||
inefficient and difficult to tune. | inefficient and difficult to tune. | |||
5.7. Positive vs. Negative Disaggregation | 5.7. Positive vs. Negative Disaggregation | |||
Disaggregation is the procedure whereby RIFT [RIFT] advertises a more | Disaggregation is the procedure whereby RIFT [RFC9692] advertises a | |||
specific route southwards as an exception to the aggregated fabric- | more specific route southwards as an exception to the aggregated | |||
default north. Disaggregation is useful when a prefix within the | fabric-default north. Disaggregation is useful when a prefix within | |||
aggregation is reachable via some of the parents but not the others | the aggregation is reachable via some of the parents but not the | |||
at the same level of the fabric. It is mandatory when the level is | others at the same level of the fabric. It is mandatory when the | |||
the ToF since a ToF node that cannot reach a prefix becomes a black | level is the ToF since a ToF node that cannot reach a prefix becomes | |||
hole for that prefix. The hard problem is to know which prefixes are | a black hole for that prefix. The hard problem is to know which | |||
reachable by whom. | prefixes are reachable by whom. | |||
In the general case, RIFT [RIFT] solves that problem by | In the general case, RIFT [RFC9692] solves that problem by | |||
interconnecting the ToF nodes. So the ToF nodes can exchange the | interconnecting the ToF nodes so that the ToF nodes can exchange the | |||
full list of prefixes that exist in the fabric and figure out when a | full list of prefixes that exist in the fabric and figure out when a | |||
ToF node lacks reachability to some prefixes. This requires | ToF node lacks reachability to some prefixes. This requires | |||
additional ports at the ToF, typically 2 ports per ToF node to form a | additional ports at the ToF, typically two ports per ToF node to form | |||
ToF-spanning ring. RIFT [RIFT] also defines the southbound | a ToF-spanning ring. RIFT [RFC9692] also defines the southbound | |||
reflection procedure that enables a parent to explore the direct | reflection procedure that enables a parent to explore the direct | |||
connectivity of its peers, meaning their own parents and children; | connectivity of its peers, meaning their own parents and children; | |||
based on the advertisements received from the shared parents and | based on the advertisements received from the shared parents and | |||
children, it may enable the parent to infer the prefixes its peers | children, it may enable the parent to infer the prefixes its peers | |||
can reach. | can reach. | |||
When a parent lacks reachability to a prefix, it may disaggregate the | When a parent lacks reachability to a prefix, it may disaggregate the | |||
prefix negatively, i.e., advertise that this parent can be used to | prefix negatively, i.e., advertise that this parent can be used to | |||
reach any prefix in the aggregation except that one. The Negative | reach any prefix in the aggregation except that one. The Negative | |||
Disaggregation signaling is simple and functions transitively from | Disaggregation signaling is simple and functions transitively from | |||
ToF to top-of-pod (ToP) and then from ToP to Leaf. But it is hard | ToF to Top-of-Pod (ToP) and then from ToP to Leaf. However, it is | |||
for a parent to figure which prefix it needs to disaggregate, because | hard for a parent to figure out which prefix it needs to disaggregate | |||
it does not know what it does not know; it results that the use of a | because it does not know what it does not know; it results that the | |||
spanning ring at the ToF is required to operate the Negative | use of a spanning ring at the ToF is required to operate the Negative | |||
Disaggregation. Also, though it is only an implementation problem, | Disaggregation. Also, though it is only an implementation problem, | |||
the programming of the FIB is complex compared to normal routes, and | the programming of the FIB is complex compared to normal routes and | |||
may incur recursions. | may incur recursions. | |||
The more classical alternative is, for the parents that can reach a | The more classical alternative is, for the parents that can reach a | |||
prefix that peers at the same level cannot, to advertise a more | prefix that peers at the same level cannot, to advertise a more | |||
specific route to that prefix. This leverages the normal longest | specific route to that prefix. This leverages the normal longest | |||
prefix match in the FIB, and does not require a special | prefix match in the FIB and does not require a special | |||
implementation. But as opposed to the Negative Disaggregation, the | implementation. As opposed to the Negative Disaggregation, the | |||
Positive Disaggregation is difficult and inefficient to operate | Positive Disaggregation is difficult and inefficient to operate | |||
transitively. | transitively. | |||
Transitivity is not needed to a grandchild if all its parents | Transitivity is not needed by a grandchild if all its parents | |||
received the Positive Disaggregation, meaning that they shall all | received the Positive Disaggregation, meaning that they shall all | |||
avoid the black hole; when that is the case, they collectively build | avoid the black hole; when that is the case, they collectively build | |||
a ceiling that protects the grandchild. But until then, a parent | a ceiling that protects the grandchild. Until then, a parent that | |||
that received a Positive Disaggregation may believe that some peers | received the Positive Disaggregation may believe that some peers are | |||
are lacking the reachability and readvertise too early, or defer and | lacking the reachability and re-advertise too early or defer and | |||
maintain a black hole situation longer than necessary. | maintain a black hole situation longer than necessary. | |||
In a non-partitioned fabric, all the ToF nodes see one another | In a non-partitioned fabric, all the ToF nodes see one another | |||
through the reflection and can figure if one is missing a child. In | through the reflection and can figure out if one is missing a child. | |||
that case it is possible to compute the prefixes that the peer cannot | In that case, it is possible to compute the prefixes that the peer | |||
reach and disaggregate positively without a ToF-spanning ring. The | cannot reach and disaggregate positively without a ToF-spanning ring. | |||
ToF nodes can also ascertain that the ToP nodes are connected each to | The ToF nodes can also ascertain that the ToP nodes are each | |||
at least a ToF node that can still reach the prefix, meaning that the | connected to at least a ToF node that can still reach the prefix, | |||
transitive operation is not required. | meaning that the transitive operation is not required. | |||
The bottom line is that in a fabric that is partitioned (e.g., using | The bottom line is that in a fabric that is partitioned (e.g., using | |||
multiple planes) and/or where the ToP nodes are not guaranteed to | multiple planes) and/or where the ToP nodes are not guaranteed to | |||
always form a ceiling for their children, it is mandatory to use the | always form a ceiling for their children, it is mandatory to use | |||
Negative Disaggregation. On the other hand, in a highly symmetrical | Negative Disaggregation. On the other hand, in a highly symmetrical | |||
and fully connected fabric, (e.g., a canonical Clos Network), the | and fully connected fabric (e.g., a canonical Clos Network), the | |||
Positive Disaggregation methods allows to save the complexity and | Positive Disaggregation methods save the complexity and cost | |||
cost associated with the ToF-spanning ring. | associated with the ToF-spanning ring. | |||
Note that in the case of Positive Disaggregation, the first ToF | Note that in the case of Positive Disaggregation, the first ToF nodes | |||
node(s) that announces a more-specific route attracts all the traffic | that announce a more-specific route attract all the traffic for that | |||
for that route and may suffer from a transient incast. A ToP node | route and may suffer from a transient incast. A ToP node that defers | |||
that defers injecting the longer prefix in the FIB, in order to | injecting the longer prefix in the FIB, in order to receive more | |||
receive more advertisements and spread the packets better, also keeps | advertisements and spread the packets better, also keeps on sending a | |||
on sending a portion of the traffic to the black hole in the | portion of the traffic to the black hole in the meantime. In the | |||
meantime. In the case of Negative Disaggregation, the last ToF | case of Negative Disaggregation, the last ToF nodes that inject the | |||
node(s) that injects the route may also incur an incast issue; this | route may also incur an incast issue; this problem would occur if a | |||
problem would occur if a prefix that becomes totally unreachable is | prefix that becomes totally unreachable is disaggregated. | |||
disaggregated. | ||||
5.8. Mobile Edge and Anycast | 5.8. Mobile Edge and Anycast | |||
When a physical or a virtual node changes its point of attachment in | When a physical or a virtual node changes its point of attachment in | |||
the fabric from a previous-leaf to a next-leaf, new routes must be | the fabric from a previous-leaf to a next-leaf, new routes must be | |||
installed that supersede the old ones. Since the flooding flows | installed that supersede the old ones. Since the flooding flows | |||
northwards, the nodes (if any) between the previous-leaf and the | northwards, the nodes (if any) between the previous-leaf and the | |||
common parent are not immediately aware that the path via previous- | common parent are not immediately aware that the path via the | |||
leaf is obsolete, and a stale route may exist for a while. The | previous-leaf is obsolete and a stale route may exist for a while. | |||
common parent needs to select the freshest route advertisement in | The common parent needs to select the freshest route advertisement in | |||
order to install the correct route via the next-leaf. This requires | order to install the correct route via the next-leaf. This requires | |||
that the fabric determines the sequence of the movements of the | that the fabric determines the sequence of the movements of the | |||
mobile node. | mobile node. | |||
On the one hand, a classical sequence counter provides a total order | On the one hand, a classical sequence counter provides a total order | |||
for a while but it will eventually wrap. On the other hand, a | for a while, but it will eventually wrap. On the other hand, a | |||
timestamp provides a permanent order but it may miss a movement that | timestamp provides a permanent order, but it may miss a movement that | |||
happens too quickly vs. the granularity of the timing information. | happens too quickly vs. the granularity of the timing information. | |||
It is not envisioned that an average fabric supports Precision Time | It is not envisioned that an average fabric supports the Precision | |||
Protocol [IEEEstd1588] in the short term, nor that the precision | Time Protocol [IEEEstd1588] in the short term nor that the precision | |||
available with the Network Time Protocol [RFC5905] (in the order of | available with the Network Time Protocol [RFC5905] (in the order of | |||
100 to 200ms) will necessarily be enough to cover, e.g., the fast | 100 to 200 ms) will necessarily be enough to cover, e.g., the fast | |||
mobility of a Virtual Machine. | mobility of a Virtual Machine (VM). | |||
Section 6.8.4 "Mobility" of RIFT [RIFT] specifies a hybrid method | Section 6.8.4 ("Mobility") of [RFC9692] specifies a hybrid method | |||
that combines a sequence counter from the mobile node and a timestamp | that combines a sequence counter from the mobile node and a timestamp | |||
from the network taken at the leaf when the route is injected. If | from the network taken at the leaf when the route is injected. If | |||
the timestamps of the concurrent advertisements are comparable (i.e., | the timestamps of the concurrent advertisements are comparable (i.e., | |||
more distant than the precision of the timing protocol), then the | more distant than the precision of the timing protocol), then the | |||
timestamp alone is used to determine the relative freshness of the | timestamp alone is used to determine the relative freshness of the | |||
routes. Otherwise, the sequence counter from the mobile node, if | routes. Otherwise, the sequence counter from the mobile node is used | |||
available, is used. One caveat is that the sequence counter must not | if it is available. One caveat is that the sequence counter must not | |||
wrap within the precision of the timing protocol. Another is that | wrap within the precision of the timing protocol. Another is that | |||
the mobile node may not even provide a sequence counter, in which | the mobile node may not even provide a sequence counter; in which | |||
case the mobility itself must be slower than the precision of the | case, the mobility itself must be slower than the precision of the | |||
timing. | timing. | |||
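
The hybrid comparison described above can be sketched as follows. This is a minimal illustration of the rule as summarized in this section, not the normative procedure of [RFC9692]. The names `Advertisement`, `fresher`, and `precision_ms` are hypothetical, and the sketch assumes that the sequence counter does not wrap within the timing precision, as the caveat above requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Advertisement:
    """Hypothetical view of one route advertisement for a mobile address."""
    leaf: str
    timestamp_ms: int                # taken at the leaf when the route is injected
    sequence: Optional[int] = None   # from the mobile node; may be absent


def fresher(a: Advertisement, b: Advertisement,
            precision_ms: int) -> Optional[Advertisement]:
    """Return the fresher advertisement, or None if it cannot be decided."""
    # Timestamps are comparable only when they are further apart than
    # the precision of the timing protocol.
    if abs(a.timestamp_ms - b.timestamp_ms) > precision_ms:
        return a if a.timestamp_ms > b.timestamp_ms else b
    # Otherwise fall back to the sequence counter, if both sides have one.
    # Assumption: no wrap occurs within the timing precision.
    if a.sequence is not None and b.sequence is not None \
            and a.sequence != b.sequence:
        return a if a.sequence > b.sequence else b
    # No usable counter: the movement was faster than the timing precision.
    return None
```

With a precision of 200 ms, two advertisements 500 ms apart are ordered by timestamp alone, while two advertisements 100 ms apart are ordered by the sequence counter if the mobile node supplies one.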
Mobility must not be confused with anycast. In both cases, a same | Mobility must not be confused with anycast. In both cases, the same | |||
address is injected in RIFT at different leaves. In the case of | address is injected in RIFT at different leaves. In the case of | |||
mobility, only the freshest route must be conserved, since mobile | mobility, only the freshest route must be conserved since the mobile | |||
node changed its point of attachment for a leaf to the next. In the | node changes its point of attachment for a leaf to the next. In the | |||
case of anycast, the node may be either multihomed (attached to | case of anycast, the node may either be multihomed (attached to | |||
multiple leaves in parallel) or reachable beyond the fabric via | multiple leaves in parallel) or reachable beyond the fabric via | |||
multiple routes that are redistributed to different leaves; either | multiple routes that are redistributed to different leaves. Either | |||
way, in the case of anycast, the multiple routes are equally valid | way, the multiple routes are equally valid and should be conserved in | |||
and should be conserved. Without further information from the | the case of anycast. Without further information from the | |||
redistributed routing protocol, it is impossible to sort out a | redistributed routing protocol, it is impossible to sort out a | |||
movement from a redistribution that happens asynchronously on | movement from a redistribution that happens asynchronously on | |||
different leaves. RIFT [RIFT] expects that anycast addresses are | different leaves. RIFT [RFC9692] expects that anycast addresses are | |||
advertised within the timing precision, which is typically the case | advertised within the timing precision, which is typically the case | |||
with a low-precision timing and a multihomed node. Beyond that time | with a low-precision timing and a multihomed node. Beyond that time | |||
interval, RIFT interprets the lag as a mobility and only the freshest | interval, RIFT interprets the lag as a mobility and only the freshest | |||
route is retained. | route is retained. | |||
When using IPv6 [RFC8200], RIFT suggests to leverage [RFC8505] as the | When using IPv6 [RFC8200], RIFT suggests leveraging 6LoWPAN ND | |||
IPv6 ND interaction between the mobile node and the leaf. This | [RFC8505] as the IPv6 ND interaction between the mobile node and the | |||
provides not only a sequence counter but also a lifetime and a | leaf. This not only provides a sequence counter but also a lifetime | |||
security token that may be used to protect the ownership of an | and a security token that may be used to protect the ownership of an | |||
address [RFC8928]. When using [RFC8505], the parallel registration | address [RFC8928]. When using 6LoWPAN ND [RFC8505], the parallel | |||
of an anycast address to multiple leaves is done with the same | registration of an anycast address to multiple leaves is done with | |||
sequence counter, whereas the sequence counter is incremented when | the same sequence counter, whereas the sequence counter is | |||
the point of attachment changes. This way, it is possible to | incremented when the point of attachment changes. This way, it is | |||
differentiate a mobile node from a multihomed node, even when the | possible to differentiate a mobile node from a multihomed node, even | |||
mobility happens within the timing precision. It is also possible | when the mobility happens within the timing precision. It is also | |||
for a mobile node to be multihomed as well, e.g., to change only one | possible for a mobile node to be multihomed as well, e.g., to change | |||
of its points of attachment. | only one of its points of attachment. | |||
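
As a rough sketch of the distinction drawn above, the registrations for one address can be reduced to the set of leaves whose route should be retained. The function name `leaves_to_retain` and the dictionary layout are hypothetical. The sketch also assumes that counters do not wrap and that a node that is both mobile and multihomed re-registers at every current point of attachment with the incremented counter.

```python
from typing import Dict, Set


def leaves_to_retain(registrations: Dict[str, int]) -> Set[str]:
    """Map {leaf name: sequence counter} to the leaves whose route is kept.

    Equal counters on several leaves indicate a parallel (anycast or
    multihomed) registration, so all of those leaves are kept.  A higher
    counter indicates a change of attachment, so older leaves are dropped.
    """
    if not registrations:
        return set()
    newest = max(registrations.values())  # assumption: no counter wrap
    return {leaf for leaf, seq in registrations.items() if seq == newest}
```

For example, `{"leaf1": 7, "leaf2": 7}` keeps both leaves, whereas `{"leaf1": 7, "leaf2": 8}` keeps only `leaf2`.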
5.9. IPv4 over IPv6 | 5.9. IPv4 over IPv6 | |||
RIFT allows advertising IPv4 prefixes over IPv6 RIFT network. IPv6 | RIFT allows advertising IPv4 prefixes over an IPv6 RIFT network. An | |||
Address Family (AF) configures via the usual Neighbor Discovery (ND) | IPv6 Address Family (AF) configures via the usual ND mechanisms and | |||
mechanisms and then V4 can use V6 next-hops analogous to [RFC8950]. | then V4 can use V6 next-hops analogous to [RFC8950]. It is expected | |||
It is expected that the whole fabric supports the same type of | that the whole fabric supports the same type of forwarding of AFs on | |||
forwarding of address families on all the links. RIFT provides an | all the links. RIFT provides an indication whether a node is capable | |||
indication whether a node is v4 forwarding capable and | of V4-forwarding and implementations are possible where different | |||
implementations are possible where different routing tables are | routing tables are computed per AF as long as the computation remains | |||
computed per address family as long as the computation remains loop- | loop-free. | |||
free. | ||||
+-----+ +-----+ | +-----+ +-----+ | |||
+---+---+ | ToF | | ToF | | +---+---+ | ToF | | ToF | | |||
^ +--+--+ +-----+ | ^ +--+--+ +-----+ | |||
| | | | | | | | | | | | |||
| | +-------------+ | | | | +-------------+ | | |||
| | +--------+ | | | | | +--------+ | | | |||
+ | | | | | + | | | | | |||
V6 +-----+ +-+---+ | V6 +-----+ +-+---+ | |||
Forwarding |Spine| |Spine| | Forwarding |Spine| |Spine| | |||
+ +--+--+ +-----+ | + +--+--+ +-----+ | |||
| | | | | | | | | | | | |||
| | +-------------+ | | | | +-------------+ | | |||
| | +--------+ | | | | | +--------+ | | | |||
| | | | | | | | | | | | |||
v +-----+ +-+---+ | v +-----+ +-+---+ | |||
+---+---+ |Leaf | | Leaf| | +---+---+ |Leaf | | Leaf| | |||
+--+--+ +--+--+ | +--+--+ +--+--+ | |||
| | | | | | |||
IPv4 prefixes| |IPv4 prefixes | IPv4 prefixes| |IPv4 prefixes | |||
| | | | | | |||
+---+----+ +---+----+ | +---+----+ +---+----+ | |||
| V4 | | V4 | | | V4 | | V4 | | |||
| subnet | | subnet | | | subnet | | subnet | | |||
+--------+ +--------+ | +--------+ +--------+ | |||
Figure 10: IPv4 over IPv6 | Figure 10: IPv4 over IPv6 | |||
5.10. In-Band Reachability of Nodes | 5.10. In-Band Reachability of Nodes | |||
RIFT doesn't precondition that nodes of the fabric have reachable | RIFT doesn't precondition that nodes of the fabric have reachable | |||
addresses. But the operational reasons to reach the internal nodes | addresses, but the operational reasons to reach the internal nodes | |||
may exist. Figure 11 shows an example that the network management | may exist. Figure 11 shows an example that the network management | |||
station (NMS) attaches to leaf1. | station (NMS) attaches to Leaf1. | |||
+-------+ +-------+ | +-------+ +-------+ | |||
| ToF1 | | ToF2 | | | ToF1 | | ToF2 | | |||
++---- ++ ++-----++ | ++---- ++ ++-----++ | |||
| | | | | | | | | | |||
| +----------+ | | | +----------+ | | |||
| +--------+ | | | | +--------+ | | | |||
| | | | | | | | | | |||
++-----++ +--+---++ | ++-----++ +--+---++ | |||
|Spine1 | |Spine2 | | |Spine1 | |Spine2 | | |||
skipping to change at page 27, line 32 ¶ | skipping to change at line 1212 ¶ | |||
| | | | | | | | | | |||
| +----------+ | | | +----------+ | | |||
| +--------+ | | | | +--------+ | | | |||
| | | | | | | | | | |||
++-----++ +--+---++ | ++-----++ +--+---++ | |||
| Leaf1 | | Leaf2 | | | Leaf1 | | Leaf2 | | |||
+---+---+ +-------+ | +---+---+ +-------+ | |||
| | | | |||
|NMS | |NMS | |||
Figure 11: In-Band reachability of node | Figure 11: In-Band Reachability of Nodes | |||
If NMS wants to access Leaf2, it simply works. Because loopback | If the NMS wants to access Leaf2, it simply works because the | |||
address of Leaf2 is flooded in its Prefix North TIE. | loopback address of Leaf2 is flooded in its Prefix North TIE. | |||
If NMS wants to access Spine2, it simply works too. Because spine | If the NMS wants to access Spine2, it also works because a spine node | |||
node always advertises its loopback address in the Prefix North TIE. | always advertises its loopback address in the Prefix North TIE. The | |||
NMS may reach Spine2 from Leaf1-Spine2 or Leaf1-Spine1-ToF1/ | NMS may reach Spine2 from Leaf1-Spine2 or Leaf1-Spine1-ToF1/ | |||
ToF2-Spine2. | ToF2-Spine2. | |||
If NMS wants to access ToF2, ToF2's loopback address needs to be | If the NMS wants to access ToF2, ToF2's loopback address needs to be | |||
injected into its Prefix South TIE. This TIE must be seen by all | injected into its Prefix South TIE. This TIE must be seen by all | |||
nodes at the level below - the spine nodes in Figure 11 – that must | nodes at the level below -- the spine nodes in Figure 11 -- that must | |||
form a ceiling for all the traffic coming from below (south). | form a ceiling for all the traffic coming from below (south). | |||
Otherwise, the traffic from NMS may follow the default route to the | Otherwise, the traffic from the NMS may follow the default route to | |||
wrong ToF Node, e.g., ToF1. | the wrong ToF Node, e.g., ToF1. | |||
In case of failure between ToF2 and spine nodes, ToF2's loopback | In the case of failure between ToF2 and spine nodes, ToF2's loopback | |||
address must be disaggregated recursively all the way to the leaves. | address must be disaggregated recursively all the way to the leaves. | |||
In a partitioned ToF, even with recursive disaggregation a ToF node | In a partitioned ToF, even with recursive disaggregation, a ToF node | |||
is only reachable within its plane. | is only reachable within its plane. | |||
A possible alternative to recursive disaggregation is to use a ring | A possible alternative to recursive disaggregation is to use a ring | |||
that interconnects the ToF nodes to transmit packets between them for | that interconnects the ToF nodes to transmit packets between them for | |||
their loopback addresses only. The idea is that this is mostly | their loopback addresses only. The idea is that this is mostly | |||
control traffic and should not alter the load balancing properties of | control traffic and should not alter the load-balancing properties of | |||
the fabric. | the fabric. | |||
5.11. Dual Homing Servers | 5.11. Dual-Homing Servers | |||
Each RIFT node may operate in Zero Touch Provisioning (ZTP) mode. It | Each RIFT node may operate in ZTP mode. It has no configuration | |||
has no configuration (unless it is a Top-of-Fabric at the top of the | (unless it is a ToF node at the top of the topology or if it must | |||
topology or the must operate in the topology as leaf and/or support | operate in the topology as a leaf and/or support leaf-2-leaf | |||
leaf-2-leaf procedures) and it will fully configure itself after | procedures), and it will fully configure itself after being attached | |||
being attached to the topology. | to the topology. | |||
+---+ +---+ +---+ | +---+ +---+ +---+ | |||
|ToF| |ToF| |ToF| ToF | |ToF| |ToF| |ToF| ToF | |||
+---+ +---+ +---+ | +---+ +---+ +---+ | |||
| | | | | | | | | | | | | | |||
| +----------------+ | | | | +----------------+ | | | |||
| +----------------+ | | | +----------------+ | | |||
| | | | | | | | | | | | | | |||
+----------+--+ +--+----------+ | +----------+--+ +--+----------+ | |||
| ToR1 | | ToR2 | Spine | | ToR1 | | ToR2 | Spine | |||
skipping to change at page 28, line 40 ¶ | skipping to change at line 1269 ¶ | |||
| +-----------------+ | | | | | +-----------------+ | | | | |||
| | | +-------------+ | | | | | | +-------------+ | | | |||
| | | | | +-----------------+ | | | | | | | +-----------------+ | | |||
| | | | +--------------+ | | | | | | | | +--------------+ | | | | |||
| | | | | | | | | | | | | | | | | | |||
+---+ +---+ +---+ +---+ | +---+ +---+ +---+ +---+ | |||
| | | | | | | | | | | | | | | | | | |||
+---+ +---+ ............. +---+ +---+ | +---+ +---+ ............. +---+ +---+ | |||
SV(1) SV(2) SV(n-1) SV(n) Leaf | SV(1) SV(2) SV(n-1) SV(n) Leaf | |||
Figure 12: Dual-homing servers | Figure 12: Dual-Homing Servers | |||
Sometimes, people may prefer to disaggregate from ToR to servers from | Sometimes people may prefer to disaggregate from ToR nodes to servers | |||
start on, i.e. the servers have couple tens of routes in FIB from | from startup, i.e., the servers have multiple routes in the FIB from | |||
start on beside default routes to avoid breakages at rack level. | startup other than default routes to avoid breakages at the rack | |||
Full disaggregation of the fabric could be achieved by configuration | level. Full disaggregation of the fabric could be achieved by | |||
supported by RIFT. | configuration supported by RIFT. | |||
5.12. Fabric with A Controller | 5.12. Fabric with a Controller | |||
There are many different ways to deploy the controller. One | There are many different ways to deploy the controller. One | |||
possibility is attaching a controller to the RIFT domain from ToF and | possibility is attaching a controller to the RIFT domain from ToF and | |||
another possibility is attaching a controller from the leaf. | another possibility is attaching a controller from the leaf. | |||
+------------+ | +------------+ | |||
| Controller | | | Controller | | |||
++----------++ | ++----------++ | |||
| | | | | | |||
| | | | | | |||
skipping to change at page 29, line 28 ¶ | skipping to change at line 1306 ¶ | |||
RIFT domain |Spine| |Spine| | RIFT domain |Spine| |Spine| | |||
+--+--+ +-----+ | +--+--+ +-----+ | |||
| | | | | | | | | | | | |||
| | +-------------+ | | | | +-------------+ | | |||
| | +--------+ | | | | | +--------+ | | | |||
| | | | | | | | | | | | |||
| +-----+ +-+---+ | | +-----+ +-+---+ | |||
------- |Leaf | | Leaf| | ------- |Leaf | | Leaf| | |||
+-----+ +-----+ | +-----+ +-----+ | |||
Figure 13: Fabric with a controller | Figure 13: Fabric with a Controller | |||
5.12.1. Controller Attached to ToFs | 5.12.1. Controller Attached to ToFs | |||
If a controller is attaching to the RIFT domain from ToF, it usually | If a controller is attaching to the RIFT domain from ToF, it usually | |||
uses dual-homing connections. The loopback prefix of the controller | uses dual-homing connections. The loopback prefix of the controller | |||
should be advertised down by the ToF and spine to leaves. If the | should be advertised down by the ToF and spine to the leaves. If the | |||
controller loses link to ToF, make sure the ToF withdraw the prefix | controller loses the link to ToF, make sure the ToF withdraws the | |||
of the controller. | prefix of the controller. | |||
5.12.2. Controller Attached to Leaf | 5.12.2. Controller Attached to Leaf | |||
If the controller is attaching from a leaf to the fabric, no special | If the controller is attaching from a leaf to the fabric, no special | |||
provisions are needed. | provisions are needed. | |||
5.13. Internet Connectivity Within Underlay | 5.13. Internet Connectivity Within Underlay | |||
If global addressing is running without overlay, an external default | If global addressing is running without overlay, an external default | |||
route needs to be advertised through RIFT fabric to achieve internet | route needs to be advertised through the RIFT fabric to achieve | |||
connectivity. For the purpose of forwarding of the entire RIFT | internet connectivity. For the purpose of forwarding of the entire | |||
fabric, an internal fabric prefix needs to be advertised in the South | RIFT fabric, an internal fabric prefix needs to be advertised in the | |||
Prefix TIE by ToF and spine nodes. | Prefix South TIE by ToF and spine nodes. | |||
5.13.1. Internet Default on the Leaf | 5.13.1. Internet Default on the Leaf | |||
In case that the internet gateway is a leaf, the leaf node as the | In the case that the internet gateway is a leaf, the leaf node as the | |||
internet gateway needs to advertise a default route in its Prefix | internet gateway needs to advertise a default route in its Prefix | |||
North TIE. | North TIE. | |||
5.13.2. Internet Default on the ToFs | 5.13.2. Internet Default on the ToFs | |||
In case that the internet gateway is a ToF, the ToF and spine nodes | In the case that the internet gateway is a ToF, the ToF and spine | |||
need to advertise a default route in the Prefix South TIE. | nodes need to advertise a default route in the Prefix South TIE. | |||
5.14. Subnet Mismatch and Address Families | 5.14. Subnet Mismatch and Address Families | |||
+--------+ +--------+ | +--------+ +--------+ | |||
| | LIE LIE | | | | | LIE LIE | | | |||
| A | +----> <----+ | B | | | A | +----> <----+ | B | | |||
| +---------------------+ | | | +---------------------+ | | |||
+--------+ +--------+ | +--------+ +--------+ | |||
X/24 Y/24 | X/24 Y/24 | |||
Figure 14: subnet mismatch | Figure 14: Subnet Mismatch | |||
LIEs are exchanged over all links running RIFT to perform Link | LIEs are exchanged over all links running RIFT to perform Link | |||
(Neighbor) Discovery. A node must NOT originate LIEs on an address | (Neighbor) Discovery. A node must NOT originate LIEs on an AF if it | |||
family if it does not process received LIEs on that family. LIEs on | does not process received LIEs on that family. LIEs on the same link | |||
same link are considered part of the same negotiation independent on | are considered part of the same negotiation independent from the AF | |||
the address family they arrive on. An implementation must be ready | they arrive on. An implementation must be ready to accept TIEs on | |||
to accept TIEs on all addresses it used as source of LIE frames. | all addresses it used as the source of LIE frames. | |||
As shown in the above figure, without further checks adjacency of | As shown in Figure 14, an adjacency of nodes A and B may form without | |||
node A and B may form, but the forwarding between node A and node B | further checks, but the forwarding between nodes A and B may fail | |||
may fail because subnet X mismatches with subnet Y. | because subnet X mismatches with subnet Y. | |||
To prevent this a RIFT implementation should check for subnet | To prevent this, a RIFT implementation should check for subnet | |||
mismatch just like e.g. IS-IS does. This can lead to scenarios | mismatch in a way that is similar to how IS-IS does. This can lead | |||
where an adjacency, despite exchange of LIEs in both address families | to scenarios where an adjacency, despite the exchange of LIEs in both | |||
may end up having an adjacency in a single AF only. This is a | AFs, may end up having an adjacency in a single AF only. This is | |||
consideration especially in Section 5.9 scenarios. | especially a consideration in scenarios relating to Section 5.9. | |||
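
A subnet-mismatch check of the kind suggested above could look like the following sketch, which uses Python's standard `ipaddress` module. The function names and the per-AF dictionary layout are illustrative assumptions and are not taken from the RIFT specification.

```python
import ipaddress
from typing import Dict, Set


def same_subnet(local: str, remote: str) -> bool:
    """True if two interface addresses (e.g. '192.0.2.1/24') share a subnet."""
    a = ipaddress.ip_interface(local)
    b = ipaddress.ip_interface(remote)
    return a.version == b.version and a.network == b.network


def usable_afs(local: Dict[str, str], remote: Dict[str, str]) -> Set[str]:
    """Return the AFs (keys such as 'ipv4', 'ipv6') with matching subnets.

    An AF present on only one side, or with mismatched subnets, is left out,
    so the adjacency may end up usable in a single AF only.
    """
    return {af for af in local.keys() & remote.keys()
            if same_subnet(local[af], remote[af])}
```

In the situation of Figure 14, node A on X/24 and node B on Y/24 would fail `same_subnet` for that AF, while an IPv6 address pair on a common /64 would still pass.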
5.15. Anycast Considerations | 5.15. Anycast Considerations | |||
+ traffic | + traffic | |||
| | | | |||
v | v | |||
+------+------+ | +------+------+ | |||
| ToF | | | ToF | | |||
+---+-----+---+ | +---+-----+---+ | |||
| | | | | | | | | | |||
+------------+ | | +------------+ | +------------+ | | +------------+ | |||
| | | | | | | | | | |||
+---+---+ +-------+ +-------+ +---+---+ | +---+---+ +-------+ +-------+ +---+---+ | |||
skipping to change at page 31, line 32 ¶ | skipping to change at line 1398 ¶ | |||
| | | | | | | | | | | | | | | | | | |||
|Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 | |||
+-+-----+ ++------+ +-----+-+ +-----+-+ | +-+-----+ ++------+ +-----+-+ +-----+-+ | |||
+ + + ^ + | + + + ^ + | |||
PrefixA PrefixB PrefixA | PrefixC | PrefixA PrefixB PrefixA | PrefixC | |||
| | | | |||
+ traffic | + traffic | |||
Figure 15: Anycast | Figure 15: Anycast | |||
If the traffic comes from ToF to Leaf111 or Leaf121 which has anycast | If the traffic comes from ToF to Leaf111 or Leaf121, which has | |||
prefix PrefixA, RIFT can deal with this case well. But if the | anycast prefix PrefixA, RIFT can deal with this case well. However, | |||
traffic comes from Leaf122, it arrives Spine21 or Spine22 at level 1. | if the traffic comes from Leaf122, it arrives to Spine21 or Spine22 | |||
But Spine21 or Spine22 doesn't know another PrefixA attaching | at LEVEL 1. Additionally, Spine21 or Spine22 doesn't know another | |||
Leaf111. So it will always get to Leaf121 and never get to Leaf111. | PrefixA attaching Leaf111, so it will always get to Leaf121 and never | |||
If the intension is that the traffic should be offloaded to Leaf111, | Leaf111. If the intention is that the traffic should be offloaded to | |||
then use policy guided prefixes defined in RIFT [RIFT]. | Leaf111, then use the policy-guided prefixes defined in RIFT | |||
[RFC9692]. | ||||
5.16. IoT Applicability | 5.16. IoT Applicability | |||
The design of RIFT inherits from RPL [RFC6550] the anisotropic design | The design of RIFT inherits the anisotropic design of a default route | |||
of a default route upwards (northwards); it also inherits the | upwards (northwards) from RPL [RFC6550]. It also inherits the | |||
capability to inject external host routes at the Leaf level using | capability to inject external host routes at the Leaf level using | |||
Wireless ND (WiND) [RFC8505][RFC8928] between a RIFT-agnostic host | Wireless ND (WiND) [RFC8505] [RFC8928] between a RIFT-agnostic host | |||
and a RIFT router. Both the RPL and the RIFT protocols are meant for | and a RIFT router. Both the RPL and the RIFT protocols are meant for | |||
large scale, and WiND enables device mobility at the edge the same | a large scale, and WiND enables device mobility at the edge the same | |||
way in both cases. | way in both cases. | |||
The main difference between RIFT and RPL is that with RPL, there’s a | The main difference between RIFT and RPL is that there's a single | |||
single Root, whereas RIFT has many ToF nodes. This adds huge | root with RPL, whereas RIFT has many ToF nodes. This adds huge | |||
capabilities for leaf-2-leaf ECMP paths, but additional complexity | capabilities for leaf-2-leaf ECMP paths but additional complexity | |||
with the need to disaggregate. Also RIFT uses Link State flooding | with the need to disaggregate. Also, RIFT uses link-state flooding | |||
northwards, and is not designed for low-power operation. | northwards and is not designed for low-power operation. | |||
Still nothing prevents that the IP devices connected at the Leaf are | Still, nothing prevents that the IP devices connected at the Leaf are | |||
IoT devices, which typically expose their address using WiND – which | IoT devices, which typically expose their address using WiND -- this | |||
is an upgrade from 6LoWPAN ND [RFC6775]. | is an upgrade from 6LoWPAN ND [RFC6775]. | |||
A network that serves high speed/ high power IoT devices should | A network that serves high speed / high power IoT devices should | |||
typically provide deterministic capabilities for applications such as | typically provide deterministic capabilities for applications such as | |||
high speed control loops or movement detection. The Fat Tree is | high speed control loops or movement detection. The Fat Tree is | |||
highly reliable, and in normal condition provides an equivalent | highly reliable and, in normal conditions, provides an equivalent | |||
multipath operation; but the ECMP doesn’t provide hard guarantees for | multipath operation; however, the ECMP doesn't provide hard | |||
either delivery or latency. As long as the fabric is non-blocking | guarantees for either delivery or latency. As long as the fabric is | |||
the result is the same; but there can be load unbalances resulting in | non-blocking, the result is the same, but there can be load | |||
incast and possibly congestion loss that will prevent the delivery | unbalances resulting in incast and possibly congestion loss that will | |||
within bounded latency. | prevent the delivery within bounded latency. | |||
This could be alleviated with Packet Replication, Elimination and | This could be alleviated with Packet Replication, Elimination, and | |||
Reordering (PREOF) [RFC8655] leaf-2-leaf but PREOF is hard to provide | Ordering Functions (PREOF) [RFC8655] leaf-2-leaf, but PREOF is hard | |||
at the scale of all flows, and the replication may increase the | to provide at the scale of all flows and the replication may increase | |||
probability of the overload that it attempts to solve. | the probability of the overload that it attempts to solve. | |||
Note that the load balancing is not RIFT’s problem, but it is key to | Note that the load balancing is not RIFT's problem, but it is key to | |||
serve IoT adequately. | serve IoT adequately. | |||
5.17. Key Management | 5.17. Key Management | |||
As outlined in Section 9 "Security Considerations" of RIFT [RIFT], | As outlined in Section 9 ("Security Considerations") of [RFC9692], | |||
either a private shared key or a public/private key pair is used to | either a private shared key or a public/private key pair is used to | |||
authenticate the adjacency. Both the key distribution and key | authenticate the adjacency. Both the key distribution and key | |||
synchronization methods are out of scope for this document. Both | synchronization methods are out of scope for this document. Both | |||
nodes in the adjacency must share the same keys, key type, and | nodes in the adjacency must share the same keys, key type, and | |||
algorithm for a given key ID. Mismatched keys will not inter-operate | algorithm for a given key ID. Mismatched keys will not interoperate | |||
as their security envelopes will be unverifiable. | as their security envelopes will be unverifiable. | |||
Key roll-over while the adjacency is active may be supported. The | Key rollover while the adjacency is active may be supported. The | |||
specific mechanism is well documented in [RFC6518]. As outlined in | specific mechanism is well documented in [RFC6518]. As outlined in | |||
Section 9.9 "Host Implementations" of RIFT [RIFT], hosts as well as | 9.9 ("Host Implementations") of [RFC9692], hosts as well as VMs | |||
VMs act as RIFT devices are possible. KMP such as KV for key roll- | acting as RIFT devices are possible. Key Management Protocols | |||
over in the fabric using a symmetric key that can be changed easily | (KMPs), such as Key Value (KV) for key rollover in the fabric, use a | |||
when compromised. Wherein symmetric key of a host is more likely to | symmetric key that can be changed easily when compromised; in which | |||
be compromised than of a in-fabric networking node. | case, the symmetric key of a host is more likely to be compromised | |||
than an in-fabric networking node. | ||||
5.18. TTL/HopLimit of 1 vs. 255 on LIEs/TIEs | 5.18. TTL/Hop Limit of 1 vs. 255 on LIEs/TIEs | |||
The use of a packet's Time to Live (TTL) (IPv4) or Hop Limit (IPv6) | The use of a packet's Time to Live (TTL) (IPv4) or Hop Limit (IPv6) | |||
to verify whether the packet was originated by an adjacent node on a | to verify whether the packet was originated by an adjacent node on a | |||
connected link has been used in RIFT.RIFT explicitly requires the use | connected link has been used in RIFT. RIFT explicitly requires the | |||
of a TTL/HL value of 1 *or* 255 when sending/receiving LIEs and TIEs | use of a TTL/HL value of 1 or 255 when sending/receiving LIEs and | |||
so that implementers have a choice between the two. | TIEs so that implementers have a choice between the two. | |||
TTL=1 or HL=1 protects against the information disseminating more | TTL=1 or HL=1 protects against the information disseminating more | |||
than 1 hop in the fabric and should be the default unless configured | than 1 hop in the fabric and should be the default unless configured | |||
otherwise. TTL=255 or HL=255 can lead RIFT TIE packet propagation to | otherwise. TTL=255 or HL=255 can lead RIFT TIE packet propagation to | |||
more than one hop (multicast address is already local subnetwork | more than one hop (the multicast address is already in local | |||
range) in case of implementation problems but does protect against a | subnetwork range) in case of implementation problems but does protect | |||
remote attack as well, and the receiving remote router will ignore | against a remote attack as well, and the receiving remote router will | |||
such TIE packet unless the remote router is exactly 254 hops away and | ignore such TIE packet unless the remote router is exactly 254 hops | |||
accepts only TTL=1 or HL=1. [RFC5082] defines a Generalized TTL | away and accepts only TTL=1 or HL=1. [RFC5082] defines a Generalized | |||
Security Mechanism (GTSM). The GTSM is applicable to LIEs/TIEs | TTL Security Mechanism (GTSM). The GTSM is applicable to LIE/TIE | |||
implementations that use a TTL or HL of 255. It provides a defense | implementations that use a TTL or HL of 255. It provides a defense | |||
from infrastructure attacks based on forged protocol packets from | from infrastructure attacks based on forged protocol packets from | |||
outside the fabric. | outside the fabric. | |||
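
The receive-side check and the 254-hop corner case mentioned above can be illustrated with a short sketch. The helper names are hypothetical, and the exact-match acceptance rule is an assumption made for illustration rather than a statement of what a given implementation does.

```python
ALLOWED_VALUES = (1, 255)  # the two TTL/HL values permitted for LIEs and TIEs


def value_after_forwarding(sent: int, routed_hops: int) -> int:
    """TTL/HL left after a packet is forwarded across `routed_hops` routers.

    Simplification: a real router drops the packet once the value reaches 0.
    """
    return max(sent - routed_hops, 0)


def accept(received: int, expected: int) -> bool:
    """Accept a LIE/TIE only if its TTL/HL equals the configured value."""
    if expected not in ALLOWED_VALUES:
        raise ValueError("expected TTL/HL must be 1 or 255")
    return received == expected


# A packet sent with 255 and received directly still carries 255, so a
# receiver configured for 255 accepts it (the GTSM-style check).
direct = accept(value_after_forwarding(255, 0), 255)

# The same packet forwarded even once no longer matches 255.
forwarded_once = accept(value_after_forwarding(255, 1), 255)

# Corner case from the text: after 254 routed hops the value is 1, which a
# receiver that accepts only 1 would take for a packet from a neighbor.
corner_case = accept(value_after_forwarding(255, 254), 1)
```

Here `direct` and `corner_case` evaluate to true and `forwarded_once` to false, which matches the observation that a value of 255 guards against remote injection except at a distance of exactly 254 hops from a receiver expecting 1.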
6. Security Considerations | 6. Security Considerations | |||
This document presents applicability of RIFT. As such, it does not | This document presents applicability of RIFT. As such, it does not | |||
introduce any security considerations. However, there are a number | introduce any security considerations. However, there are a number | |||
of security concerns at RIFT [RIFT]. | of security concerns in [RFC9692]. | |||
7. IANA Considerations | 7. IANA Considerations | |||
This document has no IANA actions. | This document has no IANA actions. | |||
8. Acknowledgments | 8. References | |||
The authors would like to thank Jaroslaw Kowalczyk, Alvaro Retana, | ||||
Jim Guichard and Jeffrey Zhang for providing invaluable concepts and | ||||
content for this document. | ||||
9. Contributors | ||||
The following people (listed in alphabetical order) contributed | ||||
significantly to the content of this document and should be | ||||
considered co-authors: | ||||
Jordan Head | ||||
Juniper Networks | ||||
Email: jhead@juniper.net | ||||
Tom Verhaeg | ||||
Juniper Networks | ||||
Email: tverhaeg@juniper.net | ||||
10. Normative References | 8.1. Normative References | |||
[ISO10589-Second-Edition] | [ISO10589-Second-Edition] | |||
International Organization for Standardization, | ISO/IEC, "Information technology - Telecommunications and | |||
"Intermediate system to Intermediate system intra-domain | information exchange between systems - Intermediate System | |||
routing information exchange protocol for use in | to Intermediate System intra-domain routeing information | |||
conjunction with the protocol for providing the | exchange protocol for use in conjunction with the protocol | |||
connectionless-mode Network Service (ISO 8473)", November | for providing the connectionless-mode network service (ISO | |||
2002. | 8473)", ISO/IEC 10589:2002, November 2002, | |||
<https://www.iso.org/standard/30932.html>. | ||||
[TR-384] Broadband Forum Technical Report, "TR-384 Cloud Central | ||||
Office Reference Architectural Framework", January 2018. | ||||
[RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, | [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, | |||
DOI 10.17487/RFC2328, April 1998, | DOI 10.17487/RFC2328, April 1998, | |||
<https://www.rfc-editor.org/info/rfc2328>. | <https://www.rfc-editor.org/info/rfc2328>. | |||
[RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman, | [RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman, | |||
"Neighbor Discovery for IP version 6 (IPv6)", RFC 4861, | "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861, | |||
DOI 10.17487/RFC4861, September 2007, | DOI 10.17487/RFC4861, September 2007, | |||
<https://www.rfc-editor.org/info/rfc4861>. | <https://www.rfc-editor.org/info/rfc4861>. | |||
skipping to change at page 35, line 35 | skipping to change at line 1566 | |||
"Deterministic Networking Architecture", RFC 8655, | "Deterministic Networking Architecture", RFC 8655, | |||
DOI 10.17487/RFC8655, October 2019, | DOI 10.17487/RFC8655, October 2019, | |||
<https://www.rfc-editor.org/info/rfc8655>. | <https://www.rfc-editor.org/info/rfc8655>. | |||
[RFC8950] Litkowski, S., Agrawal, S., Ananthamurthy, K., and K. | [RFC8950] Litkowski, S., Agrawal, S., Ananthamurthy, K., and K. | |||
Patel, "Advertising IPv4 Network Layer Reachability | Patel, "Advertising IPv4 Network Layer Reachability | |||
Information (NLRI) with an IPv6 Next Hop", RFC 8950, | Information (NLRI) with an IPv6 Next Hop", RFC 8950, | |||
DOI 10.17487/RFC8950, November 2020, | DOI 10.17487/RFC8950, November 2020, | |||
<https://www.rfc-editor.org/info/rfc8950>. | <https://www.rfc-editor.org/info/rfc8950>. | |||
[RIFT] Przygienda, T., Head, J., Sharma, A., Thubert, P., | [RFC9692] Przygienda, T., Ed., Head, J., Ed., Sharma, A., Thubert, | |||
Rijsman, B., and D. Afanasiev, "RIFT: Routing in Fat | P., Rijsman, B., and D. Afanasiev, "RIFT: Routing in Fat | |||
Trees", Work in Progress, Internet-Draft, draft-ietf-rift- | Trees", RFC 9692, DOI 10.17487/RFC9692, December 2024, | |||
rift-24, 23 May 2024, | <https://www.rfc-editor.org/info/rfc9692>. | |||
<https://datatracker.ietf.org/doc/html/draft-ietf-rift- | ||||
rift-24>. | ||||
11. Informative References | [TR-384] Broadband Forum Technical Report, "TR-384: Cloud Central | |||
Office Reference Architectural Framework", TR-384, Issue | ||||
1, January 2018, | ||||
<https://www.broadband-forum.org/pdfs/tr-384-1-0-0.pdf>. | ||||
[IEEEstd1588] | 8.2. Informative References | |||
IEEE standard for Information Technology, "IEEE Standard | ||||
for a Precision Clock Synchronization Protocol for | ||||
Networked Measurement and Control Systems", | ||||
<https://standards.ieee.org/standard/1588-2019.html>. | ||||
[CLOS] Yuan, X., "On Nonblocking Folded-Clos Networks in Computer | [CLOS] Yuan, X., "On Nonblocking Folded-Clos Networks in Computer | |||
Communication Environments", IEEE International Parallel & | Communication Environments", 2011 IEEE International | |||
Distributed Processing Symposium, 2011. | Parallel & Distributed Processing Symposium, | |||
DOI 10.1109/IPDPS.2011.27, May 2011, | ||||
<https://ieeexplore.ieee.org/document/6012836>. | ||||
[FATTREE] Leiserson, C. E., "Fat-Trees: Universal Networks for | [FATTREE] Leiserson, C. E., "Fat-Trees: Universal Networks for | |||
Hardware-Efficient Supercomputing", 1985. | Hardware-Efficient Supercomputing", IEEE Transactions on | |||
Computers, vol. C-34, no. 10, pp. 892-901, | ||||
DOI 10.1109/TC.1985.6312192, October 1985, | ||||
<https://ieeexplore.ieee.org/document/6312192>. | ||||
[PNNI] ATM Forum Technical Committee, "Private Network-Network | [IEEEstd1588] | |||
Interface Specification, Version 1.1 (PNNI 1.1), af-pnni- | IEEE, "IEEE Standard for a Precision Clock Synchronization | |||
0055.002", 2003. | Protocol for Networked Measurement and Control Systems", | |||
IEEE Std 1588-2019, DOI 10.1109/IEEESTD.2020.9120376, June | ||||
2020, <https://ieeexplore.ieee.org/document/9120376>. | ||||
[PNNI] The ATM Forum Technical Committee, "Private Network- | ||||
Network Interface - Specification Version 1.1 - (PNNI | ||||
1.1)", af-pnni-0055.001, April 2002, | ||||
<https://www.broadband-forum.org/download/af-pnni- | ||||
0055.001.pdf>. | ||||
[RFC3626] Clausen, T., Ed. and P. Jacquet, Ed., "Optimized Link | [RFC3626] Clausen, T., Ed. and P. Jacquet, Ed., "Optimized Link | |||
State Routing Protocol (OLSR)", RFC 3626, | State Routing Protocol (OLSR)", RFC 3626, | |||
DOI 10.17487/RFC3626, October 2003, | DOI 10.17487/RFC3626, October 2003, | |||
<https://www.rfc-editor.org/info/rfc3626>. | <https://www.rfc-editor.org/info/rfc3626>. | |||
[RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A | [RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A | |||
Border Gateway Protocol 4 (BGP-4)", RFC 4271, | Border Gateway Protocol 4 (BGP-4)", RFC 4271, | |||
DOI 10.17487/RFC4271, January 2006, | DOI 10.17487/RFC4271, January 2006, | |||
<https://www.rfc-editor.org/info/rfc4271>. | <https://www.rfc-editor.org/info/rfc4271>. | |||
skipping to change at page 36, line 43 | skipping to change at line 1633 | |||
Perkins, "Registration Extensions for IPv6 over Low-Power | Perkins, "Registration Extensions for IPv6 over Low-Power | |||
Wireless Personal Area Network (6LoWPAN) Neighbor | Wireless Personal Area Network (6LoWPAN) Neighbor | |||
Discovery", RFC 8505, DOI 10.17487/RFC8505, November 2018, | Discovery", RFC 8505, DOI 10.17487/RFC8505, November 2018, | |||
<https://www.rfc-editor.org/info/rfc8505>. | <https://www.rfc-editor.org/info/rfc8505>. | |||
[RFC8928] Thubert, P., Ed., Sarikaya, B., Sethi, M., and R. Struik, | [RFC8928] Thubert, P., Ed., Sarikaya, B., Sethi, M., and R. Struik, | |||
"Address-Protected Neighbor Discovery for Low-Power and | "Address-Protected Neighbor Discovery for Low-Power and | |||
Lossy Networks", RFC 8928, DOI 10.17487/RFC8928, November | Lossy Networks", RFC 8928, DOI 10.17487/RFC8928, November | |||
2020, <https://www.rfc-editor.org/info/rfc8928>. | 2020, <https://www.rfc-editor.org/info/rfc8928>. | |||
Acknowledgments | ||||
The authors would like to thank Jaroslaw Kowalczyk, Alvaro Retana, | ||||
Jim Guichard, and Jeffrey Zhang for providing invaluable concepts and | ||||
content for this document. | ||||
Contributors | ||||
The following people contributed substantially to the content of this | ||||
document and should be considered coauthors: | ||||
Jordan Head | ||||
Juniper Networks | ||||
Email: jhead@juniper.net | ||||
Tom Verhaeg | ||||
Juniper Networks | ||||
Email: tverhaeg@juniper.net | ||||
Authors' Addresses | Authors' Addresses | |||
Yuehua Wei (editor) | Yuehua Wei (editor) | |||
ZTE Corporation | ZTE Corporation | |||
No.50, Software Avenue | No.50, Software Avenue | |||
Nanjing | Nanjing | |||
210012 | 210012 | |||
China | China | |||
Email: wei.yuehua@zte.com.cn | Email: wei.yuehua@zte.com.cn | |||
Zheng Zhang | ||||
Zheng (Sandy) Zhang | ||||
ZTE Corporation | ZTE Corporation | |||
No.50, Software Avenue | No.50, Software Avenue | |||
Nanjing | Nanjing | |||
210012 | 210012 | |||
China | China | |||
Email: zhang.zheng@zte.com.cn | Email: zhang.zheng@zte.com.cn | |||
Dmitry Afanasiev | Dmitry Afanasiev | |||
Yandex | Yandex | |||
Email: fl0w@yandex-team.ru | Email: fl0w@yandex-team.ru | |||
Pascal Thubert | Pascal Thubert | |||
Cisco Systems, Inc | Individual | |||
Building D | ||||
45 Allee des Ormes - BP1200 | ||||
06254 MOUGINS - Sophia Antipolis | ||||
France | France | |||
Phone: +33 497 23 26 34 | Email: pascal.thubert@gmail.com | |||
Email: pthubert@cisco.com | ||||
Tony Przygienda | Tony Przygienda | |||
Juniper Networks | Juniper Networks | |||
1194 N. Mathilda Ave | 1194 N. Mathilda Ave | |||
Sunnyvale, CA, 94089 | Sunnyvale, CA 94089 | |||
United States of America | United States of America | |||
Email: prz@juniper.net | Email: prz@juniper.net | |||
End of changes. 211 change blocks. | ||||
682 lines changed or deleted | 699 lines changed or added | |||