These are unedited transcripts and may contain errors.

Routing Working Group session
16 May 2013, 2 p.m.

CHAIR: Welcome, please have a seat, all those people standing over there and talking, let's become a little bit more quiet please.

Let's go through the motions here. This is the Routing Working Group. A few things.

I am co-chair of this Working Group; my co-chair Rob Evans is sitting here in the first row. If you have any issues, please approach one of us. When going to the microphone to comment or ask or do whatever you want to do at the microphone, please state your name and affiliation so that people who are watching us remotely can identify you.

I'd like to thank the scribe, Alex Band, taking notes for us, thank you very much. Scriber, also from the RIPE NCC, doing the Jabber, and of course the stenographers, who have become such an essential part of everything we do now. And thank you to them all.

The minutes from RIPE 65, that's the previous meeting, were circulated on the list. Does anyone have anything that would prevent us from making them final at this stage?

I'll declare them final. A reminder to vote for the Programme Committee. The reminder is also for volunteering: if you want to put yourself forward as a candidate, you have until 3:00pm today. Online voting is tomorrow morning.

Agenda bashing. This is what the agenda looks like for today. It's a long agenda. We will probably overrun it a bit. In fact, we have asked the RIPE NCC to move the result of the elections, the GM, to the room next door just in case we take a little bit longer than we have here, so as to minimise interference.

And we are going to go straight through to the first talk, that's George Michaelson.

GEORGE MICHAELSON: Hi, this presentation is the result of an informal exercise that we did at the APNIC APRICOT meeting earlier this year in Singapore. We felt a little disconnected from the routing-active membership in the region, with the emerging issue to do with management of route objects, and I want to stress this is not in the class of a policy discussion or a concrete proposal. It's the reflection of a discussion we have had in the Asia Pacific region that we thought was interesting and we wanted to bring up here. I have already had two really key interventions: Wilfried highlighted some issues in the DB Working Group yesterday, and Izumi Okutani from JPNIC has pointed out some feedback from the Japanese routing committee, which I have tried to reflect into this presentation. So I hope that it's a useful contribution to the field.

This is the standard template of a route object that you get from the RIPE NCC. And you can see that it really only has a small set of mandatory components, and they make this assertion about the origin AS, given in origin, over the prefix that is announced. That's really the essential binding. All the other stuff has the quality that it's more about WHOIS and RPSL management than about routing per se.

Now, this is the ASN.1 object which is the definition of the ROA structure, and it has this AS ID and these IP address blocks. And additionally a thing called maxLength, but it's basically making an assertion about the origin AS from AS ID over the prefixes that are listed in address blocks. So you come to this place where you say: well, this is an assertion about origin AS over prefixes, and this is an assertion about origin AS over prefixes. So, they kind of have this quality of sameness.

And you get the second thing in the conversation, if you like, that route objects have a really simple use: they are sometimes just used to make this assertion. Then they have a more complex use about qualifiers, aggregation behaviour, aspects of community membership. The simple use looks to us very similar to ROA behaviour, and it was about vesting that property of who makes the assertion, and this is where we come to a difference. A ROA, in RPKI, is only made under the authority of the prefix holder. The AS holder does not authorise, or does not get validated, in the creation of the ROA object; nothing in the cryptographic chain requires the permission of the AS holder. And that's really quite different from a route object. In a route object both the AS holder and the prefix holder have to be seen to participate in the creation, and there is language in the supporting documentation of WHOIS that says things like: the AS aut-num must be in the IRR; if it doesn't exist, create it. And from a routing perspective it makes perfect sense: here I am, I am doing a prefix, I need to get it in, I haven't got an AS, better make that.

Except you come to this place with address management that takes you down a different path. So, we are now living in the final /8 world, the post run-down world, and if you are an ISP you have your investment in address holdings and you have your AS and you desperately want new business. Somebody comes to you as a content provider who is hosting an SSL-backed website and they need their own IP, and you haven't got any. This isn't a problem: they go to APNIC to get the address, at which point they are the titular address holder. So immediately we are entering a world where more and more participants in routing are address holders and they are not primarily in the business of doing routing. I know there is a kind of cultural difference here, you have a lot more direct AS ownership within this region, so this may reflect a difference between Asia and Europe, but in the Asian footprint people by and large don't do independent routing. They are getting the address because it's part of the business cycle of becoming a content provider, and they are allowing their immediate upstream ISP to assert by origin.

And the process here has started to get clumsy, because quite a lot of people don't actually do IRR-based routing, so they don't have a mechanistic process to be in the game of doing a route object because they are configuring their router. They just configure their router, but the problem is you need the route object for its other purpose: you need it for the filtering outcome. You need it so that other people will accept the origin AS. And this failure to connect, because they are not in the business of maintaining a route object, means that often we, the RIR, have to get involved in the process through the help desk: "Hi, no one at the ISP is actually doing this, could you please help me create the route object?" This becomes a very awkward process; it's intrusive and it doesn't come to completion in a quick time. And again I don't want to go too far, but I'm getting a hint that this has cropped up in this footprint, that this is starting to emerge in Europe as well as a problem. You are also in final /8 space; you are now going to be living in a world where more of you don't have the ownership responsibility for the address. Which means more of you are walking into this world where the creation of a route object is a more complex procedure.

So, we were hoping to go to somewhere where we could simplify the process. As a bit of background, I did some basic analysis of what I was actually seeing in the route object space, and very, very few of the existing route objects in your routing registry, which by the way is massive, actually do complex routing behaviour: 3,000 out of 200,000. It's really not a lot.

This is the rough count of the complex behaviour, and I would suggest that some of these, like ping-hdl and pingable, don't actually materially affect the routing status. They are more administrative background. Obviously export-comps, they actually play to real routing config outcomes, but there is this sense, it's only 1%.

So I was looking at the Asian footprint, and we also have a really low count of non-boring objects; we don't do complex routing. It just doesn't happen in our community. But then there was this moment: why do we have 52,000 route6 objects? The IPv6 routing cloud, that's 9,000 prefixes, so 52,000? And it turns out that we have one member that's done almost a complete deaggregation to the /48 level of their /32, and they have automatically created a route object for every one. I think there will be a conversation with that routing person about process management.

So, we are searching for a dialogue, we are trying to understand what we could do to firstly get a more efficient process to bring these addresses into use in routing and secondly, maybe, to get some alignment between RPKI as an emerging route security framework and IRR as a route configuration framework. Because we feel like we're seeing that they are saying the same things and we thought about the possibility that we could start to create automatic representation of ROAs as route objects, simply for the purpose of informing the community, this prefix is originated and this might potentially allow filter management, the filter side of the deal to get to a better place. So we're interested in the discussion, because this may be interesting, what do people think?
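As a minimal sketch of what "automatic representation of ROAs as route objects" could look like, assume a ROA is reduced to an origin AS plus a list of (prefix, maxLength) pairs. The `Roa` class, the `roa_to_route_objects` function and the `RPKI-DERIVED` source tag are hypothetical names chosen for illustration; they are not part of any RIR's tooling.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class Roa:
    """Simplified ROA payload: one origin AS and its (prefix, maxLength) pairs."""
    as_id: int
    prefixes: tuple  # tuple of (prefix string, maxLength or None)


def roa_to_route_objects(roa, source="RPKI-DERIVED"):
    """Render one RPSL-style route/route6 object per prefix listed in the ROA.

    Only the exact prefix is emitted; maxLength is deliberately not expanded
    into more-specifics, so one ROA entry yields exactly one object.
    """
    objects = []
    for prefix, _max_length in roa.prefixes:
        net = ip_network(prefix)
        attr = "route6" if net.version == 6 else "route"
        objects.append(
            f"{attr}: {net}\n"
            f"origin: AS{roa.as_id}\n"
            f"source: {source}\n"
        )
    return objects


# Documentation prefixes and a documentation AS number, for illustration only.
example = Roa(64496, (("192.0.2.0/24", 24), ("2001:db8::/32", 48)))
for obj in roa_to_route_objects(example):
    print(obj)
```

The sketch only covers the filtering side of the object (who may originate what); it says nothing about who is permitted to create the object, which is the permission question raised next.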

And the second part of it comes to the permission model. If we made a step and said the AS holder is not considered key in the ROA, could we do that in the route object? And this is where I think Wilfried got very, very quickly to the point that we'd missed in the first round of discussion: it's a dual-purpose object. It would be lovely if it was only specifying filtering behaviour, because then this would be very direct, but unfortunately it's consumed by router configuration people to create the announcement. And so Wilfried has had that insight: there is obviously a risk here. Someone is going to use their permission on a prefix to create a flood of announcements based off your AS, and if you don't have the right to control that, then in some functional sense it's an attack on the integrity of routing. I will observe that it's not an anonymous attack, because you are a known player in the system. So there would be a construction of a second tier of defence, but I would have to say, we didn't really think that through. So it's weak, in a sense, what we were proposing there.

What we'd actually done, before we considered formalising a reduction in permission, was think about whether we could formalise a process. So in a case like this, where an address holder comes in to us and says there is nobody who is doing this, we would say: okay, we are going to mail them and give them 72 hours' notice that if they don't concretely object, we will create this filter state. And in a sense, it's the midway point. There is a check, because it's manual. That's kind of ugly; on the other hand, we set a limit on the time to achieve filter against these prefixes, 72 hours, and you are visible. It's not really great. But we're trying to find a position that we feel could work for the help desk hostmasters and work for the community of address holders.

Then a third thing that came to mind, because of the work that is being done to improve and extend WHOIS: we have a capability of tagging, which is like a second-order quality of tagging to data. We have the potential to say, and this is going to change IRR tools, how about if they said this is tagged as prefix only, or tagged as complete, or some other class of tagging? We could emerge to a world where you have an admission that only has prefix permission, that can be absolutely explicit in the route object, and we could distinguish between "this has function in filtering" and "this has function in router config". And we are interested in that possibility. That may be the road out of here, because tearing the two apart and saying there has got to be two objects, that's like changing RPSL, that's never going to happen. But creating a qualitative difference that we can understand between the objects, this may actually get us to a useful place. So, that again is part of the conversation. We would love to have some understanding from you, the prime parties that have to use this material, how you feel about that.

And that actually is it. I'm inside my time. We have probably only got three or four minutes, but if there is anyone who has got a reflection or position on this, we'd really love to understand.

AUDIENCE SPEAKER: Alex Band from the RIPE NCC. You talked about automatic ROA creation. I think that's a little bit scary.

GEORGE MICHAELSON: Automatic route creation from ROA.

ALEX BAND: That's better.

GEORGE MICHAELSON: Bad wording on slides. Never implement slideware.

AUDIENCE SPEAKER: We spent an awful long time making sure that the RPKI system has good data quality.

GEORGE MICHAELSON: And the integrity of RPKI is extremely important. I believe Wilfried could stand up and say exactly the same thing about RPSL and IRR so I can also see a quality that automatic route object creation, that makes some people squirm.

AUDIENCE SPEAKER: Because right now we are aiming for a lot of simplicity, but you could of course take it a step further and give users an insight into the ROAs that they have created and how that compares to the route objects that exist. Because right now they are two completely separate worlds and you need to maintain all of that data in two places.

GEORGE MICHAELSON: And this is not good.

AUDIENCE SPEAKER: I'd like to bring ...

GEORGE MICHAELSON: Focusing on the quality about filter: do I do admission filters, would I like a filter that says I will accept the origin? That's distinct from: I construct router config, I need to announce the origin. I think we missed that quality of announce when we originally did this. I'm kind of taking a walk back from some of this. I understand there is a community that can't go there, trying to find ...

AUDIENCE SPEAKER: One more thing. This side of the discussion is on the information creation side, actually putting ROAs and route objects into the system. But for actually using that data, right now we have got an RPKI validation tool which can talk to an RPKI-capable router and you can get all of that data out, and of course you have the complete RPSL-based toolset, and those two are also completely separate worlds that could be integrated, maybe a little bit more. Maybe. But that is maybe also an interesting point for suggestion.

AUDIENCE SPEAKER: Randy Bush, Dragon Research Labs. The reason I switched title is we produced the open source of the RPKI stuff. In particular, some large German incumbent telco, who bases a lot of their router configurations on this weird RPSL stuff, asked us: hey, if you take the RPKI ROA and generate a route object for us, then we just put that in front and rock and roll, and that route object is more comfortably authoritative.

GEORGE MICHAELSON: Where is the but? I am waiting for the but.

RANDY BUSH: It's right here.

GEORGE MICHAELSON: Are you saying that there was no but, dot, dot, dot?

RUEDIGER VOLK: Kind of, you are the "but". Just one T. Well, okay, there happened to be an IETF in Dublin, and you can figure out how long ago that was, and exactly the observation of how the ROAs and the routes map was explained there. There are slight differences, but I, speaking simply, would strongly object to mucking around with a dying RPSL and arranging the deck chairs on that vessel while the RPKI is coming up. And, yes, with the mapping of RPKI ROA information into RPSL, there is the tool for avoiding creating inconsistency between stuff that comes from new and validated information and some parallel, hand-cared other representation of the same stuff. Unfortunately, I still haven't done the lightning talk on how all of this can work right now.

GEORGE MICHAELSON: So given time constraints, I think it has a quality of ... I have got to go read, and I have to help my hostmasters read. But my immediate response would be: we have a live problem now. We have a problem now with new address holders on the final /8 unable to achieve route visibility. So I appreciate the slide said ROA automatic, but that's only half the discussion.

RUDIGER VOLK: If you actually can tell the address holders that they could issue a ROA for the particular prefix

GEORGE MICHAELSON: In practice, the route filter set in 40,000 routers will not change, and their prefix will not become globally visible and then they come back to help desk and say, I need a route object.

RUDIGER VOLK: Well, let me add the second sentence. Make sure that the relevant parties can create the ROAs, I helped a customer at APRICOT to do it and APNIC, and if you want to muck around with presenting that information in the RPSL IRR, you have the tools for offering those using the IRR to use that ROA information. And it can go in there. And there is no problem there. However, your report that your customers are having that kind of problem actually brings me to a surprise and to the question, since what time is APNIC enforcing RPSS?

GEORGE MICHAELSON: We don't have RPSS or distributed /OFRB against objects not in the local WHOIS space. So, if the relationship between AS holders and address holders functionally includes ASs that don't have authority to exist in APNIC, we are in a difficult place, and the documentation within RIPE is a suggestion that creation of a local aut-num object would then permit creation of the route object. But if you are viewing the world from a database integrity perspective, this is a very painful moment, because you are now transgressing responsibilities of resource that vest elsewhere. So I do accept, if we had done RPSS and R PS overtype models and we had external references in the RPSL model implemented, we'd have a path out. Except, except, Rudiger, there is this community of people who aren't doing automatic router config and who just aren't in the business of doing their side, and so don't have a direct line of engagement against this activity. There isn't a responsible AS holder who is going to do any action. And it's painful, but it's true. Whereas the filter set people are really quite active, and so if you aren't existing in a route object, your filter sets globally don't lift and your prefix doesn't route.

RUDIGER VOLK: Again, if you can, if you can have the address holder do the ROA ...

GEORGE MICHAELSON: That part is done, that is a given. It's just a one click in the UI.

RUDIGER VOLK: Well, okay. Actually taking a particular person doing it for a particular prefix ...

CHAIR: We are going to have to move on a bit. This is actually an important discussion but we have got a lot of things. If you can have this discussion a bit more on the list. We'll have the people at the mics and then we need to move on, I am afraid.

GERT DOERING: I have been on the problem end of this, trying to create route objects, so I know how annoying it can be. The risk of completely opening it up, so that if you are the address holder you can put in as many route objects as you want, is that you're blowing up the scarce configuration memory of other people's routers. So if you have a /19 you can create 8,000 lines of configuration, which is guaranteed to overflow somebody, so don't go there please.
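The "8,000 lines" figure can be checked with a short calculation. This is a sketch that assumes the worst case is one route object per /32 host route inside the /19; the function names and the 10.0.0.0/19 prefix are illustrative, not taken from the session.

```python
from ipaddress import ip_network


def count_at_length(prefix, length):
    """Number of more-specific prefixes of exactly `length` bits inside `prefix`."""
    net = ip_network(prefix)
    if length < net.prefixlen or length > net.max_prefixlen:
        raise ValueError("length must lie between the prefix length and the address width")
    return 2 ** (length - net.prefixlen)


def count_all_more_specifics(prefix, max_length):
    """Number of prefixes of every length from the covering prefix down to `max_length`."""
    net = ip_network(prefix)
    return sum(count_at_length(prefix, n) for n in range(net.prefixlen, max_length + 1))


# A /19 split into /32s: 2 ** (32 - 19) = 8192 objects, roughly the 8,000 quoted.
print(count_at_length("10.0.0.0/19", 32))
# Registering every length from /19 to /32 would be 2 ** 14 - 1 objects.
print(count_all_more_specifics("10.0.0.0/19", 32))
```

Each generated route object typically becomes at least one prefix-filter line on every network that builds filters from the registry, which is why the count matters to other operators.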

GEORGE MICHAELSON: Yeah, strong message.

GERT DOERING: The other thing is, it may sound a bit nasty, but if I am buying services from somebody who is not able to manage to find a responsible person in their house in 72 hours to actually get their address space authorised with their AS number and make it work, maybe it's time to get a different upstream.

GEORGE MICHAELSON: We did have that internal discussion, but it's not a role that you can take any RIR's hostmaster to.

GERT DOERING: It's not their job to fix business relations in your region.

GEORGE MICHAELSON: We like to be helpful about the usability of the addresses we are giving people.

GERT DOERING: They are perfectly usable. If some upstream is not working, if they can't get their MPLS working, it's not your job to fix their MPLS.

GEORGE MICHAELSON: I think it's good feedback and I'll take it to the responsible guys.

AUDIENCE SPEAKER: Wilfried Woeber, with two hats; one hat of the Database Working Group co-Chair and the other one as involved in the research and education network. Thank you for bringing this up. I was not aware of the existence of this problem before this meeting. That's one message.

The second message is you have heard already a couple of concerns from the operations guys so I don't have to sort of reiterate.

The third thing, and that's actually the reason why I'm standing, not in front of, but behind the microphone, is I am feeling extremely concerned by ideas to require timely human reaction by the ISPs or by the AS managers, where, by not reacting, the default thing is that silence is consent. I think it's the wrong way around. Because my feeling is that if we are going in that direction, we are actually going to undermine the usage of the registry databases as the routing registry in the first place, because in some shops there is some sort of disconnect: there are the people managing the IP address space and doing the clicking and setting up the database stuff, but they are not necessarily in a workflow position to, on a regular basis, react reasonably to those pop-up requests.

GEORGE MICHAELSON: I think it's very clear from the kind of response that the desire to help can have its limits and we will have to go and reflect on this. There is a problem, but it's too early to go to solutions space.

WILFRIED WOEBER: The management summary for my contribution is we have been working for a long time to make this whole toolset as automation friendly as possible. And anything that puts in requirements to have human intervention on a regular basis, I think it's not a good idea.

GEORGE MICHAELSON: I think the tagging idea might still have something in it, that's something to reflect on, but we're not there yet.

WILFRIED WOEBER: Assuming that all the various tools and all the various scripts and all the various provisioning programmes are actually easily adapted, and I have some doubts there looking back at the history of the IRR toolset.

GEORGE MICHAELSON: This was a request from our hostmasters to explore the space, and Geoff and I as researchers said we have suggestions, but rather than just doing that, why don't we have a dialogue with people who are in routing? So essentially, this has fulfilled the purpose, because what we have got is a really strong signal: too risky, wrong kind of change, wrong dynamic, doesn't respect our side of the process. And that's fine. We can take that back and say, guys, it can't be.

WILFRIED WOEBER: But still the problem is, or your problem is, still there.

GEORGE MICHAELSON: Yes, and not all problems can be solved quickly.

WILFRIED WOEBER: No, but we should not just come back and tell those people, sorry, we can't do anything. I think it's our responsibility to find ways to solve the problem, but maybe on a different path with a different mechanism.

GEORGE MICHAELSON: Thank you for the opportunity.

CHAIR: I think we definitely need to do some work here. I am surprised that you say it's the first time you have heard of it, because certainly it's come up in routing before, and a couple of years back we had a meeting where it was mentioned and we tried to get some extra time and it didn't quite pan out.

VESNA MANOJLOVIC: One is from the measurement point of view: we are thinking of introducing in RIPEstat an additional routing consistency check, which would show not only the presence of the route object and the route in RIS, but also the ROA. So, would there be interest in that, in this community? Would you like to be able to see if there is a corresponding ROA for the same prefix that is already in the routing registry and/or in BGP? Show of hands...

CHAIR: I think the question is: should there be a tool which gives you a quick visual indication of the consistency between RPKI-based routing ROAs and the RPSL route objects? Who would be interested in that? Who understands what I'm asking?
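One way to picture such a consistency check, as a rough sketch only: treat each data source as a set of (prefix, origin AS) pairs and report which sources contain each pair. The function name and the sample data are hypothetical, and the exact-match comparison is a simplifying assumption; a real check would also have to honour ROA maxLength and covering prefixes.

```python
def consistency_report(irr, roas, bgp):
    """Map each (prefix, origin_asn) pair to the sources in which it appears.

    Each argument is a set of (prefix string, origin AS number) tuples.
    Matching is exact; maxLength and covering prefixes are ignored here.
    """
    report = {}
    for pair in sorted(irr | roas | bgp):
        report[pair] = {
            "irr": pair in irr,
            "roa": pair in roas,
            "bgp": pair in bgp,
        }
    return report


# Illustrative data using documentation prefixes and AS numbers.
irr = {("192.0.2.0/24", 64496), ("198.51.100.0/24", 64497)}
roas = {("192.0.2.0/24", 64496)}
bgp = {("192.0.2.0/24", 64496), ("198.51.100.0/24", 64497)}

for pair, seen in consistency_report(irr, roas, bgp).items():
    print(pair, seen)
```

In this sample the second prefix is registered and announced but has no ROA, which is the kind of gap the proposed view would surface.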

VESNA MANOJLOVIC: This will help us prioritise. And the second thing is the comment on what you said: so, direct the users to another provider who can do it, and there I would like to promote RIPEness again, so there you can see which providers are capable of creating a route object.

CHAIR: Last word, and Randy.

RANDY BUSH: I really appreciate that ISOC and APNIC and everybody want to help me run my network, but I think, maybe I'll go take one of those classes and kind of do it myself.

CHAIR: Thank you. Right, there is definitely something for us to do here. But I think the next step is to bring it up again on the mailing list and work through a few of these discussions. So next up is Christian Martin, who is going to tell us a bit about segment routing.

CHRISTIAN MARTIN: My name is Christian Martin, I am with Cisco Systems. My co-presenter was unable to attend; many of you probably know him. So, he sends his best and I'd like to thank you for the opportunity to present to you today.

So, when we undertook the problem of coming up with this technology, we looked at several factors, and there are some goals and requirements that we took to be important in guiding our effort. One of them, of course, is to make things easier for operators. No matter where you are or where you stand on MPLS deployments, most people do have some considerable concern about complexity with MPLS and troubleshootability, things of that nature. So we wanted to improve scale, we wanted to simplify operations and we wanted to minimise the introductory complexity of new technology; that's something that often inhibits adoption of new technology.

Obviously in the current state of affairs in our industry, network programmability, software defined networking, things of this nature are important topics, and therefore we wanted to enhance service offering capability for service providers through programmability. So this should be a programmable technology as well. It doesn't have to be, but it should lend itself well to programmability.

We want to leverage the efficient MPLS data plane that we have today. I should note here quickly, time was obviously of importance for this presentation: we do have an IPv6 data plane option as well, so you can substitute IPv6 for MPLS. There are obviously differences in the implementation but the behaviours are the same. And I'll also cover ISIS as the protocol, but OSPF of course is also supported in this technology. So, you know, if you are an OSPF user, or if MPLS is something you'd prefer not to use, you can substitute those in your mind.

In the MPLS data plane case, we leverage the capabilities we have today. There are no new data plane operations needed. We maintain the existing label space and structure, things of that nature. And we wanted to be able to take advantage of and support the existing services that we have today in MPLS, and these are important services, things like VPNs and fast reroute, the ability to do explicit routing and traffic engineering. As I mentioned, the IPv6 data plane is a must and it should have parity with MPLS, and at least conceptually it does, but clearly there are challenges in terms of hardware implementation when we are talking about doing longest-match lookups versus the hash-based lookups that we do with MPLS, so keep that in mind.

Our operator friends have asked us for improvements in LDP and RSVP behaviours. Simplicity, that's something that's always a goal if we can make that happen. Less protocol interaction to troubleshoot. An unfortunate consequence of technology as it evolves is that it builds upon prior technology and in many cases leverages existing technology. We find we have a lot of protocols interacting with each other to make new protocols function at a basic level. We want to deconstruct some of that interaction to simplify the operation of a network.

Avoid directed LDP sessions: this is an LDP-specific thing. As many of you may be aware, for LFA support or remote LFA, in order to get better coverage, one of the things that has been widely suggested is to create directed tunnels. In many cases, in fact, to do any kind of useful MPLS operation you need to have some kind of an adjacency relationship with the remote consumer, if you will, of that label, and that's a requirement we wanted to be able to remove, because in many cases it increases the amount of troubleshooting. If you have six or ten LDP sessions from node to node, that obviously increases the scale requirements.

And we wanted to be able, as a result, to also deliver an automated fast reroute capability for any topology. Again, going back to my previous point about RLFA, we do know that not every topology supports local LFA, and therefore we have to tunnel to remote nodes to provide protection in topologies that don't lend themselves to simple LFA. So if we could do this in any topology it would be a huge capability. Obviously we can do this with RSVP TE, but we are trying to remove that as one of the goals.

Scale: we should be able to avoid millions of labels in the LDP database, and avoid millions of tunnels to configure. And while it sounds like I am using an arbitrary cardinal number, it's obviously something that many networks have to deal with today. A 1,000-node network, which is not particularly large in scale for a large international ISP, requires millions of tunnels if you fully mesh them. So this is something that we wanted to be able to minimise.
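The full-mesh figure can be worked out directly. This sketch assumes tunnels are unidirectional, so every ordered pair of nodes needs its own tunnel; the `tunnels_per_pair` parameter is a hypothetical knob for networks that run several tunnels between the same pair (for example one per traffic class).

```python
def full_mesh_tunnels(nodes, tunnels_per_pair=1):
    """Unidirectional tunnels needed to fully mesh `nodes` routers.

    Every ordered (head-end, tail-end) pair needs its own tunnel,
    so the count grows as nodes * (nodes - 1).
    """
    if nodes < 0 or tunnels_per_pair < 0:
        raise ValueError("counts must be non-negative")
    return nodes * (nodes - 1) * tunnels_per_pair


print(full_mesh_tunnels(1000))      # one tunnel per ordered pair
print(full_mesh_tunnels(1000, 4))   # several tunnels per pair
```

With one tunnel per ordered pair, 1,000 nodes already need 999,000 tunnels, and any multiple per pair pushes the total into the millions. The point of the calculation is the quadratic growth, not the exact constant.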

And as I mentioned, we want to be able to create a programmable environment, but something that also lends itself to application interaction. What we mean by that is, applications, whether they are user applications or network or system applications, should be able to influence the behaviour of the network. Now, barring any security considerations that are associated with that, let's make the assumption that we trust these applications for whatever reason, for better or worse. We should be able to influence the behaviour of the network for new applications, for virtualisation, for cloud-based delivery, for guaranteeing SLAs and for providing network efficiency. These are the goals that led to the development of this technology.

And so segment routing is something that we take to be simple to deploy and operate. It leverages the MPLS data plane that we have, and you can substitute IPv6 for that. We are just talking about sub-TLVs to enable this behaviour and some code to create the binding between these segment values, which we'll talk about in a moment, and the topology that is created.

And to be able to provide optimal scalability, resiliency and virtualisation. That means being able to create different topologies and do it in a way that doesn't require significant increases in the amount of state or data that needs to be maintained in the network. It should be SDN-enabled, that being a buzzword, but more importantly it should be programmable: we should be able to direct the functions of this network without having to incur new operational capabilities that would potentially put more burden on our operators. The goal of course is to reduce that burden.

It should be standards-based. As of now, for the ISIS protocol, we have multi-vendor support, from the vendor community as well as from the operator community. And we are seeking additional assistance: if people are interested in contributing, please contact us and we'd be happy to discuss how you can contribute. So it's an open technology, it's something that we expect to be widely contributed to and hopefully widely deployed.

Now I have told you all about it, but I haven't told you anything about the technology. What is segment routing?

Forwarding state, which we call segments, is established via the IGP. One is an adjacency segment and one is a node segment, and it's quite easy to understand what they are. Adjacency is entirely what the topology gives us: via OSPF and ISIS, we have visibility of the adjacencies in the network. The MPLS data plane in particular is leveraged without any modification: push, pop, swap. In our case, in the MPLS case, the segment is the label itself, so a label value is associated with a segment. And it provides for source routing and, as everybody here is likely aware, source routing gives us the ability to do explicit routing. At the source node we construct a path. In the RSVP TE case we have to signal that path and maintain that path through state maintenance techniques. In the case of segment routing we don't signal the path per se; we construct the path through the label stack. So the label stack conveys the path hop by hop through the network. Therefore the network itself doesn't have to maintain any state.

So, what's an adjacency segment? For example, in this case, on our topology, it is a label value between router C and router O. Let's assume there is a single data link between the nodes, and C allocates a local label. And it advertises this label in ISIS as part of a sub-TLV. It's leveraging existing connectivity and using that to distribute the topology; in this case it's distributing a label topology.

C is the only node to install this adjacency segment in the MPLS data plane. So in order for us to use that, we have to find a way to get to C. One way to do that is to construct a strict explicit route from A to Z. In this case, if you look at these, B is allocated 9101 on its adjacency to A, and so on. And we can build a packet: there is an IP packet of a forwarding equivalence class from A to Z and we want to construct a path. One way of doing it is by constructing a label stack that represents all of these adjacency segments, and as we pass the packet along the network, we pop the label associated with that segment and continue to forward the packet along the network. And so we get to the end.
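
The strict route built purely from adjacency segments can be sketched in a few lines of Python. This is an illustration under assumptions, not something shown in the talk: the detour topology A, B, C, O, P, Z is inferred from later remarks, every label value except 9003 (C's adjacency to O) is hypothetical, and the function name `forward_strict` is invented for the example.

```python
# Each node keeps a LOCAL table: adjacency label -> neighbour on that link.
# Only 9003 (C's adjacency to O) is quoted in the talk; the rest are made up.
ADJ_LABELS = {
    "A": {9001: "B"},
    "B": {9002: "C"},
    "C": {9003: "O"},
    "O": {9004: "P"},
    "P": {9005: "Z"},
}


def forward_strict(ingress, label_stack):
    """Walk a packet through the network, popping one adjacency label per hop."""
    node = ingress
    stack = list(label_stack)  # copy so the caller's list is untouched
    path = [node]
    while stack:
        label = stack.pop(0)            # pop the top label
        node = ADJ_LABELS[node][label]  # label is only meaningful at this node
        path.append(node)
    return path


if __name__ == "__main__":
    print(forward_strict("A", [9001, 9002, 9003, 9004, 9005]))
```

Because every hop is spelled out in the stack, no transit node holds per-path state; each node only knows its own adjacency labels.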

So that's an adjacency segment. Now, for simpler operations: one of the things you notice is that we have to construct these label stacks, and maybe I don't want to do any explicit routing, or I only want to use it for simple applications. The simpler case is just to use a node segment, and this is the router itself. And in this case we have to organise or coordinate these node segments in the network or in an SR domain, just like we coordinate router IDs in an IP routing domain. We need one label per node in the IGP domain. Even a network with 10,000 nodes would still only represent less than 1% of the total label space, given that the label space is over a million, at least for a specific level of labels in the stack. So, with a million labels in the space, if we had to allocate even 10,000, that's an insignificant percentage, if you will, of the total label space. We need to configure this. Now, without going into too much detail, it doesn't have to be the same label space throughout the network. We could index these, similar to the way VPLS autodiscovery works. But let's just assume for simplicity that each router has a static label range that we can configure in the network, and each individual router is given a specific and unique label to represent its node label. So, for example, node Z in our topology will be given label 65 by the operator and that label will be distributed throughout the network. And each node will put in its FIB a label binding along the shortest path to that node. So, for example, C puts in its FIB: to reach label 65, I use my adjacency along the shortest path, in this case C to D, and I put that label value in the FIB to reach Z.

And now, one of the things that's interesting about this: I mentioned that we didn't change any behaviours in MPLS in the data plane, but we do something clever in order to maintain the operation. There is no real operation in the label forwarding capability that allows you to continue the same label value, just to forward and continue. So we just do a swap of the previous label with the same label. And so, if we want to send a packet to Z, A pushes 65 on the packet and then B swaps 65 to 65. So this is where the global uniqueness is required, because obviously if there were different values then that would potentially create unusual or incorrect forwarding paths for node segments and so on and so forth.

So for a packet to Z in this case, we have the label on the stack and then at the penultimate hop we pop the label.
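
The node-segment behaviour (push at the ingress, swap 65 to 65 in transit, pop at the penultimate hop) can be sketched as follows. The topology and unit link costs are assumptions inferred from the talk, and the helper names are invented; only the label value 65 for node Z comes from the speaker.

```python
from collections import deque

# Assumed topology with unit-cost links (inferred from the talk, not given in full).
LINKS = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "Z"),
         ("C", "O"), ("O", "P"), ("P", "Z")]
NODE_SID = {"Z": 65}  # label 65 for node Z is the value quoted in the talk


def neighbours(node):
    """Return the nodes directly connected to `node`."""
    out = []
    for a, b in LINKS:
        if a == node:
            out.append(b)
        elif b == node:
            out.append(a)
    return out


def build_fib(dest):
    """For every node, bind dest's node label to the shortest-path next hop."""
    label = NODE_SID[dest]
    next_hop, seen, queue = {}, {dest}, deque([dest])
    while queue:  # breadth-first search outwards from the destination
        cur = queue.popleft()
        for nb in neighbours(cur):
            if nb not in seen:
                seen.add(nb)
                next_hop[nb] = cur
                queue.append(nb)
    # Penultimate hop pops; everyone else swaps the label with itself.
    return {n: (label, "pop" if nh == dest else "swap", nh)
            for n, nh in next_hop.items()}


def send(src, dest):
    """Trace (node, action, next hop) for a packet from src to dest."""
    fib, node, trace = build_fib(dest), src, []
    while node != dest:
        _label, action, nh = fib[node]
        if node == src:
            action = "push"  # the ingress imposes the label (src assumed not adjacent to dest)
        trace.append((node, action, nh))
        node = nh
    return trace


if __name__ == "__main__":
    for hop in send("A", "Z"):
        print(hop)
```

Every node carries exactly one FIB entry for Z regardless of how many flows use it, which is why the label has to mean the same thing everywhere in the domain.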

In order to combine segments to perform source routing, we take advantage of both node segments and adjacency segments. This is similar to using a loose and strict source route, if you will, in an ERO in an MPLS RSVP TE path. The shortest path may be A, B, C, D and then Z, but we'd like to bypass D in this case and use the detour, and in this case we use the node segment to C and then the adjacency segment from C to O. The node segment is needed because, as I mentioned before, the value 9003 is not globally unique; it's only unique from C's perspective to reach O. So we have to get the packet to C first, then we pop the node segment that represents C, and then C inspects the packet and sees that there is 9003, which it knows is its adjacency to O, and we forward the packet along to Z using the node segment to Z. So the label stack in the header looks like this: we pop the node segment at C, and so on. And then we pop the node segment to Z penultimately at P.
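
A combined stack (node segment to C, adjacency segment 9003, node segment to Z) can be walked through with a small simulation. Assumptions: the topology and unit costs are inferred from the talk, the node label 66 for C is hypothetical, and for brevity the sketch pops each node segment at the node that owns it rather than modelling penultimate hop popping.

```python
from collections import deque

# Assumed unit-cost topology; shortest A->Z is A,B,C,D,Z, the detour goes via O and P.
LINKS = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "Z"),
         ("C", "O"), ("O", "P"), ("P", "Z")]
NODE_SID = {"C": 66, "Z": 65}   # 65 is from the talk, 66 is hypothetical
ADJ_SID = {"C": {9003: "O"}}    # 9003 is local to C (value from the talk)


def neighbours(node):
    return [b if a == node else a for a, b in LINKS if node in (a, b)]


def next_hop(node, dest):
    """Next hop from `node` on a shortest path to `dest` (BFS from dest)."""
    parent, queue = {dest: None}, deque([dest])
    while queue:
        cur = queue.popleft()
        for nb in neighbours(cur):
            if nb not in parent:
                parent[nb] = cur
                queue.append(nb)
    return parent[node]


def walk(src, label_stack):
    """Return the node sequence a packet follows for the given segment list."""
    owner = {sid: node for node, sid in NODE_SID.items()}
    node, stack, path = src, list(label_stack), [src]
    while stack:
        top = stack[0]
        if top in ADJ_SID.get(node, {}):   # local adjacency segment: strict hop
            stack.pop(0)
            node = ADJ_SID[node][top]
            path.append(node)
        elif owner[top] == node:           # reached the segment endpoint: pop
            stack.pop(0)
        else:                              # node segment: follow the shortest path
            node = next_hop(node, owner[top])
            path.append(node)
    return path


if __name__ == "__main__":
    print(walk("A", [65]))            # plain shortest path
    print(walk("A", [66, 9003, 65]))  # detour that bypasses D
```

The three-label stack replaces what would otherwise be a signalled explicit route; the loose parts ride on ordinary shortest-path forwarding.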

The same thing applies if we want to just take advantage of node segments in this case. This will give the capability of doing ECMP multipath. If you can imagine, an adjacency segment is just for a data link, and therefore it's strict in its forwarding semantics. If we want to take advantage of equal-cost paths, for example between C and O, rather than using the adjacency segment from C to O, maybe there are multiple data links, we then use the node segment and let C load balance across its equal-cost paths to O. So there are two options here in this case.

And the same behaviour on the label stack: we pop the label at C, we pop the label at O and then we pop the label at P.

As I mentioned, ISIS automatically installs the segment. So we have a simple extension. You can see here that C distributes a node segment throughout the topology, therefore A knows how to get to it because B does; it's a straightforward extension of a link state IGP that provides this capability. And it has excellent scale. In this case we only have to distribute as many node labels as there are nodes and as many adjacency labels as there are links, so in a topology that has N nodes and A adjacencies it's just N plus A state that needs to be maintained in the network, and by the way that's already in the network, in most cases, from an information perspective, because we have a link state IGP that's already running in the environment. You can make an argument that you add a little bit of information, in this case you are adding 32 bits of label allocation, really only 20 bits of label allocation, per information element in the IP or in the ISIS domain, but nevertheless it's still pretty much the same amount of state. It's clearly a significant improvement over the quadratic state required to maintain a full mesh of MPLS tunnels and the logarithmic multiplier that's required to maintain an LDP database. LDP is typically L times... that's still much more than N.
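
To put rough numbers on the scaling argument, the following sketch compares N plus A segment state with a full mesh of unidirectional tunnels, and checks the earlier remark that 10,000 node labels use under 1% of a 20-bit label space. The example network sizes are hypothetical and the function names are invented for illustration.

```python
LABEL_SPACE = 2 ** 20  # a 20-bit MPLS label gives 1,048,576 values


def segment_routing_state(nodes, adjacencies):
    """Segments the network has to know: one per node plus one per adjacency."""
    return nodes + adjacencies


def full_mesh_tunnels(nodes):
    """Unidirectional tunnels needed to connect every node to every other node."""
    return nodes * (nodes - 1)


def label_space_share(node_labels):
    """Fraction of the 20-bit label space consumed by node labels."""
    return node_labels / LABEL_SPACE


if __name__ == "__main__":
    # (nodes, adjacencies) pairs below are made-up example sizes.
    for n, a in [(10, 30), (100, 400), (1000, 4000)]:
        print(n, a, segment_routing_state(n, a), full_mesh_tunnels(n))
    print(f"{label_space_share(10_000):.2%} of the label space for 10,000 nodes")
```

The gap grows quickly: the segment count rises linearly with the size of the topology while the tunnel mesh rises with the square of the node count.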

And one of the things that I think is perhaps one of the bigger capabilities is guaranteed fast reroute, and it's automated. So going back again, for those of you who have studied remote LFA, one of the challenges is determining how to reach a PQ node. Not to go into all the details of LFA, but one of the nice things about segment routing is we don't have to create a directed session to, say, reach a PQ node. All we need is two identifiers, one to reach the P node and one to reach the adjacent Q node. So for every topology, as long as there is a redundant path (it can't be non-redundant of course, we wouldn't be able to create a directed LFA session in the absence of any alternative path), as long as there is an alternative path we can reach that in only two segments, or in only two identifiers. So from a forwarding plane perspective, and from a state maintenance perspective on the nodes, we have a significant improvement in performance. I think, depending on the applicability, if you have studied it, and depending on the topology, in many cases LFA would require up to six directed LDP sessions per node to provide RLFA capability. In this case we don't need to create any sessions because we already know how to reach the RLFA neighbour via its node segment. It's already programmed into the topology.

So, some use cases, just to give a flavour of what this is all about and what we can use it for. Obviously it provides an MPLS transport substrate. So VPN services, anything that you are running today over MPLS, can be run across this, and because we use the node segment, we leverage ECMP capabilities in the network. If your network is constructed with some symmetry for the purposes of providing end-to-end ECMP, we can support that for VPN services or whatever services you are tunnelling over MPLS.

So it supports scaleable TE. We only need to maintain N plus A state in the network rather than N squared. And what may not be clear here, and you'll see it in a moment: there is an implicit point that I should make when we talk about TE. In order to do TE in any real sense, externally we need to tell the ingress routers what their label stacks will be for specific FECs, and that implies that we have the ability, off line or otherwise, to programme the label stack. That means when I say we have improved the scale, what we have done is we have extracted the state maintenance requirements out of the network and put them in the network management layer, if you will, but that network management layer is much more scaleable because we can scale it horizontally. If you wanted to construct a factorial number of paths across your network just to see if you can do it, you could do it, and the network still only has N plus A state. So, we do have that capability.

One of the things that's nice about it is the ability to create simple disjointness. ECMP in the classical sense from A to Z, assuming we have the dual plane topology, will construct equal-cost paths across the topology in the way it's represented here across the top. But perhaps we don't want to create that kind of spread; perhaps we want to bias traffic across one of the planes for a specific reason, that it's the primary plane for example, and then use red exclusively as a backup plane. In order to do that, one of the things we can do, and this is a clever application of the node segment, is create multiple node segments. In this case we will create an Anycast segment that represents the blue plane. Every node on the blue plane here sends an additional segment ID, that's its Anycast node ID, and that value is 111. So all four of these routers have their individual unique node IDs and then they have an Anycast node ID. In this case we push 111 onto the label stack to direct the traffic, or to attract the traffic, to the blue plane. Any node on that plane has 111 configured as one of its node segment IDs and pops that 111 value because it is itself; it's all of those routers on that plane. At that point it forwards the traffic to 65 along the shortest path across the blue plane. That gives us the ability to create distinct and simple disjointness.

Cost-based traffic engineering: it's kind of a brute-force example. We have a data path between Tokyo and Brussels. The transit traffic cost to go across the United States is cheaper, so all the bulk data we want to go across there. For voice traffic we want to take the low latency but more expensive path across Russia. In this case what we would do is create a node segment or Anycast segment for Russia that we put on the label stack. So for traffic that is voice-over-IP based we will do some kind of classification, whatever it may be, classifying traffic based on RTP header identification or whatever you would use, and then the label stack would include an Anycast node segment for Russia. The first thing we do is direct the traffic to Russia, and then at that point Russia to Brussels follows its direct path. So we can get voice traffic to go through Russia and easily get standard traffic to go across the ocean through the United States from Tokyo to Brussels.
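
The classification step in the Tokyo to Brussels example amounts to choosing a label stack per traffic class at the ingress. A minimal sketch, assuming that the IGP shortest path already runs through the United States and that the label values 201 and 70 (both hypothetical) identify the Russia Anycast segment and the Brussels node segment:

```python
# Both label values are invented for illustration; the talk quotes none here.
ANYCAST_SID = {"russia": 201}
NODE_SID = {"brussels": 70}


def label_stack(traffic_class, egress="brussels"):
    """Pick the segment list the Tokyo ingress pushes for a traffic class."""
    stack = [NODE_SID[egress]]  # default: shortest (cheapest) path to the egress
    if traffic_class == "voice":
        # Steer latency-sensitive traffic via Russia first, then on to the egress.
        stack.insert(0, ANYCAST_SID["russia"])
    return stack


if __name__ == "__main__":
    print("voice:", label_stack("voice"))
    print("bulk: ", label_stack("bulk"))
```

How packets are recognised as voice (RTP inspection, DSCP or anything else) is left out; only the mapping from class to segment list is shown.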

Another example is full control of OAM, so you'd be able, for example, to pre-test a new path that you want to put traffic on and run some kind of extensive OAM capability without having to signal a path, create a path and tear that path down. We just push the traffic onto the network using a set of strict adjacency segments.

And then this is the one that I mentioned earlier about the application controlling and the network delivering. The application in this case wants to send traffic from A to Z and wants 2 gig allocated. Perhaps the link from C to D is full, so the shortest path would be ineffective. There might be a bandwidth orchestrator or some other type of traffic optimisation system, a PCE for example, running on this application or server, and it detects there is not enough capacity between C and D, so it determines a path that does have that bandwidth available and then constructs this via a label stack. In theory, you could push a packet with this particular label stack onto the network from outside, if you trust the source of that packet. There is no need to. This is no different than MPLS forwarding today in a service provider network, where the interfaces outside the network don't support MPLS. I don't want people to be alarmed. But if it's an interface or a system you can trust, then you could push the label stack externally, so the network in this case just delivers the packet across the path that you want. And one example of this is selecting an egress peer: if you have multiple egress peers from a downstream node inside of an AS, and you want to be able to direct traffic outside the AS to a remote prefix that may be several hops away, and you have equal-cost paths, you can still force traffic to go out a specific interface. That's one example of this being able to do that.
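
What the bandwidth orchestrator or PCE does in this example can be sketched as: prune links without enough free capacity, find the shortest remaining path, and turn it into a label stack. All bandwidth figures and all adjacency labels except 9003 are hypothetical, the topology is inferred from the talk, and a real path computation element would be considerably more involved.

```python
from collections import deque

# Available bandwidth per link in Gbit/s (made-up figures; C-D is nearly full).
AVAILABLE = {
    ("A", "B"): 10, ("B", "C"): 10, ("C", "D"): 1, ("D", "Z"): 10,
    ("C", "O"): 10, ("O", "P"): 10, ("P", "Z"): 10,
}
# Adjacency labels in the A-to-Z direction; only 9003 appears in the talk.
ADJ_LABEL = {
    ("A", "B"): 9001, ("B", "C"): 9002, ("C", "D"): 9006, ("D", "Z"): 9007,
    ("C", "O"): 9003, ("O", "P"): 9004, ("P", "Z"): 9005,
}


def constrained_path(src, dst, demand):
    """Shortest path (by hop count) using only links with >= demand free."""
    usable = {}
    for (a, b), free in AVAILABLE.items():
        if free >= demand:
            usable.setdefault(a, []).append(b)
            usable.setdefault(b, []).append(a)
    parent, queue = {src: None}, deque([src])
    while queue:
        cur = queue.popleft()
        for nb in usable.get(cur, []):
            if nb not in parent:
                parent[nb] = cur
                queue.append(nb)
    if dst not in parent:
        return None  # no path satisfies the demand
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def to_label_stack(path):
    """Express a path as a strict list of adjacency labels."""
    return [ADJ_LABEL[(a, b)] for a, b in zip(path, path[1:])]


if __name__ == "__main__":
    path = constrained_path("A", "Z", demand=2)
    print(path, to_label_stack(path))
```

The path computation and its state live entirely in the external system; the routers only ever see the resulting stack.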

So, in closing, before we take questions, some key points. Each engineered application flow is mapped onto a path. And just from a graph-theoretic perspective, I mentioned the factorial number of paths, and that's typically the order of the number of potential paths in the network. Obviously most of those are useless, but there are many more paths in the network than there are nodes or links, and so there may be reasons to use different paths that we would never want to signal, from a state maintenance perspective, but there could be utility in using them. As an example: most paths use the shortest path, and most of the time shortest paths are configured on the distance, and therefore the propagation delay or the proximity of a remote node. But you may want to have some traffic not take the latency shortest path but take a path that's just slightly longer. And so you can quite easily construct longer paths and then put traffic that doesn't require low latency, or that's latency insensitive, on those longer paths, and they could be arbitrary. You could load balance them: the algorithm could say, for non latency sensitive traffic, load balance across ten randomly chosen non-shortest paths. As long as the traffic engineering and optimisation algorithm places that ahead of time using a traffic matrix, simulates the performance of the network, and it doesn't incur new congestion in the network, that might be a useful capability. But it's never something anyone would have done in the past, because you would have had to construct 10 separate RSVP tunnels and that throws the requirements out the window. There is the capability of doing that. That's just to give you a flavour of what's possible.

Because a path is expressed as an ordered list of segments and the network maintains only the segments, there are only thousands of segments but potentially millions of paths. We get this dilution of state. We can maintain that state externally: we are not merely removing the state from the network, we are moving it to a place where it's more appropriate to maintain it.

In conclusion, as I mentioned, I believe segment routing to be simple to deploy and operate. All you have to do is turn it on and configure the node segment range for your topology and the node segment ID for that particular router. There are two configuration statements, three if you consider enabling it as another configuration statement. It leverages the existing MPLS data plane, and again substitute IPv6 if you'd like. Straightforward extensions, providing for scaleability, resiliency and virtualisation. Creating virtual topologies is easier and it has great integration with applications, because the applications can request behaviours in the network and those behaviours can be delivered without having to create new state. It's available now. We can demonstrate it, and of course in the IETF we're looking for, and would be happy to welcome, new contributions.

So if there are any other comments I'd like to take them, but that's all I have.

GERT DOERING: Operator. I really like that stuff. I want to have it now. I don't mind so much about the state because my network is small, but I especially like the OAM capabilities of being able to directly ping a link that's not active right now. That's definitely cool. So which vendors are EFTing it?

CHRISTIAN MARTIN: As of now, I have to be careful. I can speak for Cisco of course. I think that the safest thing to say is that, by including vendors on the draft, you can expect, without my commenting on their behalf, that there will at some point be an interoperable implementation this year for EFT, I would think, for at least the vendors that have committed to join the draft, which are Ericsson and Alcatel-Lucent. I can't commit on dates or anything like that, but certainly the goal is to have an interoperable solution.

AUDIENCE SPEAKER: Google. As you mentioned our group's work, we presented at the last RIPE and NANOG, and I am very glad to hear that we can probably get rid of RSVP and static LSPs for monitoring.

A question, because I'm not sure, I might have misunderstood the concept. If I have a bundle with a number of links and I'm trying to construct a tunnel and I use a stack of adjacency labels, I will have the same set of labels for the different paths, so how does the load balancing work? As far as I know it's hard to get all labels...

CHRISTIAN MARTIN: The question is how load balancing would work if you had the same set of labels across a path. So, if they are node labels then the load balancing works like a normal hash does. It's more of a local operation in the forwarding table that says use these adjacent links. So it's no different than if you have a particular label that's statically configured, or an LDP label: you'd load balance across those links. So it leverages existing ECMP type of behaviour. In the case of being able to bias traffic using... is that what your question is?

AUDIENCE SPEAKER: Basically my problem was that if I construct it using a static LSP, putting in more labels, and I'd like to test all paths in the bundle and all members of the bundle, I have to create a lot of labels, and not all hardware is able to look deeper than N labels. So, I'm not sure I have actually distributed my traffic across all bundle members, because the labels are the same.

CHRISTIAN MARTIN: In the bundle case, I left it out because there is a little bit more complexity associated with bundles, but just to try to be brief: you can create an adjacency segment on each data link in a bundle and you can create adjacency segments for the group of data links. So in the case of an adjacency segment that represents the group, you would do flow based hashing and therefore you wouldn't get the testability that I think you want. But from an OAM perspective, if you want to be able to test all links in the bundle, you would have to construct one stack for each link. Assuming it's an end-to-end path, and there are two nodes, for example, with eight links between them, and ignoring the other nodes in the network (let's assume they are one), you would create eight different label stacks, each with the node segment to reach the node that has the bundle and then the separate adjacency segment for one of those links. But you would be able to do that, and it's only the label stack that has to be created. For an application, if you have a network monitoring or a probing application, it would just have to know to create those eight label stacks.
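
The eight probe stacks described in this answer can be generated mechanically. In the sketch below the node label 66 and the per-member adjacency labels 9010 to 9017 are hypothetical values, and `probe_stacks` is an invented helper name.

```python
# Hypothetical values: the node segment of the router that owns the bundle,
# and one adjacency segment per bundle member link.
BUNDLE_NODE_SID = 66
MEMBER_ADJ_SIDS = [9010 + i for i in range(8)]


def probe_stacks(node_sid, member_adj_sids):
    """One label stack per member link: reach the node, then pin the link."""
    return [[node_sid, adj_sid] for adj_sid in member_adj_sids]


if __name__ == "__main__":
    for stack in probe_stacks(BUNDLE_NODE_SID, MEMBER_ADJ_SIDS):
        print(stack)
```

Each stack is only two labels deep, so the probe does not depend on hardware hashing over a deep label stack to reach a particular member.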

AUDIENCE SPEAKER: Thank you.

AUDIENCE SPEAKER: Gregory Cauchie, French IT. I like the idea. I heard about it from Clarence some time ago so I'm really glad to see it's not only Cisco working on it. And I wonder, what about multicast traffic?

CHRISTIAN MARTIN: The multicast question. Of course. If everybody understood the locus of this idea, it is that we're placing path state intrinsically in the topology. Indeed the graph that's represented by the IGP gives us the ability to identify every path in the network. But in the case of multicast... so there are only N paths in the network for N nodes, there are really N squared, but even that's quite tractable. As anybody here who has done any graph theoretical work knows, in many cases it's not possible to count the number of trees in the network. So, an intrinsic approach to multicast segment routing is harder, but because this is an SDN based application we are working on methods that would allow you to programme these segments in a tree format into the topology in the same way that you do in the unicast sense. So there is work ongoing for multicast and you'll start to see more of that come out over the next several months.

AUDIENCE SPEAKER: Do you have some papers or some draft you are working on that we can share, read or comment?

CHRISTIAN MARTIN: As soon as we have... we are in the final stages of preparing that, so, you know, contact me, talk to me off line and we can talk about that.

CHAIR: Thanks.

(Applause)

CHAIR: Next up is Gunter.

GUNTER VAN DE VELDE: Hello everybody. I am happy to stand over here because this morning I was a bit more sick, I am standing up straight for a change.

Let's go for it. So, I think I have been given about 30 minutes to speak about two different elements here. One is the BGP next hop attribute and the other is DDoS mitigation by the use of BGP as such.

The first thing: this actually is a piece of work that we're working on in the IETF right now. So it is in a draft stage, and one of my co-authors is here, Randy Bush, sitting in the corner over there, so any hard questions, you know, ask them to him, not to me. And that also avoids me getting hard questions from him for a change.

So, right now this is something being discussed in the IETF, so it doesn't really exist just yet, but this is an alternative concept for building an overlay network, and overlay networks for both intra-AS and inter-AS. And the nice thing is it's based on an incremental approach and it's distributed. So you don't need to have a central database with all the information. Everything is stored, just like with BGP, distributed amongst all nodes everywhere. It also means it's transitive. The nice thing again is that you just add this particular attribute somewhere on the Internet, and even if the nodes in the middle don't support it, it will just get dribbled through the infrastructure until it hits a router that actually supports it and can do something with it. That is a nice cool thing. Now, the best way you can look at the BGP remote next hop is that it's like a tunnel end point. And the attribute itself, as I will show a little bit later on, describes the tunnel end point and it describes the tunnel types which are supported by that end point, like VXLAN and so on, and we are still working on which tunnels are going to be supported in this particular attribute. So the attribute type in BGP is going to be optional transitive.

So, one of the reasons I'm presenting this here now is just to hear some feedback about the concept, and also whether you have any potential use cases and so on for how you might use it, because you probably run your own networks and maybe this is a technology that could be of interest for you at some point in time in the future.

So, in the intra-AS environment for example, this has the capability of replacing MPLS infrastructures. Because, the way I look at MPLS, it's a tunnelling mechanism, and by sending over these tunnel end points you have the capability of replacing MPLS as such, but without an LDP control plane. That's a pretty cool thing. How does it work? I will try to discuss that a little bit.

It's an IP network overlay technology. So that means it's based upon IP. The protocol can be IPv4 or IPv6, we don't really care about that at all. It is distributed, so there is no central database which will have to store all the tunnel end points, for example. It's distributed amongst the Internet itself. As I already mentioned, it can be seen as an alternative to MPLS tunnels within the core, and it can actually span beyond the core network itself also. Together with the tunnel end point, we also distribute the types of encapsulation supported, and so on. One of the use cases where we actually see potential for this is in a data centre, because for VM, virtual machine, mobility it might be interesting to send the prefix around and change the remote next hop, and by that you automatically change the location of the VM somewhere in the network, so this is a potential use case of what this thing could be used for. That's why we support VXLAN. It is transitive. That means not everybody has to support this. Only on the routers which you would like to support this do you need to implement it, and that will be sufficient.

The other thing is also that it doesn't need a new address family on your router. That is a nice thing. So, for example, if you just have a plain IP BGP session somewhere and it is doing IPv4 exchange, and you would like to exchange IPv6, then you probably need to create an IPv6 address family, or the VPN address family. In this case you don't need to do that. It is just piggybacked upon the routes which are already exchanged, so there is no need at all for creating a new address family in your BGP statements. And that is a very powerful thing, because that means you can enable this thing at one point on the Internet, nobody else in the middle needs to support it, and at the other end the remote next hop shows up attached to the prefix you have actually exchanged.

If you look at the convergence time expectations, well, it will be no faster or slower than normal traditional IP convergence at all, because it is serving a different goal.

So, we already spoke about the motivation here, a little bit. Let me go and skip this thing. You can read it afterwards.

So, the draft that I'm discussing here is just an individual submission right now. It's in the IDR Working Group: remote next hop. And it's actually based upon some of the other RFCs and drafts which are being worked on in that particular Working Group at the IETF.

So, for example, the use case sketch that I have here is just a normal intranet environment, so this is a case where this technology has as its goal, for example, to replace an MPLS infrastructure. Now, this is just a simple use case here. It could very well be inter-AS also, but just for the sake of this particular talk, I just use the local network here. So, we have the purple thing in the middle, that is the network. And it consists of lots of routers, some routers you don't see, and you also have the edge routers, which are the ingress routers and egress routers as such. And then we have the small yellow clouds there also, which represent customer networks. And then on the edge router itself, we have the normal traditional next hops, being IPv4 and IPv6.

So this would be how the BGP remote next hop attribute will look. So you have the prefix that is going to be exchanged, and this could be an IPv4 or IPv6 prefix. Then you have the traditional attributes attached to it, including, for example, the next hop attribute. And then you can have this new thing, the remote next hop attribute itself, which is glued onto the prefix. And in there you will see a few things, and some of those few things we still have to fight out among the authors of this particular idea, what we want to be in there, because there are still some things that are just not clear yet. But the most important thing right now is the tunnel type which is going to be supported by that particular end point. You don't have to give only a single remote next hop to a certain prefix. You can have multiple of them. It means that if one tunnel end point is not working, for example, you can select a second or third one, or you can select a tunnel end point based upon the service you would like to use. So things like that. We have that capability built in here.

So, for example, if you look at this, the bottom router on the left-hand side, then what you will find in the BGP routing table, if you were using this, is your destinations, 1.2.1.0/24 and the other ones are there also, and all of these destinations have their traditional BGP next hops attached. And then you also have the remote next hop attached, and together with that you have the tunnel type, which is glued to it also.

So, for example, if customer 1 A wants to send something to customer 3 A in this case, then what you see in the BGP routing table is that there is no remote next hop attached to it. So that means, if this particular router here at the bottom doesn't support it, or in this case doesn't have the remote next hop, it's just doing traditional plain IPv4 forwarding. So that means that this particular box here will send the packet into the network as a normal IP packet, nothing special to it, normal IP routing. But the thing is that every router here in the middle needs to run BGP also, because otherwise this guy will never know where this particular network is. So, in this case, the routers in the middle of the network need to run BGP to actually have a normal forwarding situation here.

So this is where the remote next hop could be helping. So, in this case, the prefix itself actually has a remote next hop attached to it. And that's what we see here. So, if you want to send something to your destination, the remote next hop would be 2.2.2.2 and the tunnel encapsulation type is GRE. What do I do in this case? I just encapsulate it into the automatically generated tunnel: I have my packet, my GRE encapsulation, and then my tunnel encapsulation destination address would be this guy. Now, the difference here with the previous case is that none of the routers here in the middle of the network need to know this particular IP address. They only need to know the next hop IP address, because I do the tunnel encapsulation. So this is the same situation as if you had an MPLS environment. So by using this you actually have the capability of simplifying your backbone infrastructure by not having to run BGP on every single box in your infrastructure. And in this case, you don't run LDP at all. It's an IP overlay technology.
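
The difference between the two forwarding cases can be sketched as a lookup that prefers a usable remote next hop and otherwise falls back to plain IP forwarding. This is an illustration only: the class and field names are invented and do not reflect the draft's wire encoding, the traditional next hop addresses and the prefix 1.2.3.0/24 are hypothetical, and only 1.2.1.0/24, 2.2.2.2 and GRE are taken from the talk.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class RemoteNextHop:
    address: str       # tunnel end point address
    tunnel_type: str   # e.g. "GRE" or "VXLAN"


@dataclass
class Route:
    prefix: str
    next_hop: str                                   # traditional BGP next hop
    remote_next_hops: list = field(default_factory=list)


# Illustrative table as seen by an ingress edge router.
RIB = [
    Route("1.2.1.0/24", "10.0.0.2", [RemoteNextHop("2.2.2.2", "GRE")]),
    Route("1.2.3.0/24", "10.0.0.3"),                # no remote next hop attached
]
SUPPORTED_TUNNELS = {"GRE", "VXLAN"}                # what this router can encapsulate


def lookup(dst):
    """Longest-prefix match over the table."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in RIB if addr in ipaddress.ip_network(r.prefix)]
    if not matches:
        return None
    return max(matches, key=lambda r: ipaddress.ip_network(r.prefix).prefixlen)


def forwarding_decision(dst):
    """Return (action, tunnel type, address the packet is sent towards)."""
    route = lookup(dst)
    if route is None:
        return ("drop", None, None)
    for rnh in route.remote_next_hops:              # first usable end point wins
        if rnh.tunnel_type in SUPPORTED_TUNNELS:
            return ("encapsulate", rnh.tunnel_type, rnh.address)
    return ("forward", None, route.next_hop)        # plain IP forwarding


if __name__ == "__main__":
    print(forwarding_decision("1.2.1.7"))
    print(forwarding_decision("1.2.3.9"))
```

In the encapsulated case the core only needs a route to the tunnel end point, which is why the routers in the middle no longer need to carry the customer prefixes in BGP.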

So, another use case here: what this is trying to say is that the remote next hop doesn't have to be the same guy as this particular box. It can be another box which this guy, announcing this particular prefix, thinks is a very good tunnel end point. So there is a decoupling there, because the guy announcing the route doesn't necessarily have to be the guy receiving the packet itself. So it's a decoupling of the remote next hop and the next hop itself. So that's a very powerful tool also in case you want to have multiple tunnel end points.

This is just showing how the encapsulation will work in a similar way as it was before. So, I think this is doing the same thing but with the tunnel encapsulation type being VXLAN instead of GRE. And also there is a decoupling of the address family used here in comparison with the address being exchanged. So the address exchanged is plain, pure IPv4 here. Now the remote next hop, in case it would be like an IPv6 environment, I am dreaming, but it will exist at some point in time, then the remote next hop will be just a plain IPv6 address and what you will see is you have the capability of sending an IPv4 packet over an IPv6-only infrastructure towards the end point and here it will be IPv4 and IPv6 again. Again, it is sort of like doing what MPLS has the capability of doing but then without the need of an LDP control plane at all.

This is the same thing as it was before but instead of using IPv4 and IPv6 encapsulation it's using a GRE encapsulation.

So that's actually what I have to say about the idea of what we are playing around with right now about the BGP remote next hop. It's a work in progress. I think it has some potential of, you know, doing some real good work going forward for innovative network technologies as such. One of the key elements here I think is that it is transitive, and it doesn't really need support on all the routers which are running BGP on the whole Internet. So it's incrementally deployable, and if your box supports it, fine; if your box doesn't ?? you don't use it but you still forward the capability onwards. It's a ships-in-the-night mechanism here. It is address family agnostic. What you can do in this case, it can also, for example, be integrated in the RPKI database, so that, you know, if you certify a certain route, then in that case you can actually say this route would for example be certified to also have only this particular kind of remote next hops attached to it. So that is future work, you know, things we are looking into, how to actually enable this functionality as such.

It can use the existing fast convergence technology what we have in IP networking right now. And so that is actually what I have to say as such.

So any questions or any reflections or...

AUDIENCE SPEAKER: Alex Morris, haven't you just reinvented LISP?

GUNTER VAN DE VELDE: Not really, because LISP actually makes use of a central database. The other thing that is happening there is that LISP is based upon a pull model and this is a push model. With a pull model everything is in the database and then the nodes using LISP request the information from that particular database; in this case the information is already available at the edges. So it's a big difference, a complete difference.

AUDIENCE SPEAKER: Randy Bush, IIJ. Clearly you don't know me very well. This essentially allows the signalling path to be non-congruent with the data path. And every time I have done that, I have gotten bitten in the ass. It's hard to debug and hard to really understand once it gets very rich. So, there is a real trade-off here and it's a little dangerous.

AUDIENCE SPEAKER: David Freedman from Claranet. This is an individual submission by yourself, where I believe you are the only author on the draft, is that correct?

GUNTER VAN DE VELDE: Randy is another one.

DAVID FREEDMAN: I think the real question I have is has this been designed with any constraints from potential implementation in mind?

GUNTER VAN DE VELDE: So, one of the elements actually within the draft itself is the use cases as such, so, we have a few of them and we are still trying to spin them out in some sort of a way. There is ?? so one of the later use cases we are looking into is a data centre environment where you have mobility of VMs, where this thing can be used in the VXLAN environment. The work itself is, it's not very thick on that one, it's ?? that is a work in progress, stuff we have to put more attention into.

AUDIENCE SPEAKER: Blake with L33. Given the widespread adoption by lots of vendors of RFC 5549, which is pretty much just v4 on the IPv6 next hop or vice versa, do you see additional incentives for this to have a lot of traction in a multi-vendor interop environment or is this mostly a Cisco thing?

GUNTER VAN DE VELDE: The reason this thing is putting on the ?? is basically not to make it a Cisco thing as such, to actually make it open.

CHAIR: Thank you. Do you feel strong enough to go ahead with the second?

GUNTER VAN DE VELDE: Next set of slides. What I'm going to be speaking about here just now is some things that you probably already are doing in your network infrastructures right now at this point in time. So this is not going to be rocket science. But what I'm going to be discussing here in my presentation right now is to explain like two small little hacks by which it can be made operationally simpler as such.

So, I think in the interest of time, let me go pretty quickly. So basically what we have in a DDoS environment here is, so if you are stuck in the DDoS and if the customer is already getting all of those bad, all those evil DDoS packets, then the interface here has a potential of having lots of bad stuff with only very little good stuff. And in that particular case, the damage has already been done because the customer firewall will not protect you, or that particular customer at least, from this link being congested. So what traditionally happens is that the customer here goes towards the service provider and says please help me drop this bad DDoS attack here because it's congesting my link. Please help me on this one.

Now, in the old days what was happening is that the service provider said: yeah, no problem at all, I will just block all traffic coming from ?? I will just block that particular flow at the edges here. And it didn't make any difference at all if the traffic in there was good or bad itself. So, the next step was to actually, instead of blocking everything, mitigate the flows itself towards a scrubber machine, and that scrubber machine will actually look into the flows and separate the good traffic from the bad traffic itself.

And that is what we're seeing here. So, in a traditional environment here, the traffic is very happily flowing from in here towards the customer, and what is happening with all the flows, we have NetFlow data here going towards DDoS analyser, it is looking into it.

So in this case, we have an environment where all traffic is good and there is no real DDoS attack at all.

Now at some point in time, the flow becomes bad, so there are some hackers at work here. And the DDoS analyser figures out like, there is something happening here in my network, I see, you know, in the flow there are some bad signatures here, some bad elements and it actually alerts the security server, and what is the result of this security server here? Is that it actually says like, okay, instead of sending the traffic directly towards the customer here, send it towards the scrubber, filter it, so, in essence remove the bad stuff from the flows, and send it onwards, and send on the good stuff, the clean traffic onwards towards the customer. This is probably something you already have implemented in your infrastructure.

But, what you traditionally have is some trouble with how do you mitigate in a very simple way the traffic which was originally going here, to go towards the scrubber, and then the harder thing is how to get the traffic from the scrubber towards the original destination? Because right now, what some people are using, they are using layer 2 VPNs, they are using manually created tunnels, back-to-back cables, they are using voodoo art, whatever, lots of difficult things implemented in the network infrastructure. What I'm presenting here is a way to sort of like solve this in an operationally very friendly manner. By the way I have this thing running in my lab right now, on six different routers running XR, so I actually have this simple mechanism, you know, available, so if you would like to have a quick look with me after the talk, I can go into my lab and actually show you.

So, what is the concept? We actually, more explanation how it should work, we are the Internet user here and it goes from the service provider it goes towards the ISP to PE 2 to the server, this is if there is no DDoS attack at all.

So, the server says, the customer says my flow is under attack. Originally, what we call Phase 1, the PE just says okay, the server is under attack, and blocks traffic here and boom, the DDoS attack on the server is stopped. Now at the same time the flow is stopped altogether. You stopped all traffic, there is no differentiation at all. Better would be to do the following: to mitigate the traffic from the Internet towards the scrubber, then clean it and send the clean traffic onwards to the server. So that is the master idea.

So, what do we normally have in a service provider environment? They tend to have like a route reflector. And normally all of the PE boxes they peer with the route reflector and this route reflector is doing reflection for both IPv4, IPv6, VPNs, all that sort of stuff. And that is being set up in this environment already.

So we also have this routing table here, so on PE 1 what we see is like, server 1.1.1.1/32. Next hop is this particular guy and this is the idea. And also this scrubber is in essence a very simple device. It has an incoming and outgoing interface, that's the best way of looking into it at this point in time. The scrubber tends to be not a Cisco device but a third party. So in this case, what we see is that the flow itself, before it was actually blue, it now becomes like a red, so there is some malicious stuff being detected in the flow, and what is happening is by the security server, it says okay, so for this thing to work you need to install like an additional route reflector, and that route reflector peers with all of these edge boxes. Now, on these edge boxes, imagine that we're running BGP before, now, you actually will have to create like a second BGP process on all of these boxes. So, this is something which was not possible before. So, this DDoS mitigation, a trick that I will show you here, actually goes from the assumption that you can create a second BGP process on each of these boxes. The reason why you need to create a second BGP process is for the mitigation route itself. Because any route which is in the second process will be of higher priority, at any cost, than any of the routes from the original process that clash with it. And that is the way how you get, in a very simple way, your traffic from the original destination to the scrubber.

Let's look at how this works. In this case you say okay, traffic going to 1.1.1.1 is under attack. I have this route on this particular route reflector saying send traffic to 1.1.1.1; now the next hop is going to be the scrubber, that is what you see here. The original prefix is this, 2.2.2.2, which is the second guy. The second route on this box going to this destination, the next hop is going to be 2.2.2.3, which is going to be this guy. Now of course, that's a problem because in my routing table I have two entries, which one do you select? I have like one entry coming from the original BGP process and one entry coming from the second BGP process, my DDoS instance process. And that is why you need to have a second process, because the second process says, if the route address ?? it will always be top priority, like it will always win. So your routing table says okay, on this guy, if you want to send the packet towards 1.1.1.1, send it towards the next hop, 3.3.3.3.
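The selection rule can be sketched as two separate tables with a fixed preference between them. This is a simplified model under assumptions, not router behaviour as implemented on any platform: the function name `select_next_hop` and the dictionary layout are hypothetical, and 3.3.3.3 is assumed to be the next hop towards the scrubber, following the last address mentioned above.

```python
from typing import Dict, Optional


def select_next_hop(dest: str,
                    main_rib: Dict[str, str],
                    ddos_rib: Dict[str, str]) -> Optional[str]:
    """Pick the next hop for dest; a route from the DDoS instance always wins."""
    if dest in ddos_rib:
        return ddos_rib[dest]
    return main_rib.get(dest)


# Route learned from the original BGP process: the server sits behind 2.2.2.2.
main_rib = {"1.1.1.1/32": "2.2.2.2"}
# Routes learned from the second (DDoS) BGP process; empty in peace time.
ddos_rib: Dict[str, str] = {}

print(select_next_hop("1.1.1.1/32", main_rib, ddos_rib))  # normal path

# Attack detected: the DDoS route reflector injects a diversion route.
ddos_rib["1.1.1.1/32"] = "3.3.3.3"
print(select_next_hop("1.1.1.1/32", main_rib, ddos_rib))  # diverted to the scrubber

# Attack over: withdrawing the diversion route restores the original path.
del ddos_rib["1.1.1.1/32"]
print(select_next_hop("1.1.1.1/32", main_rib, ddos_rib))
```

Keeping the diversion routes in their own table means the original route is never modified, so withdrawing the diversion is enough to restore normal forwarding.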

My traffic is being sent from the Internet towards the scrubber. The scrubber cleans the traffic and sends the traffic back towards PE3. This will have the same routing table as here, saying if he gets traffic and it needs to be routed towards 1.1.1.1, what you have is you have a big magical loop. That's where your clot actually ?? the reason why you have to create tunnels and back-to-back cables and so on.

What you can do in this case is you sort of move away from it. So, all these interfaces are global interfaces, not really VPNs, but what you are going to be doing is you make this interface like a VPN interface, and in that particular VPN, you have all of these destinations pointing towards the original destination, so 1.1.1.1 will in the VPN point towards 2.2.2.2. So the traffic comes in, you look towards 1.1.1.1, no problem, it's attached to PE2 and you forward the traffic towards PE2. Now in PE2, it will see the same thing, like destination 1.1.1.1, it's in my VPN clean, send it towards CE1, and traffic actually gets forwarded towards CE1. Now, the big problem here is of course how does the traffic get into the VPN itself? Well, this is why I actually have to use trick number 2: from the moment you provision PE2 by saying the original destination 1.1.1.1 is over here, you actually copy it directly with the special command into the VPN itself and by the same thing, you also copy all the neighbourship details in here, so if a packet hits PE2 going towards 1.1.1.1, it will be sent out on the global interface towards the destination.
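The loop and the way the clean VPN breaks it can be traced with a toy hop-by-hop model. This is a sketch under assumptions: the node names follow the talk (PE1, PE2, PE3, CE1), but the two tables, the `trace` function and the hop limit are hypothetical simplifications that track only the single destination 1.1.1.1.

```python
from typing import Dict, List, Tuple

# Next hop for 1.1.1.1 per node while the diversion route is active.
GLOBAL_TABLE: Dict[str, str] = {"PE1": "PE3", "PE3": "SCRUBBER"}
# Copies of the original route, held in the clean VPN.
CLEAN_VPN: Dict[str, str] = {"PE3": "PE2", "PE2": "CE1"}


def trace(scrubber_return_in_vpn: bool, max_hops: int = 8) -> Tuple[List[str], bool]:
    """Follow a packet for 1.1.1.1 from PE1; return (path, delivered)."""
    node, table = "PE1", GLOBAL_TABLE
    path = [node]
    for _ in range(max_hops):
        if node == "SCRUBBER":
            # The scrubber hands cleaned traffic back to PE3. Which table PE3
            # uses depends on where the return interface has been placed.
            table = CLEAN_VPN if scrubber_return_in_vpn else GLOBAL_TABLE
            node = "PE3"
        else:
            node = table[node]
        path.append(node)
        if node == "CE1":
            return path, True
    return path, False  # hop limit reached: the packet is looping


print(trace(scrubber_return_in_vpn=False))  # bounces between PE3 and the scrubber
print(trace(scrubber_return_in_vpn=True))   # reaches CE1 via PE2
```

With the return interface in the global table, PE3 keeps following the diversion route back to the scrubber; placing it in the clean VPN makes PE3 use the copied original route instead.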

So that is it, and from the moment the attack is stopped, just remove the route here from the DDoS route reflector and all traffic will start to flow as originally, as such. So I think I have overrun about one minute, and I see Rob looking at me very angry, so I'm not going to ask for questions. I have this thing set up in my lab, so if you want to have a look into it, then...

CHAIR: That's probably the best approach. We do have some certain time limitations today but this was already in the edge but it was worth showing to people today. So thank you.

So, now, we would like to see Daniel.

DANIEL KARRENBERG: Hello. This is five minutes question for feedback and guidance. And I'm going to go really, really fast.

In the Plenary we saw this presentation by Manish about the IPv6 DarkNet studies. And one of the things that came out there is that we permitted them only to use different routes than all the other regions, and I want some more feedback on it. So this is a community guidance question.

So, what happens schematically is this, we have ?? this is the /12 that the RIPE NCC has to allocate further and these are the allocations, and it's not really like this, it's just to show a principle. If you want to see how it really works, you use RIPEstat and you get this and you see that the whole thing is really very sparse at the moment still. And this is only one-fourth of the total. So this only shows one-fourth, the /14, of the total space that we have at the moment.

So, what was originally proposed was a /12 overlapping all this, so that would siphon off any traffic that didn't have a specific route in the routing table here, and actually DarkNet in this case is a misnomer because it's actually a route that includes prefixes that are actually allocated. So some people, including Wilfried, complained about this, and quite rightly so, because it just doesn't ?? it might be that some allocated space here is actually siphoned off because there is no route for it. So what we did is we said okay, please change this to a /14 plus a /13, because our registration services people told us, well, the first /14 is actually what we are allocating from right now.
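The difference between the two announcements comes down to longest-prefix matching, which can be illustrated with Python's standard `ipaddress` module. The concrete prefixes below are assumptions for illustration only (the talk gives prefix lengths, not values): 2a00::/12 stands in for the whole block, 2a04::/14 and 2a08::/13 for the parts outside the first /14, and 2a00:1234::/32 for a hypothetical allocation.

```python
import ipaddress
from typing import List, Optional

Net = ipaddress.IPv6Network

COVER_12: List[Net] = [ipaddress.ip_network("2a00::/12")]
COVER_SPLIT: List[Net] = [ipaddress.ip_network("2a04::/14"),
                          ipaddress.ip_network("2a08::/13")]
ALLOCATION: Net = ipaddress.ip_network("2a00:1234::/32")  # hypothetical member allocation


def best_match(dst: str, routes: List[Net]) -> Optional[Net]:
    """Return the longest prefix in routes that contains dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in routes if addr in r]
    return max(matches, key=lambda r: r.prefixlen, default=None)


dst = "2a00:1234::1"
# Allocation announced: its more specific route wins over the covering /12.
print(best_match(dst, [ALLOCATION] + COVER_12))
# Allocation not announced: the covering /12 attracts the traffic.
print(best_match(dst, COVER_12))
# With the /14 plus /13 there is no covering route for the first /14.
print(best_match(dst, COVER_SPLIT))
# Space outside the first /14 is still collected.
print(best_match("2a0a::1", COVER_SPLIT))
```

The second case is the one that raised the complaint: an allocated but unannounced prefix falls through to the covering route and its traffic ends up at the collector.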

So, there are a number of alternatives how you can do this. There is the /12, it collects the maximum traffic for the unannounced routes. But it may collect some traffic to allocated addresses, if no routes are announced for those addresses, and there are some privacy concerns with this, you know, with giving this traffic to Merit and DHS, and some operations concerns about, you know, if we want to test what happens if there is no route and stuff like that. It's what the other RIRs permit, and so if we would do this, the results would be directly comparable with the other regions.

This is what we're doing right now. There are no operational concerns and probably less privacy concerns. But much less traffic is collected than the other RIRs /12, that's what we saw in Manish's presentation earlier. It's basically not much coming in from this dark part of the address space.

And this is what we currently do as I said. There is another possibility of course is to use even more fine grained routes, so basically look at this left space here and add it up with some more routes that are in between the allocations. That's a possibility. But that's very complex and the longer these prefixes get the more danger we have to cause operational issues if we are sort of out of sync with what people do. And at some point sort of the cost benefit becomes questionable.

And of course, there is a fourth option, which is to terminate all participation in this study altogether. So, these are the alternatives, you know: maintain the status quo, which the researchers tell us is not really useful, it has some use but not really useful; to go to the /12, and accept the operational and the privacy problems; to add more complications, you know, this is the one I personally like the least; or to stop with it altogether.

So, can I see a show of hands who has actually seen this slide back on the mailing list before? Not many people reading the mailing list. So what I'm asking for really here is guidance, what should the RIPE NCC permit Merit to do, what should we permit Merit to do in the RIPE NCC acting for us, so, please can we have some feedback here, and also of course it doesn't count if it's not on the mailing list, and then my other question is, you know, if we don't find a consensus, what do we do, maintain the status quo or terminate our participation, whatever?

WILFRIED WOEBER: Just a very brief question: What is the basis for the statement that the current situation is not useful for them? Because in my point of view it's already an interesting thing to learn that the dark traffic obviously seems to not be evenly distributed across ??

DANIEL KARRENBERG: Basically the presentation in the Plenary is about first results. It was a very rough presentation. And the statement there was that there was orders of magnitude less traffic in those /14 plus /13. That's already a result, yes. Other feedback?

Can I do a straw poll?

So if we look at those, and this is not a vote. I just want to get an impression here. If you look at this, who would prefer to keep the status quo?
Who would go to a /12?
That's about the same number I would say.

Who is for more fine grained?

Hey this room has sense.
And who would sort of terminate this altogether?
Okay. Those are less. Well, thanks anyway. Please comment on the mailing list, and give us some guidance on what to do. It's probably if it remains like this, like the same, about the same amount for /12 and status quo, I think we'll stay with the status quo, that's at least what I would propose.

And thank you for your attention.

CHAIR: Thank you Daniel.

In any case I think we should poke the mailing list and have a discussion on that.

With this we get to the end. I don't think there are any AOB items that were brought up to either Rob or myself before, does anyone have any? So, that will be it for today for the Routing Working Group. As I said before, the results of the RIPE board elections and all the last items from the members meeting yesterday will be announced in the room next door starting at a quarter to the hour. Thank you.

(Applause)
