
These are unedited transcripts and may contain errors.

DNS Working Group

Tuesday, 15th of May, 2013, at 2p.m.:

DNS session


Jaap: Good afternoon. I am one of the co-chairs and we will continue with this session. The first item is the report from the RIPE NCC. It is a bit unfortunate that the scribe is actually presenting his own report, but he will remember what he is saying, I guess.

ANAND BUDDHDEV: Hi, good afternoon, everyone. I am Anand Buddhdev of the RIPE NCC and I am here to give you a short update on the DNS services at the RIPE NCC and any new developments. The first thing I am going to talk about is K-root. The RIPE NCC operates one of the root name servers, K-root, and we have been doing that for many years. The service is running stably. We have 17 instances, of which five are global and 12 are local. When I say "local", I mean that we announce the K-root prefix from there with the no-export community set, so that only peers at that particular exchange see this prefix.
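
The "local" versus "global" distinction comes down to BGP's well-known NO_EXPORT community (RFC 1997, value 65535:65281). The following minimal Python sketch models only the re-advertisement decision made by a router that has received the announcement. The `Route` class and the placeholder prefix string are illustrative assumptions, not RIPE NCC configuration, and BGP confederations are ignored.

```python
from dataclasses import dataclass, field

# RFC 1997 well-known community NO_EXPORT: 0xFFFFFF01 == 65535:65281
NO_EXPORT = (65535, 65281)


@dataclass
class Route:
    prefix: str
    communities: set = field(default_factory=set)


def may_readvertise(route: Route, to_external_peer: bool) -> bool:
    """Decide whether a router that *received* `route` may pass it on.

    A route tagged NO_EXPORT must stay inside the receiving AS, so it is
    never sent on to external (eBGP) neighbours. Confederations are ignored.
    """
    if to_external_peer and NO_EXPORT in route.communities:
        return False
    return True


# Illustrative: a local instance tags its announcement, a global one does not.
local_instance = Route("k-root-prefix", {NO_EXPORT})
global_instance = Route("k-root-prefix")

print(may_readvertise(local_instance, to_external_peer=True))    # False
print(may_readvertise(local_instance, to_external_peer=False))   # True
print(may_readvertise(global_instance, to_external_peer=True))   # True
```

In this model an exchange peer can use a local instance and carry the route inside its own network, but does not propagate it to its upstreams or other peers, which keeps that instance's catchment local.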

Our query rate at K-root is approximately 20,000 queries per second on average. On some days it's higher, up to 30,000 or 35,000 or so, but the average is about 20,000. Most of the information about K-root, including some notes about its architecture, graphs of all the queries that we get there and summaries of these, is published at k.root-servers.org. So please feel free to browse this website; there is lots of information available, and if you have questions, you can always e-mail us.
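
To put the quoted rates in perspective, here is a quick back-of-the-envelope conversion from queries per second to queries per day. It is pure arithmetic on the figures above and assumes, for illustration only, that the rate is sustained for a whole day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def queries_per_day(qps: int) -> int:
    """Total queries in one day at a constant rate of `qps`."""
    return qps * SECONDS_PER_DAY


for label, qps in [("average", 20_000), ("busy day", 35_000)]:
    print(f"{label}: {queries_per_day(qps):,} queries/day")
# average: 1,728,000,000 queries/day
# busy day: 3,024,000,000 queries/day
```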

So our K-root network is quite stable and we have no plans to expand the traditional network that we have. However, we have been considering the idea of a K-root member instance. This is still an idea. We haven't actually deployed anything, so there is nothing out there yet, and we are still thinking about the best way of doing this. The idea is that we would deploy small, single-server K-root instances, and these would probably be located inside the networks of RIPE NCC members, so there are lots of details to work out, such as how the routing would be handled. But as I said, this is still an idea, so we haven't actually made any decisions.

We are considering deploying this inside a virtual server, so, for example, we might deploy it on a server where we also have an Atlas anchor. So you will probably be hearing more about this later this year. We are still thinking about this idea and any input is always welcome. So if any of you have any particular ideas about this, please feel free to talk to us: contact me, our community builder Vesna, or any of the other RIPE NCC DNS operations people you see here. We will take all your feedback into consideration.

The RIPE NCC also operates other DNS services: we have servers where we host the ripe.net zone, for example, and several others. We have a cluster of DNS servers for this, hosting just over 5,000 zones. As I mentioned, we have ripe.net; we have the ENUM zone, e164.arpa; we have several secondary zones for other RIRs, as well as the RIPE NCC's reverse DNS zones. We also offer a service called ns.ripe.net, where RIPE NCC members can use our server as a secondary DNS server; this is also hosted on this cluster. We also act as secondary for several ccTLDs, about 76 of them, as well as some of their IDNs. And then we have a number of miscellaneous smaller zones on this cluster. So this cluster is actually quite busy. We usually see about 120,000 queries per second on average, and at busy times this can go up to 180,000 or even 200,000 queries per second, so this is a really busy service.

So traffic is going up and the number of zones is going up. People are also signing their zones, so the actual zone sizes are increasing. We have some plans for this service: we want to add a third site this year. We haven't yet decided where we are putting it; we are talking to a few people about this, so you will probably hear more about it at the next RIPE meeting.

As I mentioned, zone sizes are increasing, so we need more memory to hold all these zones, and we are upgrading our servers with more RAM and faster hardware. The other interesting thing is that this cluster currently runs BIND 9, and we are quite keen on diversifying our infrastructure, so we have been looking at NSD 4 and Knot DNS for this cluster. We have been playing around with them and we plan to deploy them later this year, as soon as NSD 4 is released and Knot DNS is released with some features we need.

The RIPE NCC also signs all the zones it is primary for. We currently have 125 signed zones: ripe.net, all the reverse zones and e164.arpa are all signed.

For all these zones, we have chains of trust leading all the way up to the root zone, except for three that are still islands of trust because their parent zones have not been signed. As soon as those parents are signed we will submit DS records, but for the time being these three are still islands of trust.
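
The difference between a chain of trust and an island of trust can be stated mechanically: walking from a zone up to the root, every zone must be signed and every delegation must carry a DS record in its parent. A minimal sketch of that rule follows. The zone names and the two input sets are hypothetical, and real validation also checks signatures and key matches, which this deliberately omits.

```python
def parent(zone: str) -> str:
    """'ripe.net.' -> 'net.', 'net.' -> '.'"""
    labels = zone.rstrip(".").split(".")
    return ".".join(labels[1:]) + "." if len(labels) > 1 else "."


def has_chain_to_root(zone: str, signed: set, ds_in_parent: set) -> bool:
    """True if every zone from `zone` up to the root is signed and has a DS in its parent."""
    while zone != ".":
        if zone not in signed or zone not in ds_in_parent:
            return False
        zone = parent(zone)
    return "." in signed


def classify(zone: str, signed: set, ds_in_parent: set) -> str:
    if zone not in signed:
        return "unsigned"
    if has_chain_to_root(zone, signed, ds_in_parent):
        return "chain of trust"
    return "island of trust"


# Hypothetical data: example-tld. is not signed, so no DS can be published there.
signed = {".", "net.", "ripe.net.", "zone.example-tld."}
ds_in_parent = {"net.", "ripe.net."}

print(classify("ripe.net.", signed, ds_in_parent))          # chain of trust
print(classify("zone.example-tld.", signed, ds_in_parent))  # island of trust
```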

We had a KSK roll?over event in November of 2012. I am happy to say it was quite uneventful. No glitches this time, and so we are quite happy with that.

Just a small number: in the reverse tree we have 787 secure delegations, that is, zones that also have DS records. Compared with the last RIPE meeting, this number has gone up a little bit. The trend has been for this number to go up, so we are seeing more and more people sign their reverse zones, but it's not a terribly sharp uptake of DNSSEC in the reverse tree.
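
Each secure delegation counted here is a reverse zone whose DS record sits in a parent zone the RIPE NCC maintains. For readers less familiar with the reverse tree, the zone name for an address block is derived by reversing its octets (IPv4) or nibbles (IPv6). The following is a minimal sketch using only Python's standard `ipaddress` module. It assumes octet- or nibble-aligned prefixes (classless delegation is out of scope), and the example prefixes are arbitrary documentation-style values.

```python
import ipaddress


def reverse_zone(prefix: str) -> str:
    """Reverse-DNS zone name for an octet-aligned IPv4 or nibble-aligned IPv6 prefix."""
    net = ipaddress.ip_network(prefix)
    if net.version == 4:
        if net.prefixlen % 8:
            raise ValueError("sketch only handles octet-aligned IPv4 prefixes")
        octets = str(net.network_address).split(".")[: net.prefixlen // 8]
        return ".".join(reversed(octets)) + ".in-addr.arpa"
    if net.prefixlen % 4:
        raise ValueError("sketch only handles nibble-aligned IPv6 prefixes")
    # exploded form gives all 32 hex digits, e.g. "2001:0db8:0000:..."
    nibbles = net.network_address.exploded.replace(":", "")[: net.prefixlen // 4]
    return ".".join(reversed(nibbles)) + ".ip6.arpa"


print(reverse_zone("192.0.2.0/24"))   # 2.0.192.in-addr.arpa
print(reverse_zone("2001:db8::/32"))  # 8.b.d.0.1.0.0.2.ip6.arpa
```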

Those of you who keep a close eye on domain objects in the RIPE Database might notice some interesting name servers. They all start with the string UZ5, and these are people who are probably trying out DNSCurve. I don't know what they have done with it or what their intentions are, but these name servers do appear, so this is just an interesting observation.
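
DNSCurve advertises a server's public key inside the name server's own host name. As we understand the DNSCurve convention, the first label is "uz5" followed by 51 base-32 characters, 54 characters in all. A rough heuristic for spotting such names in a list of NS records could look like this. The host names are made up, and this loose sketch does not validate the base-32 alphabet itself.

```python
def looks_like_dnscurve(nameserver: str) -> bool:
    """Heuristic: first label is 'uz5' plus 51 alphanumeric characters (54 in total)."""
    first_label = nameserver.rstrip(".").split(".")[0].lower()
    return (
        first_label.startswith("uz5")
        and len(first_label) == 54
        and first_label.isalnum()
    )


# Hypothetical NS names as they might appear in domain objects
candidates = [
    "uz5" + "x" * 51 + ".ns.example.net.",
    "ns1.example.net.",
]
for ns in candidates:
    print(looks_like_dnscurve(ns), ns)
```

DNS names are case-insensitive, which is why the label is lower-cased before comparison: the string may show up as "UZ5" in database objects.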

Just a quick update on the ENUM zone itself, because we don't have a separate Working Group session for it. We have 52 delegations in the ENUM zone so there have been no changes since the last RIPE meeting. Of these, six of them are signed and have DS records in the ENUM zone. Again, no change from the last RIPE meeting. And just a data point: We have about five queries per second at our server for the ENUM zone out of the 120,000 or so.
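
For context, ENUM (RFC 6116) maps an E.164 telephone number into the DNS under e164.arpa. It keeps only the digits, reverses them and separates them with dots, and the delegations counted above are country-code subtrees of that zone. Here is a minimal sketch, using a made-up Irish number purely as an example.

```python
def e164_to_enum_domain(number: str, apex: str = "e164.arpa") -> str:
    """Map an E.164 number such as '+353 1 234 5678' to its ENUM domain name."""
    digits = [ch for ch in number if ch in "0123456789"]
    return ".".join(reversed(digits)) + "." + apex


print(e164_to_enum_domain("+353 1 234 5678"))
# 8.7.6.5.4.3.2.1.3.5.3.e164.arpa
```

The country-level delegation for +353 would therefore be 3.5.3.e164.arpa.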

DNSMON: this is something lots of you have been asking us about. The RIPE NCC has been operating the DNSMON service for many years, and it is based on the RIPE NCC's TTM (Test Traffic Measurement) infrastructure. As you may have heard, the TTM infrastructure is scheduled to be turned off at some point later this year, and one of the services using it is DNSMON, so we are developing a new DNSMON to replace the old TTM-based one. The new DNSMON is going to use the RIPE Atlas probe infrastructure to do the measurements. We have many Atlas probes out there in the wild, about 3,000 of them now, and we are also deploying something called Atlas anchors. These are stable Atlas probes, in the sense that they are not the little probes that people take home with them. They are actual physical Dell servers deployed in networks with good connectivity and good visibility, running Linux and the Atlas probe software. So they have much more capacity to do lots of measurements, and much more disk space. These Atlas anchors will provide a fixed, stable set of vantage points for DNSMON's measurements, and we will complement them with measurements from a random selection of user probes. That is the idea we have for the new DNSMON.
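
The measurement design described here, a fixed set of anchors plus a random selection of user probes, can be sketched as follows. The probe identifiers, the sample size and the function name are hypothetical. This is not the RIPE Atlas API, only an illustration of the selection idea.

```python
import random


def pick_vantage_points(anchors, user_probes, n_random, seed=None):
    """Always include every anchor, then add a random sample of user probes."""
    rng = random.Random(seed)
    anchor_set = set(anchors)
    pool = [p for p in user_probes if p not in anchor_set]
    sample = rng.sample(pool, min(n_random, len(pool)))
    return list(anchors) + sample


anchors = ["anchor-1", "anchor-2", "anchor-3"]      # hypothetical IDs
user_probes = [f"probe-{i}" for i in range(3000)]   # roughly the probe count mentioned
chosen = pick_vantage_points(anchors, user_probes, n_random=20, seed=1)
print(len(chosen))  # 23
```

Keeping the anchors fixed gives a stable baseline for comparing results over time, while the random sample varies the user-probe vantage points.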

The other interesting thing is that the old DNSMON doesn't provide users with any kind of control panel, so users have to e-mail us and we define the zones, servers and IP addresses to measure. For the new DNSMON we are going to provide a control panel, and users will be able to define their zones, their name servers, IP addresses, query types and so on themselves; we will be giving all that control over to the user.

Along with the back-end changes, using Atlas probes for measurements, we are also going to make changes to the front end, the graphing engine. It's not going to be terribly different from the old DNSMON; it will look pretty much the same, but it will take into account the fact that we have stable Atlas anchors plus a random selection of user Atlas probes, so changes are being made to the UI to accommodate this.

Raw data from all these measurements will also be available to the user. Atlas data is already available to users. If you have a probe and you have measurements that you have defined, the raw data from this is already available to you as a user, and in the same way it will be available to DNSMON users for them to consume. Because the data format is going to be different, any users who are using the current raw data from DNSMON may have to adapt their scripts and programmes to use this new data format.

We have a roadmap published for the new DNSMON; the URL is on the slide. Please feel free to take a look at it. We are planning a pilot in the third quarter of this year with a small subset of existing DNSMON users, and if all goes well, we plan to take this into production in the fourth quarter. At that point, all the existing DNSMON users will be migrated to the new infrastructure.

I'd like to stress that the current DNSMON is still running as a service. We haven't turned it off or anything. And that will continue all the way until the migration to new DNSMON is complete.

I have a few slides to show you what the new DNSMON control panel will look like. This is still work in progress, so if you spot any spelling errors or other errors, please ignore them for now. In the first step, a DNSMON user enters the zone that they would like to monitor; in this case you can see somebody has typed "SE" for Sweden. Then they click on "next", and the user interface goes away, looks up all the name servers for this zone and provides a list to the user. The user can choose to monitor all of them or remove some of these servers from the tests. If any name servers are missing, for example an undelegated server that you would like to test, you can add it to the list of servers to be monitored.

In the third step, the user can define the types of measurements that they would like to do, and you will notice that the measurements are available over both UDP and TCP, so that is another big improvement in DNSMON. There is support for NSID, and the user can select their types of queries. One interesting measurement available in this control panel is traceroute. This is something users keep asking us for all the time: if they see anomalies in DNSMON, they would like us to run traceroutes from a particular box to some destination. Under the new DNSMON they will be able to define these themselves, so they already have this data available when strange events occur.
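
The control-panel steps just described boil down to expanding a user's choices into individual measurements. The user takes the delegated name servers, removes or adds some, picks protocols and query types, and optionally adds traceroute. Below is a minimal sketch of that expansion. `MeasurementSpec`, `build_specs` and the server names are hypothetical and do not reflect the actual DNSMON or Atlas data model.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class MeasurementSpec:
    zone: str
    server: str
    kind: str                 # a DNS query type such as "SOA", or "traceroute"
    protocol: Optional[str]   # "UDP" / "TCP" for DNS queries, None for traceroute
    nsid: bool = False


def build_specs(zone, delegated, removed, added, query_types, protocols,
                nsid=False, traceroute=False):
    # Step 2: start from the delegated servers, drop removed ones, add extra ones
    servers = [s for s in delegated if s not in removed]
    servers += [s for s in added if s not in servers]
    # Step 3: one DNS measurement per server x protocol x query type
    specs = [MeasurementSpec(zone, s, q, p, nsid)
             for s, p, q in product(servers, protocols, query_types)]
    if traceroute:
        specs += [MeasurementSpec(zone, s, "traceroute", None) for s in servers]
    return specs


specs = build_specs(
    zone="se",
    delegated=["a.ns.example", "b.ns.example"],
    removed={"b.ns.example"},
    added=["undelegated.ns.example"],
    query_types=["SOA"],
    protocols=["UDP", "TCP"],
    nsid=True,
    traceroute=True,
)
print(len(specs))  # 6
```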

And finally, the confirmation box: the user gets a chance to review the whole set-up, the query types and the name servers, and if all is well, clicks "confirm". This goes away into the Atlas infrastructure, which starts doing measurements, and you start seeing graphs. So, nice and shiny.

That was it. That was all about our services. I am happy to take questions now.

AUDIENCE SPEAKER: Sebastian. How is DNSMON going to interact with the credit system in Atlas?

ANAND BUDDHDEV: That is a very good question and I am not sure about that. Perhaps Robert can take the question; he is right behind you.

Robert: The current plan is that we are going to allocate a bunch of credits to all DNSMON users. Whether large and small users get the same credits or different amounts, we don't know yet; that is to be decided. Could you go back two slides, probably? Yes, on this panel you will be able to tweak your numbers, so if you say "I want to use more probes but less frequently", that is essentially the same amount of credits from the system's point of view, so you have more control over what you really want to do. If you want to use the highest frequency with the highest number of probes, that costs you more, but if you want to monitor fewer servers, it's in your hands. That is the idea.
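
Robert's point that "more probes, less frequently" costs about the same can be made concrete with a toy cost model in which credits scale with the number of results produced. The one-credit-per-result rate and the function below are assumptions for illustration only. As stated above, the actual DNSMON credit allocation was still to be decided.

```python
SECONDS_PER_DAY = 86_400


def daily_credits(probes: int, servers: int, interval_s: int,
                  credits_per_result: int = 1) -> int:
    """Toy model: one result per probe, per server, per measurement interval."""
    results_per_day = probes * servers * (SECONDS_PER_DAY // interval_s)
    return results_per_day * credits_per_result


a = daily_credits(probes=10, servers=4, interval_s=300)  # 10 probes every 5 minutes
b = daily_credits(probes=20, servers=4, interval_s=600)  # twice the probes, half as often
c = daily_credits(probes=10, servers=2, interval_s=150)  # fewer servers, higher frequency
print(a, b, c)  # 11520 11520 11520
```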

AUDIENCE SPEAKER: Peter Koch, DENIC, no hat. There are probably more improvements to come. What is the way to discuss or contribute to this change? Is that the beta programme with the pilot you mentioned?

ANAND BUDDHDEV: Yes, we have a DNSMON users mailing list that you could join and contribute to, and you can also send your questions and feedback directly to us. I think there is an e-mail address at the roadmap URL here. And if you can't find it, just come and talk to us and we will give you the channels to contact us.

AUDIENCE SPEAKER: Thanks. And if I may, a follow-up question, more or less: I am a bit split here. As a customer using this for our own monitoring of our services, that seems like a very good feature to add. On the other hand, one of the motivations for DNSMON was to monitor parts of the infrastructure independently. Now, when different systems are monitored with a different choice or selection of parameters, the results might no longer be comparable. Any ideas about that?

ANAND BUDDHDEV: You mean different users will be using TCP and UDP queries and some may not use them? Is that the kind of difference you are talking about, or...

AUDIENCE SPEAKER: Assume I was the balanced marketing person that we all have in our minds here. I would say, well, I know I have an issue with one of our name servers, so just pull it out of the monitoring. Or, with less hostile intent, just changing the check interval influences the granularity of the monitoring, and therefore maybe even this complicated scheme of green, yellow, orange, red...

ANAND BUDDHDEV: Right. Yeah, I am not actually sure whether we have talked about this with any individual user, like you. So this is certainly interesting feedback for us. Perhaps it has to do with the graph presentation, I think. I see Robert coming up.

Robert: Are you really saying that not providing controls to the user is actually a good thing?

AUDIENCE SPEAKER: Again, have I stopped beating my wife? I won't answer that question. As I said, I am split; there are two aspects to this. As a user, it makes perfect sense if I have a test server to add or some known issues with probes or links. But the roots of DNSMON as a community tool are slightly skewed there. Maybe we can take that offline, but...

AUDIENCE SPEAKER: We are very interested in data about DNSSEC, because we have our own interpretation based on our data, but sometimes when you do the theoretical maths it doesn't necessarily translate into what customers' zones actually turn into. Is that information publicly available?

ANAND BUDDHDEV: As far as I know, the Atlas probes are not doing any DNSSEC-specific measurements at the moment. They are able to set the DO bit in outgoing queries, but we are not doing any validation with this data. As a user, you would be able to download the raw data, which would have the raw packet dumps in it.
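
Setting the DO bit means adding an EDNS(0) OPT pseudo-record (RFC 6891) to the query and turning on the "DNSSEC OK" flag defined in RFC 3225. That flag sits in the top bit of the lower 16 bits of the OPT record's TTL field. The following sketch builds such a query as raw bytes using only the standard library. It does not send anything, and the query ID, buffer size and query name are arbitrary example values, not what Atlas actually uses.

```python
import struct

QTYPE_DNSKEY = 48
QCLASS_IN = 1
TYPE_OPT = 41
DO_FLAG = 0x8000  # DO bit within the 32-bit OPT "TTL" field (ext-RCODE=0, version=0)


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in the root label."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def build_query(qname: str, qtype: int, query_id: int = 0x1234,
                do_bit: bool = True, udp_size: int = 4096) -> bytes:
    # Header: ID, flags (all zero: standard query, RD off), QD=1, AN=0, NS=0, AR=1
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 1)
    question = encode_name(qname) + struct.pack("!HH", qtype, QCLASS_IN)
    # OPT RR: root owner name, TYPE=41, CLASS=UDP payload size, TTL=flags, RDLEN=0
    ttl = DO_FLAG if do_bit else 0
    opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, udp_size, ttl, 0)
    return header + question + opt


query = build_query("ripe.net", QTYPE_DNSKEY)
print(query.hex())
```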

AUDIENCE SPEAKER: I am specifically interested in the RAM utilisation of your DNS servers; you mentioned earlier in the presentation that you were beefing up the RAM. Is that data that could be available to the public, or at least if I approached you?

ANAND BUDDHDEV: Perhaps we are talking about... are you talking about our DNS cluster?

AUDIENCE SPEAKER: Yeah, sorry, I changed, I jumped around. Sorry. Go forward one more slide I think it was.

ANAND BUDDHDEV: This slide?

AUDIENCE SPEAKER: That second bullet, we would be extremely interested in the data you have for that.

ANAND BUDDHDEV: I am happy to talk to you individually about this. We don't have any hard numbers. We have people coming up to us and saying we are about to sign our zone, are you ready for it?

AUDIENCE SPEAKER: You are in the same boat we are.

ANAND BUDDHDEV: Yes, we always say hold on, let's do some numbers first and see if we have the capacity, because if everyone suddenly signs their zones, our servers will not be able to load them all.

AUDIENCE SPEAKER: I will find you offline, thank you.

Jaap: Thank you, RAGNAR ANFINSEN.

Jaap: Carsten and/or Niall talking about what happened. They are both going to talk in harmony.

NIALL O'REILLY: We will do this polite deadlock thing for five minutes and then decide about the talk. This time around we went to the... we are not having an ENUM Working Group session at this RIPE meeting, because our invitation to the Working Group to submit agenda items drew no offers. But there are one or two things to report, routine stuff, and I am very grateful to the co-chairs of this Working Group for making some time available for us.

The routine ENUM item was included in Anand's talk, which has just finished, and there is another routine ENUM item from me further on. As far as the administration of the Working Group is concerned, the future is not yet decided. The mailing list is still there. If there are things that need to be discussed, people can open up discussions on the RIPE mailing lists. If we find that agenda items are being offered for future RIPE meetings, we will negotiate with the other Working Group Chairs for agenda space, either in their Working Groups or in some other way during the course of the week.

So, the public ENUM statistics come from Enumdata.org. If you don't know about it, you can visit it in your web browser. It's an initiative that was started a long time ago, when Kim Davies was one of the co-chairs of this group. Looking at those numbers, you will see that there is an accounting error, because the numbers in the far right-hand column don't add up to zero. When I went looking at this, I discovered that there was a counting error in the previously presented statistics. I haven't finished all the reconciliation yet, but the numbers are so small it really doesn't matter. The significant change is that Yarla has arranged for us to have a data update from Norway. So there is really no new activity here in the ENUM golden tree, but the Norwegians have told us what they are doing.

One place to watch is the NRENum initiative that is going on in TERENA. This is national research and education networks doing ENUM among themselves, not in a walled garden but not in a completely open public park either. And they are seeing quite significant growth. They have seen growth in the number of delegations with DS since the last RIPE meeting, and a big growth in the number of delegations as well. From whatever 31 minus 13 was, they have seen 13 new delegations, and one new signed delegation. To find out more about them, you can visit their website, where you can see that they have quite a large number of countries participating in their project. These are on at least four continents, or five: Australia, I suppose, is a continent on its own, so that comes to five. And if you want to learn more, take a look at Enumdata.org.

If you are involved in ENUM in your country and what we have announced there doesn't match reality, please let me know by e-mail to ENUM Working Group chair [at] ripe [dot] net, or to Niall.O'Reilly at UCD.ie, or to Carsten's e-mail address. Special thanks to the RIPE NCC, who host Enumdata.org and also host e164.arpa, though that is under a separate agreement. Thanks also to this Working Group for the agenda time. That is it. Thanks, folks.
(Applause)

AUDIENCE SPEAKER: I want to ask him a question. Since he gave such a nice silent presentation yesterday, since he is opening his mouth I thought I would ask him direct questions.

NIALL O'REILLY: I might give you silent answers.

AUDIENCE SPEAKER: Don't poke the bear because we could enter into an entire thing there.

At this juncture do you see any future for ENUM in Europe or do you think it's a dead duck?

NIALL O'REILLY: It's pretty dead. I don't know whether it's in a deep sleep or whether it's more... I do know of two prefixes which seem to be working properly and doing something useful, but they are very exceptional in the way they are organised. One of them is a commercial offering to businesses who want to avail of that service. I don't know the deep details, but it's run by a small firm in Brussels. The other one is the UN Office for the Coordination of Humanitarian Affairs, who have outsourced their voice requirements to the same small firm in Brussels. The key indicator here is that this small firm is essentially in control of an E.164 prefix, is in control of offering the service, and is free to choose its own technology. It happens to use ENUM, and they have open zones, so you can transfer the zone and see what they are doing; it's very clear. The other thing worth noting is this TERENA project. They have got lots of countries in there, and they think they are doing something useful with ENUM. I think it's a bit like the early days of e-mail: the academics and the researchers were doing something useful with e-mail, and other people said, hey, there is a business there. Now, if the people in TERENA and beyond can demonstrate that there is a use case for ENUM, then it may not be moribund; it may just be deeply asleep. I say "beyond" because some of those countries you saw on the map aren't in Europe.

AUDIENCE SPEAKER: Just following up on that. Here in Ireland we have +353 ENUM. In the UK I think Nominet was running it, and various domain registries, mainly ccTLDs, took up the running of ENUM services in their various countries. Do you think that might be part of the problem, because a lot of ccTLD registries aren't really run as businesses? I mean, they don't really have the ability to market themselves from a business perspective, yet they are quite happy to pour money into technical research.

NIALL O'REILLY: Yes, I don't want to steal more agenda time than Jaap and his co-chairs may have allowed for, because we could have quite a long discussion about this, but I am happy to begin the answer. And if he says enough, we will talk about it in the bar. What I think here is that one of the key things is this. If I go to you and say I want you to register a domain for me in some country's top-level domain, you will say fine, it will cost you this much, and it's your domain. I then go looking for service providers who can offer me services based on that domain. It's the other way round with phone numbers. I don't own the phone number; it's a side effect of having a voice service contract with some telco. I can port the number somewhere else, but if I stop paying for it I lose the right to use it. That is a big difference between a telephone number and a normal domain name.

AUDIENCE SPEAKER: Not entirely true. You are forgetting about premium numbers, which aren't really available in this country but are in a lot of other countries. You are forgetting about the ones where you would actually buy a cool, catchy number. There is actually a market in that in the US, and elsewhere, where you are actually registering the number and it is actually your property, like your intellectual property. 1800 Niall.

NIALL O'REILLY: The 353 space, where you started asking the question from, is what I am looking at, and I am not sure those are the same contexts. There are only two ways that I see ENUM catching on. One is that we recognise that the subscriber owns the number, and I don't see that happening really fast, because that is not the way the market is traditionally organised. The other is that the numbering authority, the regulator, decides ENUM is important for some reason or other and wants to see it pushed. Neither of those is happening quickly in this country, and I think also not in the other eight or nine countries that have announced production ENUM status.

AUDIENCE SPEAKER: Thanks.

AUDIENCE SPEAKER: Richard Barnes...

PATRIK FALSTROM: If I say 388 what do you say? What has the Commission done or not regarding the European area country code that actually I think would be absolutely perfect to use ENUM on top of all the numbers that we already have?

NIALL O'REILLY: Isn't that 3883 EUUE?

PATRIK FALSTROM: Anyway, so I screwed up the number; maybe I should have done an ENUM lookup before going to the microphone.

NIALL O'REILLY: I would agree with you, it would be a nice opportunity ??

PATRIK FALSTROM: So is this one indication, for example, that the Commission and others do not participate enough in RIPE and Internet discussions?

NIALL O'REILLY: You might say that; I couldn't possibly comment.

PATRIK FALSTROM: Good. Thank you.

NIALL O'REILLY: It's a cultural reference, some people in the audience will recognise it.

JIM REID: Wearing no hats. I just want to go back to the question about whether it was a good idea for the country-code domain registries to be involved as the Tier 1 operator for a country. The reality is I can't see any other obvious home for it. You need some trusted third party that can deliver a reasonable service, and broadly speaking, the only place we can find that for registries, services and DNS is the TLD registry. Where I think ENUM has failed is to do with other stuff, not so much the registry or who is running it. It is all the other regulatory goop that goes around it: authentication of phone numbers, requirements that phone companies put on things, regulators, and the other complexities that have come up around that. At the same time there is the problem of the registrars trying to figure out what this ENUM thing is and how it makes sense to them. Somebody has to convince them it's a good idea to sell a number registration, or to bundle it with some kind of voice package or whatever. The business case hasn't been proven, and it's not right to blame the registries as part of the problem. If anything, they have done more than enough trying to keep this dead horse alive.

RICHARD BARNES: I have one point on which I have some data and one point on which I have none. Point number one: in order for these ENUM prefixes or delegations to be useful, they have to be supported by the clients that are using telephone numbers. There are these SIPit interoperability events that take place a couple of times a year. I looked through the reports from the last few years, and the last data I found showed that, as of a couple of tests ago, only 20% of the clients that showed up to the tests supported ENUM. So the level of interest in the client implementer community is pretty low. That is point one.

Point number two: there is a distinction made between public ENUM and private ENUM, and in this meeting we are looking at the public half. There may be some usage in the private half, so ENUM may not be entirely dead even if the public half is; there may be some other bits of it lurking around in walled-garden networks.

NIALL O'REILLY: We don't have the means to see in those walled gardens so we don't know how much it is rampantly in use out there but the focus of this Working Group has always been public ENUM because precisely of the NCC's role in operating the tier 0 service.

Jaap: Before we turn into the ENUM Working Group... We will probably hear some more about whether or not ENUM is going to survive. The next one we have is Patrik Falstrom, whom we just saw, on "Fun with TLDs", as it says; the title changed.

PATRIK FALSTROM: Thank you. I am from Netnod, but I am here on the stage as the Chair of the Security and Stability Advisory Committee (SSAC) of ICANN. This presentation explains a little bit about one of the issues that we have found regarding the allocation and delegation of new TLDs. The actual work was done by Warren Kumari, and these are mostly his slides. On the other hand, I have added to them, so all the errors are probably things that I have added.

So, if we look at the background for our report number 57 (SAC057), we have to look a little bit at certificates and how they work. Normal access using the HTTP protocol over SSL requires a public key, which is carried in a certificate. The certificate presented by the web server is something that the owner or administrator of the web server acquires from a CA, a certificate authority.

The certificate binds the public key to an identity. The browser looks at the content of the certificate, which is used to secure the communication and to make sure that it's talking to the correct server. So when the certificate is presented by the server, the client does a validation. But before that, when the certificate is handed out, the certificate authority validates whoever requests the certificate. For simple domain-validated certificates, the normal way is that an e-mail is sent to an address within the domain that the client requested the certificate for. The certificate authority decides which e-mail address to use, and receiving mail at an address within the domain is how you show that you are in control of that domain. That e-mail exchange validates that you are the holder of the domain name. You are in control of it because you are in control of the mail server or the addresses of that domain, so the CA then issues the certificate to you.

There are other kinds of certificates, such as EV certificates, which require other out-of-band mechanisms. Unfortunately it's a little bit difficult for the clients, and even more for the end users, to understand in which situations EV certificates have been used and when other kinds of certificates have been used. Some browsers and web clients show a different indication, different colours and so on, but not many people actually know the difference between the two. And of course, TLS certificates are used by a large number of protocols. If you look at the number of RFCs, I think we are well above 800, so this is not something only for the web.

So far, everything is fine. But now look at what we in SSAC call internal server names, which are designed for internal-only applications. Take, for example, an enterprise with a network that is not connected to the world, or one that for some reason wants to be completely independent from the rest of the world. Documentation from Microsoft and similar vendors describes this approach: you use top-level domains like .corp or .accounting or something else inside your enterprise. This means that people inside your enterprise can go to www.corp or mail.corp or similar. The thing with these is that, to the client, they look like normal domain names, but these TLDs don't exist in the global DNS. For SSL to work, the server must present a certificate with that domain name in it, so someone who wants to use .corp internally in their organisation needs a certificate for names under .corp. Now, .corp is not a TLD that exists, so the normal validation mechanism using e-mail cannot be used. The CA cannot send an e-mail to the cert holder or postmaster, because .corp doesn't exist.
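
The domain-validation step, and why it fails for internal names, can be pictured as below. The list of local parts is an assumption modelled on the "constructed" addresses commonly used for this purpose (admin, administrator, webmaster, hostmaster, postmaster). As the talk says, each CA decides which address to use. The `delegated_tlds` set is a stand-in for the root zone, and the sketch only looks at the last label of the name. The point it illustrates is that under an undelegated TLD there is no address the CA can reach.

```python
CONSTRUCTED_LOCAL_PARTS = ("admin", "administrator", "webmaster",
                           "hostmaster", "postmaster")


def validation_addresses(domain: str, delegated_tlds: set) -> list:
    """Addresses a CA might e-mail to check control of `domain`.

    Returns an empty list when the TLD is not delegated in the (simulated) root:
    no mail can be routed to such a domain, so e-mail validation is impossible.
    """
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld not in delegated_tlds:
        return []
    return [f"{local}@{domain}" for local in CONSTRUCTED_LOCAL_PARTS]


root = {"com", "net", "org", "se"}  # tiny stand-in for the root zone
print(validation_addresses("example.com", root)[0])  # admin@example.com
print(validation_addresses("mail.corp", root))       # []
```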

So what is really happening here is that the client, when requesting a certificate, creates a certificate request, and in this certificate request (let's see if I can use the laser pointer, there) you see a request for a certificate in a top-level domain, .site, that doesn't exist. So here is a certificate request for www.site. A validation e-mail cannot be sent, so what happens when you submit this to a CA? Well, and this is a little bit hard to see, when you go to the CA and request the certificate, you get an extra text saying this is a certificate that will not really work on the Internet, are you sure you want to continue? And of course you say yes, this is something I want. So you tick the box saying this certificate will be used on an internal server, and then you click next. And what happens: plop, you have the certificate. You have got a certificate which is valid for www.site, and the certificate authority is really nice and gives you a certificate for site as well. And by the way, on these slides you will see the name of a certificate authority, but we have found at least 157 CAs that hand out these things. So there is nothing special about this CA, just to get that said.
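To make the "internal name" problem concrete: a requested name is internal when its rightmost label is not a delegated TLD, so there is no domain in the global DNS to send a validation e-mail to. The sketch below is a minimal illustration, not a CA's actual check; `DELEGATED_TLDS` is a hypothetical, hard-coded subset, whereas a real check would load the current list of delegated TLDs from the root zone.

```python
# Illustrative subset only: a real check would load the full, current list
# of delegated TLDs (for example from the root zone) instead of hard-coding it.
DELEGATED_TLDS = {"com", "net", "org", "se", "nl", "uk"}


def tld_of(name: str) -> str:
    """Return the rightmost label of a DNS name, lower-cased."""
    return name.rstrip(".").rsplit(".", 1)[-1].lower()


def internal_names(requested_names):
    """Return the requested names whose TLD is not delegated.

    E-mail based domain validation cannot work for these names, because
    there is no domain in the global DNS to send the e-mail to.
    """
    return [n for n in requested_names if tld_of(n) not in DELEGATED_TLDS]


# A request like the one on the slide: www.site, plus the bare name "site".
print(internal_names(["www.site", "site", "www.example.com", "mail.corp"]))
```

The danger described in the talk is that this classification changes over time: once a TLD such as .site is delegated, the same certificate suddenly covers names that exist in the global DNS.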

So, now that you have this certificate, what you do is set up a fake root, delegate .site to yourself, and set up a web server serving the cert. This is the part of the presentation that Warren really did; this is typically him having the time, resources and CPU cycles to do this, and he thinks it's fun. Then you take your browser and go to www.site. So far, I think, this is fine. You see that you have got the certificate, the certificate is fine, blah-blah-blah, just continue; the same thing in Firefox and Chrome and all browsers, this is absolutely fine. So, this is the interesting part: what you can do is go to a CA today and get a certificate for a well-known name ending in an applied-for TLD. Wait. After some time, ICANN, as we know, might actually delegate that TLD, and some people will probably register domain names in that TLD. And note that the certificate we got back also covered the top-level name itself: a certificate for just site. Isn't that cool? So what you then do is go to Starbucks or somewhere, set up your name server, do cache poisoning or hijack the TCP connection, present the certificate, and have an excellent man-in-the-middle attack while the end user still sees the padlock closed. OK.

At this point, when we had a closed meeting of the Security and Stability Advisory Committee and Warren had presented this, there was complete silence in the room, let me tell you. Because this is something, just like in this room, that everyone knows could happen, but no one had really been thinking about it, right? Like, this is not good. So, what do we do? We decided to do what we always do when we find these kinds of things: we formed a work party and did some searching to find out whether this was just Warren doing something in the morning or something that is really bad.

So we contacted the EFF, which runs the SSL Observatory: they gather data by probing all IP addresses on the Internet on port 443 and recording what certificates they get back, and that is a whole load of data. Going through that data is where we found that at least 157 CAs have issued these kinds of certificates, a large number of them for some of the applied-for TLDs. We contacted the CA/Browser Forum, which sets the policies for the CAs, and noticed they were aware of the issue. They had decided quite some time ago that three years after ICANN signs a contract for a new gTLD, they will stop issuing such certificates and existing certificates will be revoked. Three years. That is a pretty long time in Internet terms.

So, the conclusion SSAC drew from all of this, basically in December 2012, was that this is so serious that we not only recommend that ICANN actually do something quickly, but also that this is not something that we can make public.

So we contacted the ICANN security team, and this is something I presented in the plenary on Monday, which is sort of the important process issue that all of you and the RIPE community should know about: together with the security team of ICANN, disclosure policies were developed so that the ICANN security team can handle these kinds of things. They contacted the CA/Browser Forum chair and the forum, and Ballot 96 of the CA/B Forum passed on February 26th, about two months after our report was ready. This ballot says that 30 days after a contract is signed, no more certificates should be issued, and 120 days after the contract is signed, the certificates should be revoked for the TLDs that are applied for. All of this action, and of course the need we saw for the community to get to know about these issues, resulted in us publishing our report, SAC 57, on March 15. The body of the report is the report itself, and the story from December until March 15 is in Appendix A, so those of you who want to know the details can have a look at it.
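The difference between the two policies described above is easiest to see as dates. This is a minimal sketch; the contract signing date is hypothetical, the three-year rule is the earlier policy as summarised in the talk, and the 30/120-day deadlines follow the talk's summary of Ballot 96.

```python
from datetime import date, timedelta


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def ballot_96_deadlines(contract_signed: date) -> dict:
    """Deadlines for internal-name certificates in a newly contracted TLD,
    as summarised in the talk: stop issuing after 30 days, revoke after 120."""
    return {
        "stop_issuing": contract_signed + timedelta(days=30),
        "revoke": contract_signed + timedelta(days=120),
    }


signed = date(2013, 7, 1)  # hypothetical contract signing date
print(ballot_96_deadlines(signed))
print("earlier policy (three years):", add_years(signed, 3))
```

For a contract signed on that hypothetical date, revocation moves from mid-2016 to the end of October 2013.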

Did this solve the problem? Well, of course not. First of all, not all CAs are members of the CA/B Forum. The good thing is that most CAs that are doing the right thing follow the recommendations of the CA/Browser Forum. As we know, if even one CA does not do its job properly, regardless of whether it is a member of the CA/B Forum or not, the whole PKI system falls apart, and we know that from some of the incidents we have had. So CAs are in general trustworthy and follow guidelines, compared to some other industries, but it's still the case that this is just a recommendation on how they should behave.

The other thing is a discussion that is ongoing, and that was of course opened up again because of this report: the question of whether browsers actually check revocation lists. And of course there are lots of different protocols for revocation, so the question then is how much of a problem this actually is.

So, that is a summary. This is where we are at the moment. There is more work going on, and not only regarding certificates; this has of course also triggered people to look at other things, such as what happens when TLDs that have been used in the world quite a lot are suddenly delegated. Will that have any security and stability impact? That is the overall question. But this is the story about SAC 57; I think we will look at the other issues at future meetings. Thank you.
(Applause)

AUDIENCE SPEAKER: Question. Has this prompted the ICANN new gTLD process to take any actions or modify anything in the process?

PATRIK FALSTROM: Yes.

AUDIENCE SPEAKER: Thank you.

PATRIK FALSTROM: I think the short answer is, of course, yes, but the slightly longer answer is that SSAC already wrote, in the fall of 2010, a generic report saying: these are the TLDs that are not allocated that we see queries for at the root zone. This was in report SAC 45. Or 47? OK, I don't remember the numbers; just like phone numbers, I don't know them off the top of my head. We wrote that in September 2010. That report, together with this one about certificates, triggered, among other things, one letter from PayPal and one from Verisign to ICANN requesting a more serious look at what happens with these unallocated TLDs that are queried for at the root zone. This in turn has triggered ICANN staff to launch some more serious work at the moment, to look at the various implications. And I do know that the ICANN board, at its meeting in Amsterdam this coming weekend, will have a discussion on this topic. So next week we will know a little bit more about what the actual detailed output will be, but it is absolutely the case that people are taking this much more seriously now. It's not only this report, though; there are also other things, including the letters from PayPal and Verisign, and there are probably others that also raised the same or similar issues.

Jaap: OK. Thank you. This has all been great fun. The next speaker is Matt, and he is going to run his own laptop because he has some animations, as far as I understood.

MATT LARSON: Hello everyone, I am from Verisign. This is work that Duane Wessels, who works for me in Verisign Labs, and I did over the course of two summers.

Actually, let me give you a quick background: Verisign has an internal engineering technical conference every summer; engineers submit paper ideas, and the chosen ones are presented. Two summers ago, in June 2011, Duane and I proposed a presentation topic called "So who is querying us anyway?" The intent was to look at traffic coming to the .com name servers from the perspective of source IPs and to slice and dice it in as many interesting ways as we could. And what we ended up doing was scratching the surface.

The next year, in the summer of 2012, we did a presentation called "No, really, who is querying us?" and added a few different dimensions. Then I realised that we had never taken this on the road; we never showed anybody, and it had only seen an internal audience. So while some of the data is two years old, I think it's interesting enough that I'd like to share it with you.

The quick background is that Verisign runs A-root and J-root, both of which are anycast, but in different ways. We also of course run .com and .net, which use a combination of anycast and unicast. There are 17 large sites, as we call them, and then almost 70 small sites. And we have SPAN feeds that we can backhaul to a central analysis network. It wasn't until we built our own backbone that we were able to do this backhauling and have sufficient bandwidth to do research like this. I have long been frustrated that we get so much traffic but, because of the way we've designed things, using a VPN to get traffic back from our sites for security reasons, it's been difficult to bring large volumes of traffic back. Our private backbone, which is new within the past three or four years, has changed that.

So we don't backhaul everything. For most of this particular research we had four sites coming back. We have data in varying amounts since March 2011, and it comes from a home-grown libpcap application which, for every query, puts the source and destination of the query in a hash table and then increments some counters. As you see there, we are counting popular query types, the DO bit, minimum and maximum TTL: sort of whatever we chose to count, hoping that we could find interesting things as a result.
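The per-query bookkeeping described above can be sketched as a dictionary keyed by (source, destination) with a few counters per entry. This is a minimal sketch, not Verisign's tool: it assumes packets have already been captured and decoded into tuples (no libpcap code is shown), and it assumes "min/max TTL" refers to the IP TTL of the query packets.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class PairStats:
    """Counters for one (source, destination) pair; the fields are illustrative."""
    queries: int = 0
    qtypes: Counter = field(default_factory=Counter)  # e.g. A, AAAA, MX, ...
    do_bit: int = 0      # queries with the EDNS DO bit set
    min_ttl: int = 255   # lowest IP TTL seen (assumed meaning of "min TTL")
    max_ttl: int = 0     # highest IP TTL seen


def tally(queries):
    """queries: iterable of (src, dst, qtype, do_bit, ip_ttl) tuples.

    Packet capture and DNS decoding are assumed to have happened already.
    """
    table = defaultdict(PairStats)
    for src, dst, qtype, do_bit, ip_ttl in queries:
        stats = table[(src, dst)]
        stats.queries += 1
        stats.qtypes[qtype] += 1
        stats.do_bit += int(do_bit)
        stats.min_ttl = min(stats.min_ttl, ip_ttl)
        stats.max_ttl = max(stats.max_ttl, ip_ttl)
    return table


sample = [  # hypothetical decoded queries, documentation address ranges
    ("192.0.2.1", "198.51.100.53", "A", False, 57),
    ("192.0.2.1", "198.51.100.53", "MX", True, 61),
    ("203.0.113.9", "198.51.100.53", "AAAA", True, 120),
]
for pair, stats in tally(sample).items():
    print(pair, stats.queries, dict(stats.qtypes), stats.do_bit, stats.min_ttl, stats.max_ttl)
```

Keeping only counters, rather than packets, is what makes it practical to ship summary files from several sites back for aggregation.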

So we end up with all these data files, which then get pulled back, aggregated, distilled, put in a Hadoop cluster, and sliced and diced. Some of this presentation is based on one month of data from June 2011, and some is based on a 15-month sample, as you can see, from March 2011 to June 2012. However, this being a research effort, we have gaps in the data. This traffic for research purposes was not a high priority, so if other production issues required the traffic not to come, the traffic didn't come, and that is just what we have dealt with.

From this same feed, which comes from four sites and sometimes more that are backhauled, we also take a 24-hour snapshot one day every month and save literally every packet, so we have that archived as well, and a little bit of this talk is based on that.

So just to give you some idea: this shows a different colour for each of the sites that at one point or another was being backhauled. We have now sort of converged on four sites coming back to our analysis environment, which was the result of negotiation between us and the network operations folks as to how much they wanted to be dragging back. To give you an idea of the order of magnitude, the scale on the left is billions of queries per day, up to 5 billion. Every billion queries a day, if you do the math and divide by the number of seconds in a day, is about eleven-and-a-half thousand queries per second, so we are typically talking tens of thousands of queries per second at each of these sites.
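The conversion from daily totals to per-second rates is a single division by the 86,400 seconds in a day. The sketch below gives daily averages only; peak rates within a day will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(queries_per_day: float) -> float:
    """Average query rate implied by a daily total."""
    return queries_per_day / SECONDS_PER_DAY


print(round(per_second(1_000_000_000)))  # 11574
print(round(per_second(5_000_000_000)))  # 57870
```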

So here are just some of the things that I want to talk to you about. Let's start with the number of clients.

If you are looking at the PDF, you already know the answer. So in a one-month period, at four sites, how many unique source IPs did we get? The answer is 26.4 million. If you plot the cumulative number of unique clients, in other words 4 million unique clients on the first day, just over five million across days one and two, and so on, you get a slope like this, and it just keeps going. I have some more slides like that; just about everybody eventually queries a .com or .net name server.
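The cumulative curve is just the size of a growing set: each day's sources are added to everything seen before. A minimal sketch, assuming each day's traffic has already been reduced to a set of source addresses (the addresses below are hypothetical):

```python
def cumulative_unique(daily_sources):
    """daily_sources: one set of source IPs per day, in date order.
    Returns the number of distinct sources seen up to and including each day."""
    seen = set()
    totals = []
    for day in daily_sources:
        seen |= day
        totals.append(len(seen))
    return totals


days = [{"192.0.2.1", "192.0.2.2", "192.0.2.3"},
        {"192.0.2.2", "192.0.2.4"},
        {"192.0.2.1", "192.0.2.5", "192.0.2.6"}]
print(cumulative_unique(days))  # [3, 4, 6]
```

At tens of millions of addresses, a real implementation would more likely use a compact structure or a distributed count, but the logic is the same.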

Here are unique clients per day. There is no reference from one day to another; each bar shows how many unique clients we got in that 24-hour period. You can clearly see the weekends, and the anomalous data on day 4 there. So, on the order of just over 4 million unique clients per day. Again, I need to stress that this is the four-site sample, so it's four IP addresses out of the 13, and standing here I can't recall whether it's even all the traffic to those four addresses or only part of the anycast. It's a substantial amount of traffic, but it's not all the traffic; that is my point.

So here is yet another way we sliced and diced it to answer a question: how often do clients keep coming back? Now the days represent not days of the month but the number of days that clients came back. On day one we saw almost 4 million unique sources; the set of sources seen on both day one and day two went down to 2.3 million, and so on, so eventually, out here, almost 800,000 sources queried us every day of that month. That seems a reasonable number. We can assume that those are recursive name servers doing what they do; I have no trouble believing there are 800,000 busy ones querying this fraction of .com and .net. Here is an attempt to visualise how these unique sources vary day by day. The blue represents queriers that were there yesterday but not today, the red represents queriers that are new today, and the green is those present both yesterday and today. The point is that the green is the consistent number, and you can see from the magnitude of the blue and the red how much the sources jump around from day to day in terms of querying us or not. It surprised me a little that so many sources were jumping in and out like that.

Here is the same kind of graph, but based on source AS rather than source IP; note that I have shifted time intervals on you. This is almost a year later, another month's worth of data; it wasn't until the second presentation that we decided to map every source to its origin AS and look at some things that way. You can see here, not surprisingly, that it's much more consistent: we hear from the same origin ASs over and over again, with very little churn.
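Both views described above reduce to set operations on per-day source sets: a running intersection gives "seen on every day so far", and two set differences plus an intersection give the blue/red/green split. A minimal sketch with letters standing in for source IPs; the same functions work unchanged on sets of origin AS numbers.

```python
def present_every_day(daily_sources):
    """Number of sources seen on every day from day 1 up to each day."""
    counts, common = [], None
    for day in daily_sources:
        common = set(day) if common is None else common & day
        counts.append(len(common))
    return counts


def churn(yesterday, today):
    """Day-over-day split matching the colours on the slide."""
    return {
        "gone": len(yesterday - today),    # blue: yesterday but not today
        "new": len(today - yesterday),     # red: today but not yesterday
        "stayed": len(yesterday & today),  # green: both days
    }


days = [{"a", "b", "c", "d"}, {"a", "b", "e"}, {"a", "b", "c"}]
print(present_every_day(days))   # [4, 2, 2]
print(churn(days[0], days[1]))   # {'gone': 2, 'new': 1, 'stayed': 2}
```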

This number is surprising: when you look at a month's worth of data, how many of those 26 million sources query us less than once per day on average? In other words, in the 30 days of June they queried us fewer than 30 times. A lot: 15.6 million, just a trickle. So this is not all people running dig; this has got to be other things, and we have a little bit of analysis of that later in the presentation. I think this is a surprisingly large number, as is some of the RD bit analysis that I will show you coming up. So, one more graph dealing with the number of clients we see:

This is similar to the graph I showed earlier, but again, the number on the X axis doesn't represent a day of the month; it represents the number of days on which we saw a given number of sources. So the first column, number one, is that 15.6 million: one day or less. The second column, and it's hard to tell with the scale there, is about 4 million addresses that queried us on exactly two days, and so on. By the time we get to the end, there are very few that queried us on exactly 25 days, exactly 26 and so on, and then it goes back up for those that queried us on exactly 30 days, in other words every day of the month.

So I can believe this shape as well; I don't have any trouble agreeing with this data. But, as I say, I am surprised that the initial number is so big, that there are so many sources that query us so infrequently.
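The histogram described above counts, for each source, on how many distinct days it appeared, and then counts how many sources share each value. A minimal sketch, again assuming one set of sources per day:

```python
from collections import Counter


def days_seen_histogram(daily_sources):
    """Map k -> number of sources seen on exactly k distinct days."""
    days_per_source = Counter()
    for day in daily_sources:
        for src in set(day):
            days_per_source[src] += 1
    return dict(sorted(Counter(days_per_source.values()).items()))


days = [{"a", "b", "c", "d"}, {"a", "b", "e"}, {"a", "b", "c"}]
print(days_seen_histogram(days))  # {1: 2, 2: 1, 3: 2}
```

With a 30-day month, the last key is 30: the sources that queried every day.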

So now let's go back to this cumulative client count. I have jumped ahead in time a year, to when we zoomed in on this a little to try to get to the bottom of the slope of this curve. I am looking at one single site rather than four, over our 15-month window with the unfortunate gaps in it. This is the cumulative number of clients over time, so each day that goes on we see a few more IP addresses and the curve goes up, but it's coloured by the number of queries per day that each source sent. So you can see the colours: the light blue at the top and the dark blue below it are even less than one query per day, and it's not until we get to the yellowish colour that you get to sources sending ten queries or fewer per day. What this shows is that by the time we get to the right side of the graph, a massive percentage of the sources send ten queries or fewer per day. This corresponds to the data we saw earlier: we see a lot of traffic from sources that query us very infrequently.

Now, you can see we have also lost the busiest sources down in the noise. So let's zoom in. Now we have thrown away the least talkative sources and are only plotting the more talkative ones, so look at the Y axis: we are talking hundreds of thousands of clients in that light purplish, pinkish colour, which is 100 to 1,000 queries per day. Let me say, before we zoom in once more, that this trend does look worrying; the slope is not that great, because, as the text there says, over time we would expect these sources to send fewer and fewer queries when averaged out, and therefore the slope would decrease. So, not only do we have a lot of clients that query us very infrequently; we have a small number of clients that query us a lot. Here the bluish colour is one million to ten million; am I counting my zeros right? Yes. One million to ten million queries per day. So, in other words, we have at least 100, and over time over 150, sources sending us at least a million queries per day, and you can see there are some that send us ten to 100 million queries per day.
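The colour bands on these graphs are power-of-ten ranges of average queries per day. A minimal sketch of that bucketing, assuming per-source totals over the whole period are already available; integer arithmetic is used to avoid floating-point edge cases at exact powers of ten, and the source names and totals are hypothetical.

```python
from collections import Counter


def decade_bucket(rate: float) -> str:
    """Label the power-of-ten band that a queries-per-day rate falls into."""
    if rate < 1:
        return "<1"
    low = 1
    while rate >= low * 10:
        low *= 10
    return f"{low:,}-{low * 10:,}"


def bucket_sources(total_queries, days):
    """total_queries: dict source -> queries over the whole period."""
    return Counter(decade_bucket(n / days) for n in total_queries.values())


totals = {"a": 15, "b": 300, "c": 45_000_000}  # hypothetical 30-day totals
print(bucket_sources(totals, days=30))
```

Source "a" lands in the "<1" band (fewer than 30 queries in 30 days), like the 15.6 million trickle sources, while "c" lands in the one-to-ten-million band.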

This chart shows that a given site doesn't see all the IP addresses. Now we are plotting, for the different sites, how many unique clients they see over time. The coloured lines grouped together are the individual sites taken independently, but when we look at all of them in aggregate, you see the line is higher. So I think the main takeaway from this graph is that there is an affinity for given sites that persists over time: at least within 15 months, it is not the case that eventually everybody queries every site. And here is the same data shown a different way; this time the Y axis is a percentage, so you can see none of these lines is at 100 percent. They max out at about 70%; in other words, the most we ever see at any one site is 70% of the sources.
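The percentage view compares each site's set of sources with the union over all sites. A minimal sketch; the site names and sources are hypothetical.

```python
def site_coverage(per_site_sources):
    """Percentage of all sources (union over sites) that each site saw."""
    everyone = set().union(*per_site_sources.values())
    return {site: 100 * len(srcs) / len(everyone)
            for site, srcs in per_site_sources.items()}


per_site = {"site-1": {"a", "b", "c"}, "site-2": {"c", "d"}}  # hypothetical sites
print(site_coverage(per_site))  # {'site-1': 75.0, 'site-2': 50.0}
```

A value below 100 for every site, as on the slide, means no single site sees the whole population of queriers.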

All right. I know this is kind of rapid fire, but what we found is that the more we looked into this, the more we thought, number one, we should have looked into this a long time ago, and, boy, it raises a lot more questions than it gives us answers. So we are still mining the depths of this research and asking other questions.

So, the other question I had was: what are people querying us for, not by name but by type? This source-IP counting application counts seven different types (I am not going to remember which seven up here now), and for every query that comes in it records whether it was one of those seven types, and if it's not one of the seven, it increments the "other" bucket.

So, yes, I should have advanced the slide; there is the list of types. Kudos to Duane here for figuring out an interesting way to find the clusters. He used k-means, which is a clustering algorithm, and rather than running k-means once, he ran multiple iterations: you start with random locations, cluster, and keep removing clusters. This was somewhat arbitrary: he started with 60 clusters and stopped when he got to 16. So here, and I hope it was worth it, is the aforementioned animation. These are the initial 60 completely random points, and this is the distribution of the types. For example, the leftmost cluster is mostly blue, which is A records; the purple is AAAA and the red is MX. Again, these are completely random. Now, over time, as the algorithm ran, it kept moving centres around to where it found a cluster and removing one. I am doing a rotten job of explaining k-means. But the point is that over time he let the algorithm work with fewer and fewer points, and by repeating this multiple times with different random starting points he was satisfied with the methodology, because it converges on this every time. So here are the 16 clusters at which we decided to stop the algorithm. Note the percentages along the bottom. Let's look at this first one again: 28.2% of the sources sent a query profile that looks like this. Blue is A records, so mostly A records, then a few MX records in red and a few AAAAs in purple. I find that very easy to believe, that a significant percentage of the sources send mostly A record queries. Then look at the next four: varying amounts of address records and MX records.
I find that very believable as well, that there are a lot of sources that are just looking up address records and MX records. It is comforting that eventually you get to these clusters here, where we finally see a significant fraction of AAAA queries. But notice the numbers here; this is the same graph, and we really trail off. We limited this clustering so that a source had to send at least 100 queries to be included, so 7.6 million sources fed into this analysis, and this cluster, out of 7.6 million, is only 18 sources. So look at the percentages: we really, really trail off here. Most of the sources are sending what I would think are expected profiles of queries. So this validated our assumption, which until then was only anecdotal, that we see a lot of A and MX queries.
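The clustering step can be sketched as plain k-means over per-source query-type profiles (the fraction of each source's queries that were A, AAAA, MX, and so on). This is a minimal sketch of standard Lloyd's k-means with a fixed k; it does not reproduce the talk's procedure of starting with 60 clusters and removing them down to 16. The type list is an illustrative subset of the seven counted types, the source counts are hypothetical, and the 100-query threshold is taken from the talk.

```python
import random

# Illustrative subset; types outside this list are assumed folded into "other".
TYPES = ["A", "AAAA", "MX", "other"]
MIN_QUERIES = 100  # sources below this volume were excluded, as in the talk


def profile(counts):
    """Per-type query counts -> fractions of that source's queries (sums to 1)."""
    total = sum(counts.get(t, 0) for t in TYPES)
    return [counts.get(t, 0) / total for t in TYPES]


def sq_dist(p, q):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means; returns (centres, cluster index for each point)."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]  # random initial centres
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: every point joins its nearest centre.
        labels = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]
        # Update step: every centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return centres, labels


sources = {  # hypothetical per-source counts
    "s1": {"A": 90, "MX": 10},
    "s2": {"A": 95, "MX": 5},
    "s3": {"A": 20, "MX": 80},
    "s4": {"A": 15, "MX": 85},
    "s5": {"A": 3},  # below MIN_QUERIES, so excluded
}
names = [s for s, c in sources.items() if sum(c.values()) >= MIN_QUERIES]
points = [profile(sources[s]) for s in names]
centres, labels = kmeans(points, k=2)
print(dict(zip(names, labels)))
```

Each resulting centre is itself a profile, which is what the stacked bars on the slides show: the average type mix of the sources in that cluster.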

I have just a few slides on IPv6 deployment from the perspective of A versus AAAA. You can measure v6 deployment from a DNS perspective in at least two ways. One way is how many source IPs send at least one AAAA query versus at least one address query; that is at least an indication of some amount of v6 activity, because someone is looking up AAAA records to use them in some way. Now, this is sources, and we have two different data sets, two different months a year apart; the main takeaway is that we are at least moving in the right direction from an IPv6 deployment perspective, from 16 to 18 percent sending at least one quad-A query. The other way to look at this is total AAAA versus A queries, the ratio of total queries. Here you can see it's not quite moving in the positive direction, but this could just be an artifact of the different data sets. This is not really an apples-to-apples comparison, because of both the time period difference and the difference in the sites that we considered, but it's better than no data. So the takeaway here is that right around 12% of the queries we get are for AAAA.

And here is a graph that, somewhat confusingly, plots both of those values. Let's start with sources: look at the red and then the orange. The red is the percentage of sources that sent at least one address query, so, not surprisingly, about 90 percent of the sources send at least one address query per day in this 15-month period, and roughly 30% send us at least one AAAA. Because it's at least one A and at least one AAAA, those aren't going to add up to 100 percent, since some sources send at least one of each. Then let's look at the queries themselves: this line here is the percentage of total queries per day that were address queries, so about 70% of the queries we receive are address queries, and this line here is the percentage that are AAAAs; that is the 12 percent line showing up there. These two lines don't add up to 100 percent because we get queries for types other than A or AAAA. But I think this shows it's not just a small blip: 12% AAAA queries, and 30% of the sources asking for AAAA, strikes me as a substantial fraction.
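The two measures discussed above, share of sources sending at least one query of a type and share of all queries of that type, can be computed from per-source type counts. A minimal sketch with hypothetical sources; note that the source-based percentages need not add up to 100, exactly as explained in the talk.

```python
def aaaa_metrics(per_source):
    """per_source: dict source -> dict of qtype -> query count."""
    n = len(per_source)
    total = sum(sum(c.values()) for c in per_source.values())
    sent_a = sum(1 for c in per_source.values() if c.get("A", 0) > 0)
    sent_aaaa = sum(1 for c in per_source.values() if c.get("AAAA", 0) > 0)
    a_queries = sum(c.get("A", 0) for c in per_source.values())
    aaaa_queries = sum(c.get("AAAA", 0) for c in per_source.values())
    return {
        "pct_sources_any_A": 100 * sent_a / n,        # sources: at least one A
        "pct_sources_any_AAAA": 100 * sent_aaaa / n,  # sources: at least one AAAA
        "pct_queries_A": 100 * a_queries / total,      # share of all queries
        "pct_queries_AAAA": 100 * aaaa_queries / total,
    }


per_source = {  # hypothetical sources, 10 queries each
    "s1": {"A": 7, "AAAA": 2, "MX": 1},
    "s2": {"A": 10},
    "s3": {"MX": 5, "AAAA": 5},
    "s4": {"A": 10},
}
print(aaaa_metrics(per_source))
```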

Now, what about top talkers? As I showed earlier in the presentation, we have some sources that query us a lot. This curve is, I think, not particularly surprising. It is a CDF by rank: it shows what fraction of the total the busiest client sent us, versus what fraction the thousand busiest clients sent, and so on. So this shows that the 100 busiest clients generate 10% of the queries, so there are a lot of really chatty top talkers; 5,000 clients give you half the queries; and if you want 90 percent, you need only 200,000 clients. So this again shows that there is a really long tail of sources that send very few queries.
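Reading the rank CDF the way it is used above ("how many top talkers produce X% of the queries") amounts to sorting per-source totals in descending order and accumulating until the target share is reached. A minimal sketch with hypothetical totals:

```python
def clients_needed(per_source_queries, share):
    """Smallest number of top talkers that together send at least `share`
    (a fraction between 0 and 1) of all queries."""
    counts = sorted(per_source_queries.values(), reverse=True)
    target = share * sum(counts)
    running = 0
    for rank, c in enumerate(counts, start=1):
        running += c
        if running >= target:
            return rank
    return len(counts)


totals = {"a": 50, "b": 20, "c": 10, "d": 10, "e": 5, "f": 5}  # hypothetical
print(clients_needed(totals, 0.5), clients_needed(totals, 0.9))  # 1 4
```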

This one is over the 15-month sample; this particular graph, I am sorry, is not labelled particularly well. It takes every source in that time period, maps it to its origin AS, and then looks at what percentage of the traffic came from that origin AS. The colours represent our guesses ahead of time, based on nothing but our gut, of which origin ASs would send us a lot of queries. We guessed pretty well: the top ones are all coloured in. We suspected Comcast, a huge ISP in the US, was going to send us a lot of queries. So this represents the percentage of all queries we received coming from these ASs.

Here are the very top ones, plotted over time. So you can see the share of queries these particular origin ASs send us; in some cases, and this one is Microsoft, we get a substantial percentage of queries from Microsoft, at least in this time period.

I know I am barrelling along but I don't have a lot of time.

The last thing I want to talk about is queries we get with recursion desired. In theory we should not get any of these: stub resolvers query recursive name servers, and recursive name servers query authoritative ones with the RD bit clear. So if everyone were doing exactly what we would expect, we'd see zero recursion-desired queries. However, nearly 7% of all queries we receive have RD set. And when we look at clients, 42% of the sources we see send only recursion-desired queries. Right, two out of five queriers to the .com and .net name servers set RD equal to one all the time. So I think they are demonstrably not name servers, or certainly not one of the most popular implementations, which don't set RD. And what is this stuff? The main takeaway from this graph, let me back up, is that there is a very similar distribution of how many clients send what volume of queries, whether it's recursion desired or not. That was not very clear.

The point is, look down here: we have just a few clients sending a lot of recursive queries and a few sending a lot of non-recursive ones, and the slopes of these lines are similar. It's not as if the profile of queriers that send all recursive queries is substantially different from the profile of those that send all non-recursive ones.
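The RD bit lives in the second 16-bit word of the 12-byte DNS header defined in RFC 1035 (section 4.1.1), as the lowest bit of its first byte, i.e. mask 0x0100. A minimal sketch of testing it on a raw message and of computing the share of sources that send only RD=1 queries; the per-source flag lists are hypothetical.

```python
import struct

RD_FLAG = 0x0100  # Recursion Desired, in the 16-bit flags word of the header


def rd_set(dns_message: bytes) -> bool:
    """True if the RD bit is set in a DNS message's header."""
    _msg_id, flags = struct.unpack("!HH", dns_message[:4])
    return bool(flags & RD_FLAG)


def rd_only_share(per_source_flags):
    """per_source_flags: dict source -> list of RD booleans, one per query.
    Returns the fraction of sources whose every query had RD set."""
    rd_only = sum(1 for flags in per_source_flags.values() if flags and all(flags))
    return rd_only / len(per_source_flags)


# Header only: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
with_rd = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
without_rd = struct.pack("!HHHHHH", 0x1234, 0x0000, 1, 0, 0, 0)
print(rd_set(with_rd), rd_set(without_rd))  # True False
print(rd_only_share({"a": [True, True], "b": [False, True], "c": [False], "d": [True]}))  # 0.5
```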

So who is sending us these? Duane did more really good work here, I think, diving in to try to dig it out. These are the different parameters he looked at, with a description of some of his findings. I think this chart is not as interesting as the ones coming up; it calculates bits of entropy, and he also looked at name similarity. But let me get straight to the examples. This was a very common type of RD=1 querier: this is all from a single source, with poor source port randomness and poor ID randomness. The ID is a 16-bit value, and we are seeing only a tiny portion of it being used. All MX queries.

So the vast majority, three-quarters, of the RD=1 sources are doing this: just MX lookups. So spam-sending malware of some kind is, I think, a pretty good guess; Warren and others are shaking their heads vigorously. So, not surprising, but nice to validate it.

And then we see various other things here. When we reverse-map this address we get a name with MX in it, and we see it's mostly MX queries, but we also see references to RBLs, probably custom-written, because the thing is sending RD=1 and whoever wrote it didn't know not to do that. There are things that look up CNAMEs only, with the RD bit set. And then there are things that look like they might be trawling for registrations; these are NS queries.
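One simple way to quantify "poor source port randomness" or "poor ID randomness" is the empirical Shannon entropy of the observed values, in bits. This is a minimal sketch of that general measure, not necessarily the exact metric used in the study. A well-randomised 16-bit query ID approaches 16 bits, but the empirical value can never exceed log2 of the number of samples, so compare sources using similar sample sizes.

```python
import math
from collections import Counter


def shannon_entropy_bits(values):
    """Empirical Shannon entropy, in bits, of the observed values."""
    counts = Counter(values)
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(shannon_entropy_bits([7, 7, 7, 7]))         # 0.0: a single repeated ID
print(shannon_entropy_bits([1, 2, 1, 2]))         # 1.0
print(shannon_entropy_bits(list(range(256))))     # 8.0: 256 distinct IDs
```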

So, I have a few final observations, all of which I made earlier in the talk. When we look at the number of clients, we see a significant fraction of the entire IPv4 space; it seems that eventually almost everybody queries .com and .net. When we look at the query type distribution, it follows the intuitively expected distribution: a lot of A and MX, and a reassuring number of sources that also ask for AAAA. And when we look at AAAA, we see a significant percentage of our queriers and our queries looking up AAAA.

The big resolvers that I showed you on that graph account for a significant percentage of all queries, you know, 3 to 5% of the total query load.

And finally, we receive a lot of recursion-desired queries, but, connecting the dots, a significant portion of them look almost certainly attributable to spam-sending malware of some kind.

So with that, and apologies for going so quickly, it looks like we have one minute for questions, and other than that, I am keeping you from the cookies.

ROY: Thank you for the presentation, it was fascinating. Just a question, have you also looked at queries for zones you are not responsible for?

MATT LARSON: Only anecdotally. We do see some, but not as part of this study, and I haven't looked seriously or recently enough to remember what they were. But we do see them.

JIM REID: Just another guy. Fascinating stuff. I think there is a lot more data to come out of that, and much more analysis, but I'm just curious: you have only concentrated on queries; have you looked at other opcodes, like dynamic updates and things like that?

MATT LARSON: We do get those as well. And I don't... I don't know the percentages off the top of my head. But it's a significant amount, just because the whole query volume is so large, but it's not something we looked at for this, or that I have looked at recently.

JIM REID: It's just an observation. I remember your predecessor once saying to me that he was able to track the deployment of Windows 2000 by the number of dynamic updates hitting A.

MATT LARSON: That was the case.

WARREN KUMARI: Fascinating data.

MATT LARSON: I didn't go slowly enough for you to...

WARREN KUMARI: Of the sources that you only saw queries from once, did you look and see how many of those are routed on the Internet and how many are randomly made-up sources?

MATT LARSON: No, but the nice thing is people ask good questions, and we need to remember them and look into them, so that is a good one.

AUDIENCE SPEAKER: I have been looking into this as well, especially using k-means and the... the classic algorithms for DNS traffic. So one idea for you is you can try to passively track the validating name servers based on the pattern of traffic you see, so you organise the database on the names and DS records, DNSKEYs and AAAA records. The second idea is you can try to mix the query type distribution plus the RD bit from the clients in order to identify, like, valid caching resolvers against the other types you were exploring, like the CNAME explorers and MX...
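One way to read the suggestion of mixing "the query type distribution plus the RD bit" is to build one feature vector per client and hand those vectors to a clustering algorithm such as k-means. A minimal sketch of the feature step only, assuming a hypothetical (source IP, query type, RD bit) record format and an illustrative choice of tracked query types; the DS and DNSKEY columns also reflect the first idea, since validating resolvers ask for those types.

```python
from collections import Counter, defaultdict

# Illustrative choice of query types to track; all others are lumped into "OTHER".
QTYPES = ("A", "AAAA", "MX", "NS", "CNAME", "DS", "DNSKEY")


def source_features(queries):
    """Map each source to [fraction per QTYPE..., fraction OTHER, fraction RD=1].

    `queries` is an iterable of (source_ip, qtype, rd_bit) tuples, a
    hypothetical simplified format.
    """
    per_source = defaultdict(list)
    for src, qtype, rd in queries:
        per_source[src].append((qtype, rd))
    features = {}
    for src, rows in per_source.items():
        n = len(rows)
        counts = Counter(q if q in QTYPES else "OTHER" for q, _ in rows)
        vec = [counts[t] / n for t in QTYPES]
        vec.append(counts["OTHER"] / n)
        vec.append(sum(1 for _, rd in rows if rd) / n)  # share of RD=1 queries
        features[src] = vec
    return features


# Made-up sample: a validating resolver (RD=0, asks DS/DNSKEY) vs an MX-only bot (RD=1).
sample = [
    ("192.0.2.53", "A", 0), ("192.0.2.53", "AAAA", 0),
    ("192.0.2.53", "DS", 0), ("192.0.2.53", "DNSKEY", 0),
    ("203.0.113.9", "MX", 1), ("203.0.113.9", "MX", 1),
]
for src, vec in source_features(sample).items():
    print(src, vec)
```

The two sources land far apart in this feature space, which is the property a clustering step would exploit to separate caching resolvers from MX-only or CNAME-only clients.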

MATT LARSON: That is a good idea, to filter on or add to the clustering. We did have that 100-query cut-off, but you are right: the things that asked us for a lot of MX records were still in that clustering.

AUDIENCE SPEAKER: Sam Weiler: Likewise for the transport...

MATT LARSON: Yeah I see what you mean, yeah.

AUDIENCE SPEAKER: We have an open recursive service, but we try to do our best, and so inherently we look at the data. Recently we have seen, and I don't know why, a massive botnet that does an MX lookup for every single e-mail it sends, regardless of whether it's using the same record over and over. I don't know if it's just poorly written or if it's trying to make sure it always has the most valid data, but all of a sudden we saw a massive influx of MX look-ups.

Jaap: Well, thank you.

MATT LARSON: Thank you.
(Applause)

Jaap: It looks like we are not being traditional any more and just head over for a minute. Enjoy your coffee.

Connect with RIPE 66:
