Internet pervasiveness in Africa has been slowly but steadily increasing since the beginning of this millennium. Thanks to several organisations that donated time and resources, the AS ecosystems of several African countries are now experiencing an early stage of the peering era. But how much of this newborn peering connectivity can we reveal using the publicly available BGP route collectors? By analysing the available BGP data with existing techniques, we found that much of this connectivity is missing from the dataset, mainly due to the lack of data sources in the region. In most countries, this could theoretically be solved by introducing no more than ten new ASes sharing their full routing information with route collectors.
The Internet AS-level ecosystem
The Internet is composed of a set of heterogeneous and independent networks, which compete and cooperate with each other by means of the Border Gateway Protocol (BGP) to build the routes that will carry the actual traffic. This ecosystem can be analysed at different levels of abstraction depending on the type of analysis to be carried out, e.g. Autonomous Systems (ASes), IP/router, Points of Presence (PoPs). The AS level in particular is helpful to analyse how the different players composing the Internet (i.e. ASes) interact with each other in terms of BGP routing, without focusing too much on the inner structure of each player. Complete knowledge of the AS-level topology of a given region would also help network administrators in that region to plan their inter-domain routing in advance, introducing the proper amount of redundancy in their provider choices so that problems in the regional Internet would have minimal impact on the performance of their own networks.
The Internet AS-level is typically represented as a graph where nodes are ASes and connections are BGP sessions established between ASes. Two ASes that decide to establish a BGP session essentially exchange a set of network reachability information which is used to route part of their Internet traffic between them. The amount and the nature of the reachability information exchanged totally depend on the economic agreements signed between the two ASes, which can typically be classified as provider-to-customer (p2c) or peer-to-peer (p2p). In the former, the provider announces to the customer the routes to reach all the Internet networks, whereas the customer announces to the provider the routes to reach its networks and its client networks (if any). In the latter, the two ASes exchange routes to reach their respective clients, typically to keep local traffic local and to reduce transit costs. The primary data source to analyse the Internet at the AS-level of abstraction is the BGP data collected and provided by organisations running route collectors, such as the Réseaux IP Européens Network Coordination Centre (RIPE NCC) with the Routing Information Service (RIS), the University of Oregon with the Route Views project and the Institute of Informatics and Telematics of the Italian National Research Council (IIT-CNR) with the Isolario project. A route collector is a server running a BGP routing daemon which collects and stores routing information in Multi-Threaded Routing Toolkit (MRT) format, and does not announce any reachability information back to its BGP neighbors. Every AS is free to join and share its routing information with the public, contributing to improving the amount – and thus the quality – of BGP data available for research purposes. 
The fundamental piece of information found in collected BGP data to analyse the AS-level ecosystem is the AS PATH attribute, which can be used to infer both the nodes (ASes) and the connections (AS adjacencies) of the AS-level topology.
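As a rough illustration of how nodes and adjacencies can be read off the AS PATH attribute, the Python sketch below parses paths given as space-separated strings. The function names, the input format and the way prepending and AS_SET tokens are handled are assumptions made for this example, not a description of the tools used for the analysis; the AS numbers are taken from the range reserved for documentation.

```python
def parse_as_path(as_path):
    """Return (nodes, edges) inferred from one AS PATH string.

    Assumption: the path is a space-separated list of AS numbers, and any
    token that is not a plain number (e.g. an AS_SET written in braces)
    is skipped without inferring a link across it.
    """
    nodes, edges = set(), set()
    prev = None
    for token in as_path.split():
        if not token.isdigit():
            prev = None  # do not connect the ASes on either side of this token
            continue
        asn = int(token)
        nodes.add(asn)
        # Consecutive repetitions of the same AS (prepending) add no link.
        if prev is not None and prev != asn:
            edges.add((min(prev, asn), max(prev, asn)))  # undirected adjacency
        prev = asn
    return nodes, edges


def build_topology(as_paths):
    """Merge the nodes and adjacencies found in several AS PATHs."""
    all_nodes, all_edges = set(), set()
    for path in as_paths:
        nodes, edges = parse_as_path(path)
        all_nodes |= nodes
        all_edges |= edges
    return all_nodes, all_edges


nodes, edges = build_topology([
    "64500 64501 64501 64502",   # 64501 is prepended once
    "64500 64503 64502",
])
print(sorted(nodes))
print(sorted(edges))
```

Storing each adjacency with the smaller AS number first keeps the graph undirected, so the same link seen in opposite directions in two paths is counted once.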
BGP route collector
Geolocation
Thanks to publicly available databases mapping IP addresses to countries, it is possible to infer the geolocation of an AS by geolocating each of the subnets it announces. This technique makes it possible to infer regional topologies, considering that an AS adjacency can be geolocated in a country if both ASes announce at least one subnet in that country. In the end, the set of ASes geolocated in a given country or continent will contain both ASes owned by local organisations and strictly linked to the territory where they operate (hereafter local ASes) and ASes owned by international organisations that operate in the same territory for marketing purposes (hereafter international ASes). We consider an AS to be local if there is an entry for it in the AFRINIC registry.
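The geolocation rule above can be sketched in a few lines of Python. Everything in the example is hypothetical: the prefix-to-country table stands in for lookups in a geolocation database, each prefix is assumed to map to a single country, and the prefixes and AS numbers come from ranges reserved for documentation.

```python
def countries_per_as(announced, prefix_country):
    """Map each AS to the set of countries where its announced prefixes geolocate."""
    return {
        asn: {prefix_country[p] for p in prefixes if p in prefix_country}
        for asn, prefixes in announced.items()
    }


def regional_edges(edges, as_countries, country):
    """Keep the adjacencies whose two ASes both announce a prefix in `country`."""
    return {
        (a, b)
        for a, b in edges
        if country in as_countries.get(a, set())
        and country in as_countries.get(b, set())
    }


# Hypothetical inputs (illustrative only).
prefix_country = {
    "192.0.2.0/24": "KE",
    "198.51.100.0/25": "KE",
    "198.51.100.128/25": "ZA",
    "203.0.113.0/24": "ZA",
}
announced = {
    64500: ["192.0.2.0/24"],
    64501: ["198.51.100.0/25", "198.51.100.128/25"],
    64502: ["203.0.113.0/24"],
}
edges = {(64500, 64501), (64501, 64502), (64500, 64502)}

as_countries = countries_per_as(announced, prefix_country)
kenya = regional_edges(edges, as_countries, "KE")
south_africa = regional_edges(edges, as_countries, "ZA")
print(sorted(kenya), sorted(south_africa))
```

In this toy input the adjacency between 64500 and 64502 belongs to neither regional topology, because the two ASes never announce a subnet in the same country.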
A First Glance at the African AS-level Ecosystem
Africa is an extremely heterogeneous continent in terms of language, culture, and economics, and this heterogeneity can also be recognized in its AS-level ecosystem. Countries with good Internet connectivity and penetration coexist on the same continent with countries where Internet infrastructure has yet to become a consolidated part of the economy. Out of 1084 local ASes, South Africa takes the lion’s share with 322 ASes, followed by Nigeria (145 ASes), Kenya (79 ASes), Tanzania (63 ASes) and Ghana (56 ASes). One of the most striking features that can be noticed at a glance is the poor pervasiveness of IPv6, despite the efforts spent by several organisations on training sessions and IPv6-focused conferences. Every local AS announces at least one IPv4 network on the Internet, while only 203 of them announce at least one IPv6 network. The latter set of ASes is mainly distributed among South Africa (47%), Tanzania (13%), Kenya (10%), Mauritius (9%) and Nigeria (6%). Another interesting aspect is that just 90 local ASes (about 8%) are located in more than one country, highlighting how traffic transiting between neighboring countries is still dependent on international providers.
Similarly to the rest of the Internet ecosystem, the peering ecosystem in Africa is at a very early stage of development. Not many years ago most local traffic was routed via Europe and North America, causing performance issues due to high latencies. Things started changing during the last decade, when initiatives like the African Internet Exchange System (AXIS) project led to a dramatic increase in the number of IXPs in the region. There are now 37 active IXPs in Africa, located in 34 cities in 28 countries (source: The African IXP Association). Scraping the website of each IXP shows that most of them currently have fewer than 20 ASes connected, with the notable exceptions of NAPAfrica in South Africa (273 ASes among Johannesburg, Cape Town and Durban), JINX in South Africa (82 ASes in Johannesburg), IXPN in Nigeria (54 ASes among Lagos, Abuja and Port Harcourt), KIXP in Kenya (36 ASes between Nairobi and Mombasa), TIX in Tanzania (36 ASes between Dar es Salaam and Arusha) and UIXP in Uganda (26 ASes in Kampala). The presence of an IXP as crowded as NAPAfrica stresses even more how South Africa’s Internet ecosystem is totally different from the rest of Africa, resembling the ecosystem of a European country.
African AS-level ecosystem data
On the Completeness of the African AS-level Graph
BGP data is known to be far from completely representative of the Internet AS-level ecosystem. First, the number of ASes participating in any route collecting project is extremely low compared to the total number of ASes composing the Internet. During our analysis, only 525 ASes were sharing their routing information with Isolario, RIS and/or Route Views, while the total number of ASes routed on the Internet was 59,005. Second, route collectors do not receive complete routing information from all of their feeders. Several collectors are placed at IXPs across the world, and many feeders apply to them the very same export policies applied to other IXP participants. In other words, they announce to the route collectors only their customer cone, which provides an extremely limited view of the Internet. During our analysis, about half of the feeders showed this kind of behavior, with only 257 ASes sharing an IPv4 space and 200 ASes sharing an IPv6 space close to a full routing table. The feeders announcing their full routing table to route collectors will be hereafter referred to as full feeders. Finally, BGP data is known to miss a large part of the p2p connectivity established at IXPs or via private peering. This is mostly caused by the location of full feeders in the AS graph and by the BGP export policies that follow from the economic relationships between ASes. Under the standard economic relationships, an AS announces its full routing information (containing routes learned from its peers, providers and customers) to another AS only if it is a provider of that AS. As a consequence, a route collector is able to see the routes established via IXPs and private peering of a given AS X only if there exists a chain of transit relationships from the route collector towards X.
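The export behaviour described above can be summarised by a simplified rule, commonly referred to as valley-free export: everything is announced to customers, while peers and providers only receive an AS's own routes and the routes learned from its customers. The Python sketch below encodes that rule under stated assumptions. The function names, the role labels and the example routing table are hypothetical, and real export policies are often more nuanced than this.

```python
def is_exported(learned_from, neighbor_is):
    """Simplified export rule.

    learned_from: where the route came from ("self", "customer", "peer", "provider").
    neighbor_is:  role of the neighbor receiving the announcement,
                  seen from the announcing AS ("customer", "peer", "provider").
    """
    if neighbor_is == "customer":
        return True  # customers receive the full routing information
    # Peers and providers only receive own routes and customer routes.
    return learned_from in ("self", "customer")


def visible_to_collector(routes, collector_treated_as):
    """Prefixes a feeder would announce to a route collector."""
    return [
        prefix
        for prefix, learned_from in routes
        if is_exported(learned_from, collector_treated_as)
    ]


# Hypothetical routing table of a feeder: (prefix, where it was learned).
routes = [
    ("192.0.2.0/24", "self"),
    ("198.51.100.0/24", "customer"),
    ("203.0.113.0/24", "peer"),
    ("2001:db8::/32", "provider"),
]

partial_feed = visible_to_collector(routes, "peer")      # collector treated like an IXP peer
full_feed = visible_to_collector(routes, "customer")     # collector treated like a customer
print(partial_feed)
print(full_feed)
```

A feeder that treats the collector like any other IXP participant only exposes its customer cone, whereas a full feeder behaves as if the collector were one of its customers.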
This concept has been formalised as p2c-distance, and it has been used to quantify the number of ASes for which it is possible to discover the full connectivity given a set of full feeders. The resulting graph incompleteness must be taken into serious consideration when analysing the Internet at the AS-level of abstraction since it can easily lead to wrong conclusions, especially when analysing the graph properties.
A route collector R connected to the top of the Internet hierarchy won’t be able to reveal the peering connectivity established in the lower part of the hierarchy
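One possible reading of the p2c-distance is the number of customer-to-provider hops needed to climb from a full feeder up to a given AS, with the distance being infinite when no such chain of transit relationships exists. The Python sketch below computes it with a breadth-first search. The data structure, the function name and the three-level toy hierarchy are assumptions made for illustration and do not reproduce the original formalisation.

```python
from collections import deque


def p2c_distances(providers_of, feeder):
    """Return {AS: hops} for every AS reachable from `feeder` by walking
    customer-to-provider links only. ASes missing from the result are
    treated as being at infinite distance.

    providers_of: dict mapping an AS to the set of its providers (assumed input).
    """
    dist = {feeder: 0}
    queue = deque([feeder])
    while queue:
        asn = queue.popleft()
        for provider in providers_of.get(asn, ()):
            if provider not in dist:
                dist[provider] = dist[asn] + 1
                queue.append(provider)
    return dist


# Toy hierarchy: 64501 is at the top, 64502 and 64504 are its customers,
# and 64503 is a customer of 64502.
providers_of = {
    64503: {64502},
    64502: {64501},
    64504: {64501},
}

from_bottom = p2c_distances(providers_of, 64503)
from_top = p2c_distances(providers_of, 64501)
print(from_bottom)
print(from_top)
```

A feeder at the bottom of the hierarchy reaches every AS above it, while a feeder at the top only reaches itself, which mirrors the situation of a route collector connected to the top of the hierarchy.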
The coverage situation in Africa is not very different from the rest of the world. Currently there are three route collectors in Africa physically deployed at KIXP in Kenya (Route Views), JINX in South Africa (Route Views) and NAPAfrica in South Africa (RIS). Those collectors receive data from 69 feeders, 63 located in South Africa, 4 in Kenya and 2 in Mauritius. An additional feeder from South Africa is connected to Isolario via multihop BGP. The vast majority of feeders announce only a small portion of the IP space, much smaller than a full routing table, which nowadays is composed of around 600k (v4) and 40k (v6) routes. Out of 69 feeders, only 13 can be considered v4 full feeders and only 9 v6 full feeders. All of them are located in South Africa, with the exception of one v4 and one v6 full feeder located on the island of Mauritius. Thus, the peering connectivity established at the 30 IXPs in Africa located neither in South Africa nor in Mauritius is currently completely hidden from BGP route collectors, while the small number of full feeders available in South Africa and Mauritius does not allow the collectors to reveal much of the peering connectivity in those countries. Taking into account the p2c-distance metric, the current full feeders reveal the full connectivity of 29 out of 129 IPv4 transit ASes (22.5%) and 5 out of 28 IPv6 transit ASes (17.9%) in South Africa, and of 6 out of 31 IPv4 transit ASes (19.4%) and none of the 9 IPv6 transit ASes in Mauritius. In the rest of the African countries, the only ASes covered are the international ASes, which are covered by feeders outside Africa.
To better understand how far the current BGP measurement system is from the ideal condition, where the entire p2p connectivity of each country is revealable and potentially visible, we applied the Minimum Set Cover (MSC) problem described by Gregori et al. to each regional topology gathered from BGP data. In each regional scenario every available AS is considered a potential feeder with its own covering set (i.e. the set of transit ASes having a finite p2c-distance from the AS), and the goal is to find the minimum number of ASes whose covering sets cover the whole set of transit ASes in that region. The figure below shows the Complementary Cumulative Distribution Function (CCDF) of the number of feeders required in each African country. Note that the v6 scenario is computed based only on the 12 countries where ASes were connected to each other. South Africa stands out in both plots: given the large number of ASes in both the v4 and v6 scenarios, a rather large number of feeders is required to obtain full coverage of transit ASes. In all the other cases, though, the number of feeders required is quite low, often smaller than 10 in both v4 and v6. This means that with a considerably small effort (10 full BGP sessions to be established) it could be possible to reveal the full peering connectivity of 90% of the countries in Africa.
CCDF of the solution cardinality of MSC problem in each African country
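To give a feel for the feeder selection problem, the Python sketch below applies the classic greedy heuristic for set cover: at each step it picks the candidate feeder whose covering set contains the most transit ASes not yet covered. This is a stand-in chosen for brevity, not necessarily the method used to produce the results above, and the greedy choice is not guaranteed to return a minimum-size solution. The function name, the AS labels and the covering sets are hypothetical.

```python
def greedy_feeder_selection(covering_sets, transit_ases):
    """Greedy approximation of set cover.

    covering_sets: dict mapping a candidate feeder to the set of transit
                   ASes it would cover (assumed to be precomputed).
    transit_ases:  set of transit ASes that should be covered.
    Returns (chosen feeders in selection order, transit ASes left uncovered).
    """
    uncovered = set(transit_ases)
    chosen = []
    while uncovered:
        # Iterate over sorted candidates so that ties are broken deterministically.
        best = max(sorted(covering_sets),
                   key=lambda feeder: len(covering_sets[feeder] & uncovered))
        gain = covering_sets[best] & uncovered
        if not gain:
            break  # no candidate covers any of the remaining ASes
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


# Hypothetical regional scenario with five transit ASes (labelled 1 to 5)
# and four candidate feeders.
transit_ases = {1, 2, 3, 4, 5}
covering_sets = {
    100: {1, 2, 3},
    200: {3, 4},
    300: {4, 5},
    400: {1},
}

feeders, missing = greedy_feeder_selection(covering_sets, transit_ases)
print(feeders, missing)
```

In this toy scenario two feeders are enough to cover all five transit ASes, and an empty second return value indicates that full coverage was reached.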
Conclusion
Africa shows in its AS-level ecosystem the same heterogeneity it shows in terms of culture, economics and development. The most developed AS-level ecosystem can be found in South Africa, where the peering ecosystem is extremely similar to that of most European countries, as shown by the number of IXPs available. Then, there is a small set of progressive countries (e.g. Egypt, Kenya, Nigeria and Tanzania) where Internet pervasiveness is steadily increasing and becoming a more and more important part of their economy. Finally, there is a large set of countries where the Internet is at a very early stage of development. In this ecosystem, we found that BGP route collectors almost completely fail to reveal the peering connectivity established among ASes, thus affecting any possible graph analysis concerning the African ecosystem. Despite that, we found that theoretically it could be possible to solve this situation in most African countries by introducing just ten new full feeders.
Ending footnotes
The analyses in this blog entry are based on BGP data collected by every route collector made available by Isolario, RIS and Route Views on August 8th, 2017. Geolocation is performed using the GeoLite2 Country database made publicly available by MaxMind, while economic relationships are inferred using the algorithm described by Gregori et al.
Bio: Alessandro Improta received his BSc and MSc in Computer Engineering from the University of Pisa, Italy, in 2006 and 2009, respectively. He then went on to receive his PhD in Information Engineering from the University of Pisa, in 2013. Since 2009 Alessandro has held a research position with the Institute of Informatics and Telematics (IIT) at the Italian National Research Council (CNR) in Pisa. His research interests include Internet AS-level measurement and analysis, and the discovery of Internet path characteristics.