Monday, October 27, 2014

Session 2: How to manage networks. Panel discussion. (Akella, Donovan, Wu)

Q: (Jeff Mogul) There are a couple of existing works on natural language processing that you should look at.
A: (Aditya Akella) There is indeed a lot of related work that we missed.

Q: (Nina Taft) What are your ideas to solve the scalability challenges in your respective problems?
A: (Aditya Akella)

  • Management-type activities are typically slow-paced. In most cases, there is no need for a fast answer.
  • The need for scale presupposes the availability of a lot of data in the first place. For that, we are actually in favor of an anonymized repository where people can post their management data.

A: (Sean Donovan) A lot of the optimizations that we are considering are specific to one area. More specifically, we are thinking of leveraging overlapping rules and pruning unused rules on the fly. We could also aggregate contiguous rules, again on the fly. Finally, we are thinking of leveraging new SDN features such as switches with multiple tables.
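
As a rough illustration of the pruning and aggregation ideas mentioned above (not the speakers' implementation), here is a minimal sketch. It assumes, purely for illustration, that each rule is an IPv4 prefix with an action and a hit counter; the function name `prune_and_aggregate` and the rule layout are hypothetical. It also ignores rule priorities, which a real switch table would have to respect.

```python
import ipaddress
from collections import defaultdict


def prune_and_aggregate(rules):
    """Drop unused rules, then merge contiguous prefixes per action.

    `rules` is a list of (prefix, action, hit_count) tuples. This layout
    is an assumption made for the sketch, not taken from the talk.
    """
    by_action = defaultdict(list)
    for prefix, action, hits in rules:
        if hits == 0:
            continue  # prune rules that never matched any traffic
        by_action[action].append(ipaddress.ip_network(prefix))

    merged = []
    for action, networks in by_action.items():
        # collapse_addresses merges adjacent or overlapping prefixes
        for network in ipaddress.collapse_addresses(networks):
            merged.append((str(network), action))
    return sorted(merged)


rules = [
    ("10.0.0.0/25", "fwd", 5),
    ("10.0.0.128/25", "fwd", 3),   # contiguous with the rule above
    ("10.0.1.0/24", "drop", 0),    # never hit, so it is pruned
    ("192.168.0.0/24", "drop", 7),
]
print(prune_and_aggregate(rules))
```

Here the two contiguous /25 rules collapse into a single /24, and the rule with a zero hit count disappears.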
     
Q: (Vyas Sekar) Trouble tickets are usually unstructured data. Can your work be used to inform the design of structured trouble ticket systems to enable better management-plane analytics? Apparently, forensic processing in crime analysis has this kind of structure in its "ticket" systems.

A: (Aditya Akella)

  • Tickets contain both structured and unstructured data. For some of the outages, the structured data are 3 pages long. Regarding what data to provide, if we see some common root-cause events, then we can come up with a structure that captures those root causes systematically. In general, though, some kind of unstructured data will always be there, and dealing with big data will still be necessary.
  • Another possibility is to relate the trouble ticket back to the control-plane actions that led to this trouble ticket being generated in the first place. These would constitute more structured data that could be of use.

Q: (Brad Karp) If you build a system that shifts tasks around, as yours does, Aditya's results seem to suggest that you'll have more tickets. How can you provide hard guarantees to make sure that you are not hit by adverse events?

A: (Winfei Wu) Providing a 100% guarantee is indeed difficult. We profile resource utilization at a fine granularity (e.g., every 100 ms) and, based on these 100 ms profiles, we can then adapt the load to provide a given number of 9s.
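
One way to read the "number of 9s" idea is sketched below. This is an illustration only, not the system described in the talk. It assumes that utilization is sampled once per interval (e.g., every 100 ms), that utilization scales linearly with offered load, and that an interval counts as "bad" when utilization exceeds capacity. The name `load_scale_for_nines` is hypothetical.

```python
def load_scale_for_nines(samples, capacity, nines):
    """Return the factor by which load could be multiplied so that at
    least a fraction 1 - 10**-nines of the sampled intervals stay at or
    below `capacity`.

    Assumes (for this sketch only) that utilization is proportional to
    load. `samples` holds one utilization value per profiling interval.
    """
    if not samples:
        raise ValueError("need at least one utilization sample")
    ordered = sorted(samples)
    n = len(ordered)
    # Number of intervals allowed to exceed capacity, e.g. 1 in 100
    # for two 9s. Integer arithmetic avoids floating-point rounding.
    allowed_violations = n // (10 ** nines)
    threshold = ordered[n - 1 - allowed_violations]
    if threshold <= 0:
        raise ValueError("utilization must be positive to scale load")
    return capacity / threshold


# 100 intervals: mostly 50% utilization, one at 80%, one spike at 200%.
profile = [0.5] * 98 + [0.8, 2.0]
print(load_scale_for_nines(profile, capacity=1.0, nines=2))
```

With two 9s, one interval in a hundred may exceed capacity, so the spike is tolerated and the load is sized against the 80% sample instead.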

Q: (Theo Benson, to Aditya) What kind of information should Sean provide you in order for your work to be better off?

A: (Aditya Akella) The kinds of changes that are happening in the network, the attributes of these changes, and their impact on the network.

Q: (Brighten Godfrey) How do you integrate diverse sources of data?

A: (Aditya Akella) Normalizing data across different networks is indeed a challenge. One approach would be to normalize them according to some impact metric, for instance the outage time. Within one network, though, trouble tickets are fairly consistent. It is possible that other data sets could be brought in and add value, but we haven't looked at this yet. For that to be useful, we would need to come up with a way to map these external data to "change events".

Q: What/Where is the boundary between the management- and the control-plane?

A: (Aditya Akella) The management plane is where humans interact with the network.

A: (Sean Donovan) The control-plane is where the system takes over.

Q: (Anja Feldmann) According to your talks, isn't the best network one that is perfectly homogeneous and never changes? (laughs). In contrast, the research community is going towards more change, more flexibility. What tools or hooks do you think we need in order to make those network changes more easily manageable?

A: (Aditya Akella) We need ways to relate the data that we generate to the corresponding root cause. Then we would be in a better position to understand the impact of network changes.

Q: Sean has the challenge of going from general to specific. Aditya has the opposite problem. Can one help the other?

A: (Aditya Akella) We are using natural language processing on tickets to understand the root cause and the impact of a ticket. Our experience with that tool is that it required a lot of manual input to make useful predictions. There seems to be a rich avenue for tools that take the raw data and turn them into more actionable inputs.

A: (Sean Donovan) We haven't explored that connection yet, but that's definitely interesting.