As promised in the last post, I have compared the performances of several graph libraries, in order to choose which ones should be deployed with Sagemath. Here, I provide the main results of this analysis, while more details are available on my website (see also the links below).
The libraries chosen are the most famous graph libraries written in Python, C, or C++ (I have chosen these languages because they are easier to integrate in Sagemath, using Cython). Furthermore, I have excluded NetworkX, which is already deployed with Sagemath.
First of all, I have to enforce that no graph library comparison can be completely fair, and also this comparison can be criticized, due to the large amount of available routines, to the constant evolution of libraries, and to many small differences in the outputs (for instance, one library might compute the value of a maximum s-t flow, another library might actually compute the flow, and a third one might compute all maximum flows). Despite this, I have tried to be as fair as possible, through a deeper and more detailed analysis than previous comparisons (https://graph-tool.skewed.de/performance,
http://www.programmershare.com/3210372/,
http://arxiv.org/pdf/1403.3005.pdf).
The first comparison
deals with the number of algorithms implemented. I have chosen a set
of 107 possible algorithms, trying to cover all possible tasks that a
graph library should perform (avoiding easy tasks that are common to all libraries, like
outputting the number of nodes, the number of edges, the neighbors of
a node, etc). In some cases, two tasks were collapsed in one, if
the algorithms solving these tasks are very similar (for instance, computing a maximum flow
and computing a minimum cut, computing vertex betweenness and edge
betweenness, etc).
The number of
routines available for each library is plotted in the following chart, and a table containing all features is available in HTML or as a Google Sheet.
The results show that Sagemath
has more routines than all competitors (66), closely followed by
igraph (62). All other libraries are very close to each other, having
about 30 routines each. Furthermore, Sagemath could be
improved in the fields of neighbor similarity measures
(assortativity, bibcoupling, cocitation, etc), community detection,
and random graph generators. For instance, igraph contains 29 routines that are not available in Sagemath.
The second
comparison analyzes the running-time of some of the algorithms
implemented in the libraries. In particular, I have chosen 8 of the
most common tasks in graph analysis: computing the diameter, computing the
maximum flow between two vertices, finding connected components and
strongly connected components, computing betweenness centrality, computing the
clustering coefficient, computing the clique number, and generating a graph
with the preferential attachment model. I have run each of these
algorithms on 3 inputs, and I have considered the total execution
time (excluding the time needed to load the graph). More details on this experiment are available here, and the results are also available in a Google Sheet.
In order to make the
results more readable, I have plotted the ratio between the time
needed by a given library and the minimum time needed by any library.
If an algorithm was not implemented, or it needed more than 3 hours
to complete, the corresponding bar is not shown.
Overall, the results show that NetworKit is the fastest library, or one of the fastest, in all routines that are implemented (apart from the generation of preferential attachment graphs, where it is very slow). Boost graph library is very close to NetworKit, and it also contains more routines. Also Sagemath is quite efficient in all tasks, apart from the computation of strongly connected components and the generation of a preferential attachment graph, where it needed more than 3 hours. However, in the latter case, the main problem was not speed but memory consumption.
In conclusion, Sagemath can highly benefit from the possibility of using algorithms from other libraries. First of all, it might improve the number of algorithms offered, especially by including igraph, and it also might improve its performance, by including Boost, NetworKit, or other fast graph libraries.
No comments:
Post a Comment