By Jincheng Wang & Zhenghan Zhang


Introduction

A pointer network is a neural network model designed for sequence learning problems where the output dictionary corresponds one-to-one to the elements of the input sequence. Figure 1 shows an example of a pointer network.

Such problems cannot be addressed by existing approaches such as sequence-to-sequence models, because the number of target classes at each step of the output depends on the length of the input, which is variable.
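At each decoding step, a pointer network scores every encoder state against the current decoder state and uses the softmax of those scores directly as the output distribution over input positions. Below is a minimal NumPy sketch of this mechanism, following the attention formulation in the original paper (u_j = v · tanh(W1 e_j + W2 d)); the shapes and toy values are our own illustrative assumptions:

    import numpy as np

    def pointer_distribution(enc_states, dec_state, W1, W2, v):
        # u_j = v . tanh(W1 e_j + W2 d) for each encoder state e_j
        u = np.array([v @ np.tanh(W1 @ e + W2 @ dec_state) for e in enc_states])
        e_u = np.exp(u - u.max())   # numerically stable softmax
        return e_u / e_u.sum()      # one probability per input element

    # Toy usage: 5 input elements, hidden size 4.
    rng = np.random.RandomState(0)
    enc = [rng.randn(4) for _ in range(5)]
    p = pointer_distribution(enc, rng.randn(4),
                             rng.randn(4, 4), rng.randn(4, 4), rng.randn(4))
    # p has length 5 and sums to 1: the output dictionary is the input itself.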

A Euclidean minimum spanning tree is a special case of a minimum spanning tree in which the weight of an edge is the Euclidean distance between its two endpoints. Figure 2 shows an example of a Euclidean minimum spanning tree in R².

The original paper presents three geometry-related problems as experiments: Convex Hull, Delaunay Triangulation, and the planar Traveling Salesman Problem. In this project, we apply a pointer network to the Euclidean minimum spanning tree problem. We first implemented the pointer network using Keras, then trained the model on input and output data we generated for the Euclidean minimum spanning tree problem.


Model Implementation

We implemented our model using Keras. Unlike the original model, our implementation is simplified in two ways:

  • a) Instead of stopping after generating a special stop symbol, our model must be told the length of the output sequence in advance.
  • b) After being trained on input sequences of a particular length, our model cannot be used on input sequences of different lengths.

The LSTMs we used have hidden states of size 128, and we used categorical cross-entropy as our objective function. After experimenting with several optimizers, including stochastic gradient descent, RMSprop, Adadelta, Adam, and Nadam, we chose Nadam (Nesterov Adam), on the basis that it performed best on a small training set of 5 samples and would hopefully also do well on larger training sets.
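For concreteness, here is a minimal Keras skeleton with these settings. It is an illustrative simplification rather than our exact architecture: the dense softmax over input positions stands in for the attention-based pointer, and the output length assumes the model emits the two endpoint indices of each of the n-1 edges:

    from keras.models import Model
    from keras.layers import Input, LSTM, Dense, RepeatVector, TimeDistributed

    SEQ_LEN = 10                  # number of input points (fixed; see limitation b)
    OUT_LEN = 2 * (SEQ_LEN - 1)   # an MST on n points has n-1 edges, 2 indices each
    DIM = 2                       # planar points
    HIDDEN = 128                  # LSTM hidden state size

    inp = Input(shape=(SEQ_LEN, DIM))
    encoded = LSTM(HIDDEN)(inp)   # encoder summary state
    decoded = LSTM(HIDDEN, return_sequences=True)(RepeatVector(OUT_LEN)(encoded))
    # Stand-in for the pointer: a softmax over the SEQ_LEN input positions
    # at every output step.
    out = TimeDistributed(Dense(SEQ_LEN, activation='softmax'))(decoded)

    model = Model(inp, out)
    model.compile(optimizer='nadam',   # Nesterov Adam
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])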


Data Generation

Our input data consists of a list of points in Euclidean space; this is an extension of the planar points used in the paper.

We generate input points by sampling each coordinate uniformly from [0, 1]. For each input in the training and test data sets, we generate the output by applying Prim's algorithm, which produces the list of edges that constitutes the minimum spanning tree. Each edge is represented as a pair (a, b), where a and b are indices of vertices in the input sequence. For ease of training, we orient each edge so that a ≤ b, and sort all edges in lexicographical order.
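The following sketch shows this procedure; our actual generation script may differ in details, and the O(n²) Prim's algorithm here is chosen for clarity rather than speed:

    import numpy as np

    def generate_sample(n_points=10, dim=2, rng=np.random):
        points = rng.rand(n_points, dim)            # coordinates uniform in [0, 1]
        diff = points[:, None, :] - points[None, :, :]
        dist = np.sqrt((diff ** 2).sum(-1))         # pairwise Euclidean distances
        in_tree = {0}                               # Prim's algorithm from vertex 0
        edges = []
        while len(in_tree) < n_points:
            _, u, v = min((dist[a][b], a, b)
                          for a in in_tree
                          for b in range(n_points) if b not in in_tree)
            in_tree.add(v)
            edges.append((min(u, v), max(u, v)))    # smaller index first
        edges.sort()                                # lexicographical order
        return points, edges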

By generating a large number of input lists, we can accumulate a large amount of training data in the form the model requires.


Experiments

After training the model and generating the test data, we intended to evaluate its accuracy by comparing the total Euclidean length of the predicted minimum spanning tree with that of the actual minimum spanning tree produced by Prim's algorithm. However, since we spent most of our time figuring out how to implement the model correctly, we did not have enough time to train it to the point where this metric would be meaningful. Instead, we rely on the training-set accuracy and the loss curve.
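For reference, the intended metric is easy to state in code. Here predicted_edges and mst_edges are hypothetical variables holding the model's decoded edge list and Prim's output, respectively:

    import numpy as np

    def tree_length(points, edges):
        # total Euclidean length of a set of edges given as index pairs
        return sum(np.linalg.norm(points[a] - points[b]) for a, b in edges)

    # A ratio close to 1 means the prediction is nearly optimal
    # (assuming the predicted edges actually form a spanning tree):
    # ratio = tree_length(points, predicted_edges) / tree_length(points, mst_edges)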

We generated 10000 samples as the training set and 2000 samples as the test set. Each sample has a sequence of 10 points as its input. After training for 1000 epochs, on the training set our accuracy is 0.7092 and our loss is 0.7704. On the test set, our accuracy is 0.3314 and our loss is 3.6589.

From Figures 3 and 4 we can see that even though our model does not appear to have converged yet, obtaining notably better results could take significantly more time.

Figure 3. The plot shows the training accuracy vs. epoch. After 1000 epochs, the training accuracy reaches 70%.

Figure 4. The plot shows the loss vs. epoch. After 1000 epochs, the loss goes down to about 0.8.


Discussion

We noticed that our results are much worse than those in the original paper, where the authors used training sets of one million examples and the model normally converged within 10 to 20 epochs. We compensated for the smaller training set by increasing the number of epochs, and since our training set is smaller, it should also be easier for the model to fit. We are not sure whether the gap is due to a mistake in our implementation, differences in how the model is trained, characteristics of the Euclidean MST problem, or something else.

We encountered a number of difficulties while working on the project. One is that Keras may not be well suited to implementing the pointer network model. In Keras, models are composed of layers whose shapes must be known when the model is compiled, but our model has a variable input sequence length and output dictionary size. We can partly work around this by padding the input sequence. However, the model can still predict non-zero probabilities for the padded input "points", which is undesirable. Padding is also inelegant in that one of the appealing features of the pointer network is its ability to work on sequences of arbitrary length, since the weight matrices have identical shapes for different input sequence lengths and can thus be reused.
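One possible remedy, which we sketch here but did not implement, is to mask the padded positions before the softmax so that they receive (near-)zero probability:

    import numpy as np

    def masked_softmax(logits, mask):
        # logits: scores over input positions; mask: 1 for real points, 0 for padding
        logits = np.where(mask.astype(bool), logits, -1e9)  # suppress padding
        e = np.exp(logits - logits.max())
        return e / e.sum()

    probs = masked_softmax(np.array([2.0, 1.0, 0.5, 0.0]),
                           np.array([1, 1, 0, 0]))
    # probs[2:] is (numerically) zero: padded "points" can no longer be predicted.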

We also noticed that the APIs of neural network packages change rapidly. Much of our time was spent studying other people's implementations of pointer networks or similar models, often written just a few months earlier, only to realize that their code relied on old, obsolete APIs of some dependency. And when we tried to run the code after installing older versions of those dependencies, we would often discover that they in turn depended on older versions of other packages. It was a somewhat frustrating experience, but we were also amazed by how fast the field of neural networks is changing.


Repository

Our code can be found here.


Future Work

One direction that might be worth exploring is why our results seem worse than those from the original paper.

We have also briefly looked at other neural network platforms. In particular, Dynet, "a toolkit for implementing neural network models based on dynamic declaration of network structure", might make implementing pointer networks easier.

Also, even though the pointer network is a very cool model, its range of applications is relatively limited, and neither the original paper nor our project provides a real-world application of it. Looking into such applications could be worthwhile in the future.


References

  • Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. CoRR, abs/1506.03134, 2015.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112, 2014.
  • https://en.wikipedia.org/wiki/Euclidean_minimum_spanning_tree