Patrick McGuire: Mean Box Pooling: A Rich Image Representation and Output Embedding for the Visual Madlibs Task. (arXiv:1608.02717v1 [cs.CV])

Tuesday, August 9, 2016

We present Mean Box Pooling, a novel visual representation that pools over CNN representations of a large number of highly overlapping object proposals. We show that such a representation, together with nCCA, a successful multimodal embedding technique, achieves state-of-the-art performance on the Visual Madlibs task. Moreover, inspired by nCCA's objective function, we extend the classical CNN+LSTM approach to train the network by directly maximizing the similarity between the internal representation of the deep learning architecture and the candidate answers. Again, this approach achieves a significant improvement over prior work that also uses a CNN+LSTM approach on Visual Madlibs.
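The two core ideas in the abstract, averaging CNN features over many object proposals into a single image descriptor and choosing the candidate answer whose embedding is most similar to that descriptor, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it assumes per-box CNN feature vectors have already been extracted, and it assumes image and answer vectors already live in a shared embedding space (learned by nCCA in the paper, simply given here). The helper names `mean_box_pool`, `cosine`, and `pick_answer` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_box_pool(box_features: Sequence[Sequence[float]]) -> Vector:
    """Average per-proposal feature vectors into one image descriptor.

    Assumption: each proposal's CNN feature vector was computed beforehand
    and all vectors have the same length. Proposal generation and the CNN
    itself are out of scope for this sketch.
    """
    if not box_features:
        raise ValueError("need at least one proposal feature vector")
    dim = len(box_features[0])
    if any(len(f) != dim for f in box_features):
        raise ValueError("all feature vectors must have the same length")
    n = len(box_features)
    # Element-wise mean: output coordinate i is the average of
    # coordinate i across all proposal boxes.
    return [sum(f[i] for f in box_features) / n for i in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def pick_answer(image_vec: Sequence[float],
                candidates: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate answer most similar to the image.

    Assumption: image and answer vectors are already projected into a
    shared space (the paper learns this with nCCA; here they are given).
    """
    scores = [cosine(image_vec, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Three toy "proposal" features for one image (made-up numbers).
    boxes = [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0], [2.0, 3.0, 1.0]]
    image = mean_box_pool(boxes)
    print(image)  # [2.0, 1.0, 1.0]
    # Three toy candidate-answer embeddings; index 1 points the same way.
    answers = [[0.0, 1.0, 0.0], [2.0, 1.0, 1.0], [-1.0, 0.0, 0.0]]
    print(pick_answer(image, answers))  # 1
```

Mean pooling is used here because it matches the name of the representation and is insensitive to the order and number of proposals; cosine similarity stands in for whatever scoring the trained embedding actually uses.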

from cs.AI updates on arXiv.org http://ift.tt/2bdICG0
via IFTTT
