Venues
June 6 Monday: Schapiro CEPSR Room 750, Schapiro CEPSR Room 415 and Mudd Carleton Lounge
June 7-9 Tuesday-Thursday: International Affairs Building, Altschul Auditorium

Date | June 6 Monday | June 7 Tuesday | June 8 Wednesday | June 9 Thursday
 | Tutorial & Workshop | Main Conference | Main Conference | Industry Day
08:20am | Registration | Registration | Registration | Registration
08:40am | | Welcome and Introduction | |
09:00am | 1st International Workshop on Multimedia Analysis and Retrieval for Multimodal Interaction (MARMI), part 1 | | |
10:00am | | Coffee Break | Coffee Break |
10:20am | Coffee Break | | |
10:40am | | | | Coffee Break
11:00am | MARMI, part 2 | | |
12:00pm | | Lunch (Mudd Carleton Lounge) | Lunch (Mudd Carleton Lounge) | Lunch (Mudd Carleton Lounge)
01:00pm | Lunch (Schapiro CEPSR Ground Floor) | | |
02:00pm | Tutorial 1, part 1 / Tutorial 2, part 1 | | | Conference Close
02:40pm | | Coffee Break | Coffee Break | ACM Multimedia 2016 TPC Workshop
03:40pm | Coffee Break | | |
04:00pm | Tutorial 1, part 2 / Tutorial 2, part 2 | | |
04:40pm | | Coffee Break | |
06:00pm | Welcome Reception, Avery Plaza | | |
07:00pm-late | | | Social Dinner @ Calle Ocho |
45 | Matching User Photos to Online Products with Robust Deep Features
Xi Wang, Zhenfeng Sun, Wenqiang Zhang and Yu-Gang Jiang (Fudan University)

72 | Video Emotion Recognition with Transferred Deep Feature Encodings
Baohan Xu*, Yanwei Fu', Yu-Gang Jiang*, Boyang Li' and Leonid Sigal' (*Fudan University, 'Disney Research Pittsburgh)

137 | Scene-driven Retrieval in Edited Videos using Aesthetic and Semantic Deep Features
Lorenzo Baraldi, Costantino Grana and Rita Cucchiara (University of Modena and Reggio Emilia)

151 | ACD: Action Concept Discovery from Image-Sentence Corpora
Jiyang Gao, Chen Sun and Ram Nevatia (University of Southern California)
8 | GPU-FV: Realtime Fisher Vector and Its Applications in Video Monitoring
Wenjing Ma*, Liangliang Cao' and Lei Yu* (*Institute of Software Chinese Academy of Sciences, 'Yahoo! Labs)

61 | Mouse Activity as an Indicator of Interestingness in Video
Gloria Zen*, Yale Song^, Paloma de Juan^ and Alejandro Jaimes' (*University of Trento, ^Yahoo! Labs, 'AiCure)

132 | Automatic Identification of Sports Video Highlights using Viewer Interest Features
Prithwi Raj Chakraborty', Ligang Zhang*, Dian Tjondronegoro' and Vinod Chandran' ('Queensland University of Technology, *Xi'an University of Technology)

148 | Diverse Concept-Level Features for Multi-Object Classification
Youssef Tamaazousti*, Hervé Le Borgne* and Céline Hudelot' (*CEA List, 'CentraleSupélec)
47 | Personalized Privacy-aware Image Classification
Eleftherios Spyromitros-Xioufis*, Symeon Papadopoulos*, Adrian Popescu' and Yiannis Kompatsiaris* (*CERTH ITI, 'CEA LIST)

222 | The Science and Detection of Tilting
Xingjie Wei (University of Cambridge), Jussi Palomäki (Newcastle University) and Jeff Yan (Lancaster University)

224 | Using Photos as Micro-Reports of Events
Siripen Pongpaichet, Mengfan Tang, Laleh Jalali and Ramesh Jain (University of California, Irvine)

227 | Searching for Audio by Sketching Mental Images of Sound – A Brave New Idea for Audio Retrieval in Creative Music Production
Peter Knees (Johannes Kepler University) and Kristina Andersen (STEIM)
56 | The LFM-1b Dataset for Music Retrieval and Recommendation
Markus Schedl (Johannes Kepler University Linz)

101 | Foreground Object Sensing for Saliency Detection
Hengliang Zhu*, Jiao Jiang*, Xiao Lin', Yangyang Hao* and Lizhuang Ma* (*Shanghai Jiao Tong University, 'Shanghai Normal University)

102 | Constrained Local Enhancement of Semantic Features by Content-Based Sparsity
Youssef Tamaazousti, Hervé Le Borgne and Adrian Popescu (CEA List)

167 | Event Detection with Zero Example: Select the Right and Suppress the Wrong Concepts
Yi-Jie Lu*, Hao Zhang*, Maaike de Boer' and Chong-Wah Ngo* (*City University of Hong Kong, 'TNO/Radboud University Nijmegen)
25 | Homemade TS-Net for Automatic Face Recognition
Shilun Lin, Zhicheng Zhao and Fei Su (Beijing University of Posts and Telecommunications)

38 | Action Recognition by Learning Deep Multi-Granular Spatio-Temporal Video Representation
Qing Li*, Zhaofan Qiu*, Ting Yao', Tao Mei', Yong Rui' and Jiebo Luo^ (*University of Science and Technology of China, 'Microsoft Research, ^University of Rochester)

73 | Pooling Objects for Recognizing Scenes without Examples
Svetlana Kordumova, Thomas Mensink and Cees G.M. Snoek (University of Amsterdam)

193 | Multilingual Visual Sentiment Concept Matching
Nikolaos Pappas*, Miriam Redi', Mercan Topkara+, Brendan Jou^, Hongyi Liu^, Tao Chen^ and Shih-Fu Chang^ (*Idiap Research Institute, 'Yahoo! Labs London, +JW Player, ^Columbia University)
80 | A Short Survey of Recent Advances in Graph Matching
Junchi Yan*, Xucheng Yin', Weiyao Lin^, Cheng Deng+, Hongyuan Zha- and Xiaokang Yang^ (*East China Normal University, 'University of Science and Technology Beijing, ^Shanghai Jiao Tong University, +Xidian University, -Georgia Institute of Technology)

107 | The ImageNet Shuffle: Reorganized Pre-training for Video Event Detection
Pascal Mettes, Dennis Koelma and Cees Snoek (University of Amsterdam)

198 | Learning for Traffic State Estimation on Large Scale of Incomplete Data
Yiyang Yao (Information & Communication Branch, State Grid Zhejiang Company); Yingjie Xia (Zhejiang University); Zhenyu Shan (Hangzhou Normal University); Zhengguang Liu (National University of Singapore)
12 | Diverse Yet Efficient Retrieval using Locality Sensitive Hashing
Vidyadhar Rao', Prateek Jain* and C V Jawahar' ('IIIT Hyderabad, *Microsoft Research)

29 | Correlation Autoencoder Hashing for Supervised Cross-Modal Search
Yue Cao, Mingsheng Long, Jianmin Wang and Han Zhu (Tsinghua University)

49 | Regional Subspace Projection Coding for Image Retrieval
Mingmin Zhen, Wenmin Wang and Ronggang Wang (Peking University)

109 | Scaling Group Testing Similarity Search
Ahmet Iscen', Laurent Amsaleg* and Teddy Furon' ('INRIA, *CNRS-IRISA)
221 | Multimodal Analysis of User-Generated Content in Support of Social Media Applications
Rajiv Shah (National University of Singapore)

228 | Multimodal Visual Pattern Mining with Convolutional Neural Network
Hongzhi Li (Columbia University)

229 | Facial Landmark Detection and Tracking for Facial Behavior Analysis
Yue Wu (Rensselaer Polytechnic Institute)
19 | Vinereactor: Crowdsourced Spontaneous Facial Expression Data
Edward Kim and Shruthika Vangala (Villanova University)

23 | Mirroring Facial Expressions: Evidence from Visual Analysis of Dyadic Interactions
Yuchi Huang and Saad Khan (Educational Testing Service)

42 | Sequential Correspondence Hierarchical Dirichlet Processes for Video Data Analysis
Jianfei Xue and Koji Eguchi (Kobe University)

46 | A Computational Approach to Finding Facial Patterns of a Babyface
Zi-Yi Ke and Mei-Chen Yeh (National Taiwan Normal University)

51 | Video Description Generation Using Audio and Visual Cues
Qin Jin (Renmin University of China) and Junwei Liang (Carnegie Mellon University)

52 | Contextual Media Retrieval Using Natural Language Queries
Sreyasi Nag Chowdhury, Mateusz Malinowski, Andreas Bulling and Mario Fritz (Max Planck Institute for Informatics)

60 | Learning Music Embedding with Metadata for Context Aware Recommendation
Dongjing Wang', Shuiguang Deng', Xin Zhang* and Guandong Xu^ ('Zhejiang University, *Shandong University, ^University of Technology Sydney)

62 | Region Trajectories for Video Semantic Concept Detection
Yuancheng Ye', Xuejian Rong*, Xiaodong Yang^ and Yingli Tian* ('The Graduate Center CUNY, *The City College CUNY, ^NVIDIA Research)

63 | Audiovisual Summarization of Lectures and Meetings Using a Segment Similarity Graph
Chidansh Bhatt', Andrei Popescu-Belis* and Matthew Cooper' ('FX Palo Alto Laboratory Inc., *Idiap Research Institute)

68 | Recurrent Support Vector Machines for Audio-Based Multimedia Event Detection
Yun Wang and Florian Metze (Carnegie Mellon University)

76 | Adding Chinese Captions to Images
Xirong Li', Weiyu Lan', Jianfeng Dong* and Hailong Liu^ ('Renmin University of China, *Zhejiang University, ^Tencent)

85 | Emotion Recognition from EEG Signals Enhanced by User's Profile
Tanfang Chen, Shangfei Wang, Zhen Gao and Chongliang Wu (University of Science and Technology of China)

88 | Multimodal Deep Convolutional Neural Network for Audio-Visual Emotion Recognition
Shiqing Zhang*, Shiliang Zhang', Tiejun Huang' and Wen Gao' (*Taizhou University, 'Peking University)

92 | Large-Scale E-Commerce Image Retrieval with Top-Weighted Convolutional Neural Networks
Shichao Zhao, Youjiang Xu and Yahong Han (Tianjin University)

94 | Web Video Popularity Prediction using Sentiment and Content Visual Features
Giulia Fontanini, Marco Bertini and Alberto Del Bimbo (Università degli Studi di Firenze)

111 | Accurate Aggregation of Local Features by using K-sparse Autoencoder for 3D Model Retrieval
Takahiko Furuya and Ryutarou Ohbuchi (University of Yamanashi)

114 | Image Annotation using Multi-scale Hypergraph Heat Diffusion Framework
Venkatesh N Murthy', Avinash Sharma*, Visesh Chari* and R. Manmatha' ('University of Massachusetts Amherst, *International Institute of Information Technology Hyderabad)

118 | Discriminant Cross-modal Hashing
Xing Xu', Fumin Shen', Yang Yang' and Heng Tao Shen* ('University of Electronic Science and Technology of China, *The University of Queensland)

123 | CNN-based Style Vector for Style Image Retrieval
Shin Matsuo and Keiji Yanai (University of Electro-Communications)

128 | MVC: A Dataset for View-Invariant Clothing Retrieval and Attribute Prediction
Kuan-Hsien Liu, Ting-Yen Chen and Chu-Song Chen (Academia Sinica)

135 | A Quality Adaptive Multimodal Affect Recognition System for User-Centric Multimedia Indexing
Rishabh Gupta', Mojtaba Khomami Abadi^*, Fabio Morreale*, Jesus Cardenes Cabre^, Tiago H. Falk' and Nicu Sebe* ('INRS-EMT University of Quebec, ^SensAura Tech, *University of Trento)

140 | Rank Diffusion for Context-Based Image Retrieval
Daniel Carlos Guimarães Pedronette (State University of São Paulo, UNESP) and Ricardo Da Silva Torres (University of Campinas, UNICAMP)

153 | Bags of Local Convolutional Features for Scalable Instance Search
Eva Mohedano*, Amaia Salvador', Kevin McGuinness*, Xavier Giro-i-Nieto', Noel O'Connor* and Ferran Marques' (*Insight Centre for Data Analytics, 'Universitat Politecnica de Catalunya)

158 | Interactive Multimodal Learning on 100 Million Images
Jan Zahálka', Stevan Rudinac', Björn Þór Jónsson*, Dennis C. Koelma' and Marcel Worring' ('University of Amsterdam, *Reykjavik University)

162 | Combining Holistic and Part-based Deep Representations for Computational Painting Categorization
Rao Muhammad Anwer', Fahad Shahbaz Khan^, Joost van de Weijer* and Jorma Laaksonen' ('Aalto University, ^Linköping University, *CVC Barcelona)

173 | Bidirectional Joint Representation Learning with Symmetrical Deep Neural Networks for Multimodal and Crossmodal Applications
Vedran Vukotic'*, Christian Raymond'* and Guillaume Gravier'^ ('INRIA/IRISA Rennes, *INSA Rennes, ^CNRS)

176 | SSD Technology Enables Dynamic Maintenance of Persistent High-Dimensional Indexes
Björn Þór Jónsson (Reykjavik University), Laurent Amsaleg (CNRS-IRISA) and Herwig Lejsek (Videntifier Technologies)

177 | Item-Based Video Recommendation: an Hybrid Approach considering Human Factors
Andrea Ferracani, Daniele Pezzatini, Marco Bertini and Alberto Del Bimbo (Università degli Studi di Firenze)

178 | Human’s Scene Sketch Understanding
Yuxiang Ye', Yijuan Lu' and Hao Jiang* ('Texas State University, *Boston College)

182 | Retrieval of Multimedia Objects by Fusing Multiple Modalities
Ilias Gialampoukidis, Anastasia Moumtzidou, Theodora Tsikrika, Stefanos Vrochidis and Yiannis Kompatsiaris (CERTH-ITI)

189 | Incremental Learning for Fine-Grained Image Recognition
Liangliang Cao', Jenhao Hsiao', Paloma de Juan', Yuncheng Li* and Bart Thomee' ('Yahoo Labs, *University of Rochester)

194 | Spatially Localized Visual Dictionary Learning
Valentin Leveau*, Alexis Joly', Olivier Buisson^ and Patrick Valduriez' (*French National Institute of Audiovisual Contents, 'INRIA, ^INA)

201 | Semantic Binary Codes
Sravanthi Bondugula and Larry Davis (University of Maryland, College Park)

209 | On the Effects of Spam Filtering and Incremental Learning for Web-Supervised Visual Concept Classification
Matthias Springstein (TIB Hannover) and Ralph Ewerth (TIB, Leibniz Universität Hannover)

220 | Semi-supervised Identification of Rarely Appearing Persons in Video by Correcting Weak Labels
Eric Müller'^, Christian Otto' and Ralph Ewerth^* ('Ernst-Abbe-Hochschule Jena, ^TIB Hannover, *Leibniz Universität Hannover)

230 | Introducing Concept And Syntax Transition Networks for Image Captioning
Philipp Blandfort', Tushar Karayil', Damian Borth* and Andreas Dengel* ('University of Kaiserslautern, *German Research Center for Artificial Intelligence (DFKI))
32 | SentiCart: Cartography and Geo-contextualization for Multilingual Visual Sentiment
Brendan Jou, Margaret Yuying Qian and Shih-Fu Chang (Columbia University)

74 | Personalized Retrieval and Browsing of Classical Music and Supporting Multimedia Material
Marko Tkalcic*, Markus Schedl^, Cynthia Liem' and Mark Melenhorst' (*Free University of Bolzano, ^Johannes Kepler University, 'TU Delft)

90 | The Social Picture
Sebastiano Battiato', Giovanni Maria Farinella', Filippo Luigi Maria Milotta', Alessandro Ortis', Luca Addesso*, Antonino Casella*, Valeria D'Amico* and Giovanni Torrisi* ('University of Catania, *Telecom Italia)

108 | Watching What and How Politicians Discuss Various Topics - A Large-Scale Video Analytics UI
Emily Song, Joseph Ellis, Hongzhi Li and Shih-Fu Chang (Columbia University)

142 | Multimodal Event Detection and Summarization in Large Scale Image Collections
Manos Schinas', Symeon Papadopoulos', Georgios Petkos', Yiannis Kompatsiaris' and Pericles Mitkas* ('CERTH-ITI, *Aristotle University of Thessaloniki)

143 | Object-aware Deep Network for Commodity Image Retrieval
Zhiwei Fang', Jin Liu', Yuhang Wang', Yong Li', Jinhui Tang*, Hanqing Lu' and Hang Song' ('Institute of Automation Chinese Academy of Sciences, *Nanjing University of Science and Technology)

152 | An Automated End-To-End Pipeline for Fine-Grained Video Annotation using Deep Neural Networks
Baptist Vandersmissen, Lucas Sterckx, Thomas Demeester, Azarakhsh Jalalvand, Wesley De Neve and Rik Van de Walle (Ghent University - iMinds - Data Science Lab)

175 | Serendipity-driven Celebrity Video Hyperlinking
Shu-Jun Yang', Lei Pang', Chong-Wah Ngo' and Benoit Huet* ('City University of Hong Kong, *EURECOM)

207 | Complura: Exploring and Leveraging a Large-scale Multilingual Visual Sentiment Ontology
Hongyi Liu', Brendan Jou', Tao Chen', Mercan Topkara+, Nikolaos Pappas^, Miriam Redi* and Shih-Fu Chang' ('Columbia University, +JW Player, ^Idiap Research Institute and EPFL, *Yahoo! Labs)
I will give an overview of some of the work done by the Video Content Analysis team at Google Research in the context of YouTube and Google Photos. I will show examples of features and use cases enabled or assisted by image and video analysis, and discuss in more detail how we have approached the problem of extracting meaning out of audio-visual signals at massive scales. I'll conclude with a note on data sets, and suggestions of open problem areas with large potential for impact.
Tomáš Ižo leads the Video Content Analysis team in the Machine Perception group at Google Research. The team’s mission is to improve Google products like YouTube and Photos by making them more content-aware via machine learning and perception techniques. Tomáš came to Google by way of MIT, where he received a Ph.D. in computer science in 2007. His research focused on motion and scene analysis. At Google, he has contributed to many areas of video technology, from summarization, enhancement and creative tools to categorization, annotation and infrastructure for media processing.
This talk provides a brief overview of deep learning research and the challenges involved in scaling it up across multi-GPU and multi-machine clusters while still providing software flexible enough for research settings. We discuss the clear trends emerging in deep learning from an HPC perspective and present several examples from our work at Facebook AI Research.
Soumith Chintala is a Researcher at Facebook AI Research, where he works on deep learning, reinforcement learning, generative image models, agents for video games and large-scale high-performance deep learning. He holds a Masters in CS from NYU.
This talk will cover a broad range of topics, from how successful computer vision systems are built using neural networks and why there is an explosion in real-world applications powered by these systems, to how Clarifai makes it easy to build a new generation of intelligent applications. We will discuss algorithms that were created in the 1980s and have evolved to power real-world applications across every industry. Although these models are typically thought of as black boxes, we will dive into a visualization technique that demonstrates what they see at their various levels of abstraction. We will then demonstrate multiple applications this technology enables through Clarifai's API, including Forevery, a free photo organization product for consumers, as well as developer projects and enterprise solutions.
Matthew Zeiler is an expert in the field of neural networks and the Founder and CEO of Clarifai. After learning from pioneers of neural networks, including Geoff Hinton and Yann LeCun, he started Clarifai in November 2013 upon completing his PhD at New York University. He set out with the mission for Clarifai to understand every image and video to improve life, bringing the power of AI to everyone.
Microsoft Cognitive Services is a collection of cloud APIs that developers can use to make their applications more intelligent and engaging. In this talk, we will examine Microsoft Cognitive Services, with particular attention to the celebrity API, which identifies the celebrities in an image, and the image caption API, which provides a natural language description of an image's content. We will describe the research work that powers these services, the challenges involved, and the ongoing efforts to expand their coverage and improve their quality.
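To make the caption API concrete, here is a minimal sketch of how such a service is typically called. The endpoint URL, header name and response shape below are assumptions drawn from the publicly documented v1.0 Computer Vision "describe" call, not details given in this talk:

```python
import json
import urllib.request

# Assumed regional endpoint for the v1.0 "describe" (caption) operation;
# substitute your own region and subscription key.
ENDPOINT = "https://westus.api.cognitive.microsoft.com/vision/v1.0/describe"

def build_request(image_url, key):
    """Assemble a caption request without sending it (useful for testing)."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            # Cognitive Services authenticates via this subscription header.
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )

def caption(image_url, key):
    """Send the request and return the top generated caption text."""
    req = build_request(image_url, key)
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # Generated captions are nested under description.captions in the JSON.
    return result["description"]["captions"][0]["text"]
```

Calling `caption(...)` requires a valid subscription key and network access; `build_request` isolates the request-construction step so it can be inspected offline.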
Dr. Jin Li is a Partner Research Manager of the Cloud Computing and Storage group at MSR Technologies. His team has made contributions to Microsoft worth on the order of hundreds of millions of dollars per annum, including the Local Reconstruction Code (LRC) in Azure and Windows Server, the erasure code used in Lync, Xbox and RemoteFX, the Data Deduplication feature in Windows Server 2012, the high-performance SSD-based key-value store in Bing, and the RemoteFX for WAN feature in Windows 8 and Windows Server 2012. He won a Best Paper Award at USENIX ATC 2012 and a 2013 Microsoft Technical Community Network Storage Technical Achievement Award. He has served as lead Program Chair of ICME 2011, ICME Steering Committee Chair and Program Co-Chair of ACM Multimedia 2016. He is an IEEE Fellow.
Watson Vision is a set of APIs that harness the power of sophisticated deep learning to help businesses use visual data to drive new business. We will illustrate how companies have created significant shareholder value using these APIs with small engineering teams.
Rahul Singhal is the Watson Product Leader for its suite of image, speech and text products. He has over 20 years of experience in the software industry. He frequently speaks at a variety of conferences and is a sought-after expert on AI and machine learning. In his spare time, he can be found working with startups, helping and learning from them.