Program ICMR 2016

Conference Program PDF

Download

Venues	Schapiro CEPSR Room 750		International Affairs- Altschul Auditorium	International Affairs- Altschul Auditorium		International Affairs- Altschul Auditorium
	Schapiro CEPSR Room 750	Schapiro CEPSR Room 415		Mudd Carleton Lounge	International Affairs- Altschul Auditorium
Date	June 6 Monday		June 7 Tuesday	June 8 Wednesday		June 9 Thursday
	Tutorial & Workshop		Main Conference	Main Conference		Industry Day
08:20am	Registration		Registration	Registration		Registration
08:40am			Welcome and Introduction	Oral: Best Paper Candidates		Invited Industry Talks: Video and Social Media
09:00am	1st International Workshop on Multimedia Analysis and Retrieval for Multimodal Interaction (MARMI) p1		Keynote Talk: Professor Shih-Fu Chang
09:20am
09:40am
10:00am			Coffee Break			Coffee Break
10:20am			Oral 1: Deep Learning and Applications	Coffee Break		Invited Industry Talks: Visual Recognition API and Services
10:40am	Coffee Break			Special Oral Session
11:00am	1st International Workshop on Multimedia Analysis and Retrieval for Multimodal Interaction (MARMI) p2
11:20am
11:40am
12:00pm			Lunch (Mudd Carleton Lounge)	Lunch (Mudd Carleton Lounge)		Lunch (Mudd Carleton Lounge)
12:20pm
12:40pm
01:00pm	Lunch (Schapiro CEPSR Ground Floor)		Oral 2: Image and Video Content Analysis	Oral 4: Image and Video Search		Industry Panel
01:20pm
01:40pm
02:00pm	Tutorial 1 p1	Tutorial 2 p1				Conference Close
02:20pm
02:40pm			Coffee Break	Coffee Break		ACM Multimedia 2016 TPC Workshop
03:00pm			Oral: Brave New Ideas	Posters and Demos (optional setup, if needed)	Student Symposium
03:20pm
03:40pm	Coffee Break
04:00pm	Tutorial 1 p2	Tutorial 2 p2		Posters and Demos
04:20pm
04:40pm			Coffee Break
05:00pm			Oral 3: Multimedia Datasets and Applications
05:20pm
05:40pm
06:00pm	Welcome Reception Avery Plaza (click for map)
06:20pm
06:40pm
7:00pm --late				Social Dinner @ Calle Ocho (click for directions)

KEYNOTE: New Frontiers of Large Scale Multimedia Information Retrieval

Tuesday June 7, 9:00am @ International Affairs - Altschul Auditorium

KEYNOTE SPEAKER

Shih-Fu Chang

Columbia University

Shih-Fu Chang is the Sr. Executive Vice Dean and the Richard Dicker Professor of Columbia Engineering. His research is focused on multimedia information retrieval, computer vision, machine learning, and signal procesing, with the goal to turn unstructured multimedia data into searchable information. His work on content based visual search in the early 90's, VisualSEEk and VideoQ, set the foundation of this vibrant area. Over the years, he continued to create innovative techniques for image/video recognition, multimodal analysis, visual information ontology, image authentication, and compact hashing for large-scale image databases. For his long-term pioneering contributions, he has been awarded the IEEE Signal Processing Society Technical Achievement Award, ACM Multimedia SIG Technical Acheivement Award, Honorary Doctorate from the University of Amsterdam, the IEEE Kiyo Tomiyasu Award, and IBM Faculty Award. For his dedicated contributions to education, he received the Great Teacher Award from the Society of Columbia Graduates. He served as Chair of Columbia Electrical Engineering Department (2007-2010), the Editor-in-Chief of the IEEE Signal Processing Magazine (2006-8), and advisor for several research institutions and companies. In his current capacity in Columbia Engineering, he plays a key role in the School's strategic planning, special research initiatives, international collaboration, and faculty development. He is a Fellow of the American Association for the Avancement of Science (AAAS) and IEEE.

Webpage
Multimedia information retrieval aims to automatically extract useful information from large collection of images, videos, and combinations with other data like text and speech. As reported in recent news, it's now possible to search information over millions or more of products with just an example image on the mobile phone. Intelligent apps are being deployed by major companies to automatically generate keywords or even captions of an image at a sophistication level that could not be imagined before. In this talk, I will review core technologies involved and discuss challenges and opportunities ahead. First, to address the complexity bottleneck when scaling up the data size, I will present extremely compact hash codes and deep learning image classification models that can reduce complexity by orders of magnitude while preserving approximate accuracy. Second, to support easy extension of recognition systems to new domains, instead of relying on fixed image categories, we introduce a new paradigm to automatically discover unique multimodal concepts and structures using large amounts of multimedia data available. Last, to support emerging applications beyond basic image categorization, I will discuss ongoing efforts in uderstanding how images are used in expressing sentiments and emotions in online social media and how languages/cultures may influence such online multimedia communication.

ORAL SESSION 1: Deep Learning and Applications

Tuesday June 7, 10:20am @ International Affairs - Altschul Auditorium

Session Chair: Miriam Redi

45	Matching User Photos to Online Products with Robust Deep Features
	Xi Wang, Zhenfeng Sun, Wenqiang Zhang and Yu-Gang Jiang (Fudan University)
___-
72	Video Emotion Recognition with Transferred Deep Feature Encodings
	Baohan Xu, Yanwei Fu', Yu-Gang Jiang, Boyang Li' and Leonid Sigal' (*Fudan university, 'Disney Research Pittsburgh)
___-
137	Scene-driven Retrieval in Edited Videos using Aesthetic and Semantic Deep Features
	Lorenzo Baraldi, Costantino Grana and Rita Cucchiara (University of Modena and Reggio Emilia)
___-
151	ACD: Action Concept Discovery from Image-Sentence Corpora
	Jiyang Gao, Chen Sun and Ram Nevatia (University of Southern California)
___-

ORAL SESSION 2: Image and Video Content Analysis

Tuesday June 7, 1:00pm @ International Affairs - Altschul Auditorium

Session Chair: Chua Tat Seng

8	GPU-FV: Realtime Fisher Vector and Its Applications in Video Monitoring
	Wenjing Ma, Liangliang Cao' and Lei Yu (*Institute of Software Chinese Academy of Sciences, 'Yahoo! Labs)
___-
61	Mouse Activity as an Indicator of Interestingness in Video
	Gloria Zen, Yale Song', Paloma de Juan' and Alejandro Jaimes' (University of Trento, ^Yahoo! Labs, 'AiCure)
___-
132	Automatic Identification of Sports Video Highlights using Viewer Interest Features
	Prithwi Raj Chakraborty', Ligang Zhang, Dian Tjondronegoro' and Vinod Chandran' ('Queensland University of Technology, Xi'an University of Technology)
___-
148	Diverse Concept-Level Features for Multi-Object Classification
	Youssef Tamaazousti, Hervé Le Borgne and Céline Hudelot' (*CEA List, 'Centrale-Supélec)
___-

ORAL SESSION: Brave New Ideas

Tuesday June 7, 3:00pm @ International Affairs - Altschul Auditorium

Session Chair: Hui Wu

47	Personalized Privacy-aware Image Classification
	Eleftherios Spyromitros-Xioufis, Symeon Papadopoulos, Adrian Popescu' and Yiannis Kompatsiaris* (*CERTH ITI, 'CEA LIST)
___-
222	The science and detection of tilting
	Xingjie Wei (University of Cambridge), Jussi Palomaki (Newcastle University) and Jeff Yan (University of Lancaster)
___-
224	Using Photos as Micro-Reports of Events
	Siripen Pongpaichet, Mengfan Tang, Laleh Jalali and Ramesh Jain (University of California, Irvine)
___-
227	Searching for Audio by Sketching Mental Images of Sound – A Brave New Idea for Audio Retrieval in Creative Music Production
	Peter Knees (Johannes Kepler University) and Kristina Andersen (STEIM)
___-

ORAL SESSION 3: Multimedia Datasets and Applications

Tuesday June 7, 5:00pm @ International Affairs - Altschul Auditorium

Session Chair: Shin'ichi Satoh

56	The LFM-1b Dataset for Music Retrieval and Recommendation
	Markus Schedl (Johannes Kepler University Linz)
___-
101	Foreground Object Sensing for Saliency Detection
	Hengliang Zhu, Jiao Jiang, Xiao Lin', Yangyang Hao* and Lizhuang Ma (*Shanghai Jiao Tong University, 'Shanghai Normal University)
___-
102	Constrained Local Enhancement of Semantic Features by Content-Based Sparsity
	Youssef Tamaazousti, Hervé Le Borgne and Adrian Popescu (CEA List)
___-
167	Event Detection with Zero Example: Select the Right and Suppress the Wrong Concepts
	Yi-Jie Lu, Hao Zhang, Maaike de Boer' and Chong-Wah Ngo* (*City University of Hong Kong, 'TNO/Radboud University Nijmegen)
___-

ORAL SESSION: Best Paper Candidates

Wednesday June 8, 8:40am @ International Affairs - Altschul Auditorium

Session Chairs: Susanne Boll and Winston Hsu

25	Homemade TS-Net for Automatic Face Recognition
	Shilun Lin, Zhicheng Zhao and Fei Su (Beijing University of Posts and Telecommunications)
___-
38	Action Recognition by Learning Deep Multi-Granular Spatio-Temporal Video Representation
	Qing Li, Zhaofan Qiu, Ting Yao', Tao Mei', Yong Rui' and Jiebo Luo^ (*University of Science and Technology of China, 'Microsoft Research, ^University of Rochester)
___-
73	Pooling Objects for Recognizing Scenes without Examples
	Svetlana Kordumova,Thomas Mensink and Cees G.M. Snoek (University of Amsterdam)
___-
193	Multilingual Visual Sentiment Concept Matching
	Nikolaos Pappas, Miriam Redi', Mercan Topkara+, Brendan Jou^, Hongyi Liu^, Tao Chen^ and Shih-Fu Chang^ (Idiap Research Institute, 'Yahoo! Labs London, +JW Player, ^Columbia University)
___-

SPECIAL SESSION: Learning with Semantic Information for Large Scale Multimedia Understanding

Wednesday June 8, 10:40am @ International Affairs - Altschul Auditorium

Session Chair: Nicu Sebe

80	A Short Survey of Recent Advances in Graph Matching
	Junchi Yan, Xucheng Yin', Weiyao Lin^, Cheng Deng+, Hongyuan Zha- and Xiaokang Yang^ (East China Normal University, 'University of Science and Technology Beijing, ^Shanghai Jiao Tong University, +Xidian University, -Georgia Institute of Technology)
___-
107	The ImageNet Shuffle: Reorganized Pre-training for Video Event Detection
	Pascal Mettes, Dennis Koelma and Cees Snoek (University of Amsterdam)
___-
198	Learning for Traffic State Estimation on Large Scale of Incomplete Data
	Yiyang Yao (Information & Communication Branch, State Grid Zhejiang Company); Yingjie Xia (Zhejiang University); Zhenyu Shan (Hangzhou Normal University); Zhengguang Liu (National University of Singapore)
___-

ORAL SESSION 4: Image and Video Search

Wednesday June 8, 1:00pm @ International Affairs - Altschul Auditorium

Session Chair: Liangliang Cao

12	Diverse Yet Efficient Retrieval using Locality Sensitive Hashing
	Vidyadhar Rao', Prateek Jain* and C V Jawahar' ('IIIT Hyderabad, *Microsoft Research)
___-
29	Correlation Autoencoder Hashing for Supervised Cross-Modal Se
	Yue Cao, Mingsheng Long, Jianmin Wang and Han Zhu (Tsinghua University)
___-
49	Regional Subspace Projection Coding for Image Retrieval
	Mingmin Zhen, Wenmin Wang and Ronggang Wang (Peking University)
___-
109	Scaling Group Testing Similarity Search
	Ahmet Iscen', Laurent Amsaleg* and Teddy Furon' ('INRIA, *CNRS-IRISA)
___-

ORAL SESSION: Student Symposium

Wednesday June 8, 3:00pm @ International Affairs - Altschul Auditorium

Session Chair: Yao Wang

221	Multimodal Analysis of User-Generated Content in Support of Social Media Applications
	Rajiv Shah (National University of Singapore)
___-
228	Multimodal Visual Pattern Mining with Convolutional Neural Network
	Hongzhi Li (Columbia University)
___-
229	Facial Landmark Detection and Tracking for Facial Behavior Analysis
	Yue Wu (Rensselaer Polytechnic Institute)
___-

POSTER SESSION

Wednesday June 8, 3:00pm @ Mudd - Carleton Lounge

Session Chair: Mei-Chen Yeh

19	Vinereactor: Crowdsourced Spontaneous Facial Expression Data
	Edward Kim and Shruthika Vangala (Villanova University)
___-
23	Mirroring Facial Expressions: Evidence from Visual Analysis of Dyadic Interactions
	Yuchi Huang and Saad Khan (Educational Testing Service)
___-
42	Sequential Correspondence Hierarchical Dirichlet Processes for Video Data Analysis
	Jianfei Xue and Koji Eguchi (Kobe University)
___-
46	A Computational Approach to Finding Facial Patterns of a Babyface
	Zi-Yi Ke and Mei-Chen Yeh (National Taiwan Normal University)
___-
51	Video Description Generation Using Audio and Visual Cues
	Qin Jin (Renmin University of China) and Junwei Liang (Carnegie Mellon University)
___-
52	Contextual Media Retrieval Using Natural Language Queries
	Sreyasi Nag Chowdhury, Mateusz Malinowski, Andreas Bulling and Mario Fritz (Max Planck Institute for Informatics)
___-
60	Learning Music Embedding with Metadata for Context Aware Recommendation
	Dongjing Wang', Shuiguang Deng', Xin Zhang* and Guandong Xu^ ('ZheJiang University, *Shandong University, ^University of Technology Sydney)
___-
62	Region Trajectories for Video Semantic Concept Detection
	Yuancheng Ye', Xuejian Rong, Xiaodong Yang^ and Yingli Tian ('The Graduate Center CUNY, *The City College CUNY, ^NVIDIA Research)
___-
63	Audiovisual Summarization of Lectures and Meetings Using a Segment Similarity Graph
	Chidansh Bhatt', Andrei Popescu-Belis* and Matthew Cooper' ('FX Palo Alto Laboratory Inc., *Idiap Research Institute)
___-
68	Recurrent Support Vector Machines for Audio-Based Multimedia Event Detection
	Yun Wang and Florian Metze (Carnegie Mellon University)
___-
76	Adding Chinese Captions to Images
	Xirong Li', Weiyu Lan', Jianfeng Dong* and Hailong Liu^ ('Renmin University of China, *Zhejiang University, ^Tencent)
___-
85	Emotion Recognition from EEG Signals Enhanced by User's Profile
	Tanfang Chen, Shangfei Wang, Zhen Gao and Chongliang Wu (USTC)
___-
88	Multimodal Deep Convolutional Neural Network for Audio-Visual Emotion Recognition
	Zhang Shiqing, Shiliang Zhang', Tiejun Huang' and Wen Gao' (Taizhou University, 'Peking University)
___-
92	Large-Scale E-Commerce Image Retrieval with Top-Weighted Convolutional Neural Networks
	Shichao Zhao, Youjiang Xu and Yahong Han (Tianjin University)
___-
94	Web Video Popularity Prediction using Sentiment and Content Visual Features
	Giulia Fontanini, Marco Bertini and Alberto Del Bimbo (Universita degli Studi di Firenze)
___-
111	Accurate Aggregation of Local Features by using K-sparse Autoencoder for 3D Model Retrieval
	Takahiko Furuya and Ryutarou Ohbuchi (University of Yamanashi)
___-
114	Image Annotation using Multi-scale Hypergraph Heat Diffusion Framework
	Venkatesh N Murthy', Avinash Sharma, Visesh Chari and R. Manmatha' ('University of Massachusetts Amherst, *International Institute of Information Technology Hyderabad)
___-
118	Discriminant Cross-modal Hashing
	Xing Xu', Fumin Shen', Yang Yang' and Heng Tao Shen* ('University of Electronic Science and Technology of China, *The University of Queensland)
___-
123	CNN-based Style Vector for Style Image Retrieval
	Shin Matsuo and Keiji Yanai (University of Electro-Communications)
___-
128	MVC: A Dataset for View-Invariant Clothing Retrieval and Attribute Prediction
	Kuan-Hsien Liu, Ting-Yen Chen and Chu-Song Chen (Academia Sinica)
___-
135	A Quality Adaptive Multimodal Affect Recognition System for User-Centric Multimedia Indexing
	Rishabh Gupta', Mojtaba Khomami Abadi^, Fabio Morreale, Jesus Cardenes Cabre^, Tiago H. Falk' and Nicu Sebe* ('INRS-EMT University of Quebec, ^SensAura Tech, *University of Trento)
___-
140	Rank Diffusion for Context-Based Image Retrieval
	Daniel Carlos Guimarães Pedronette (State University of São Paulo UNESP) and Ricardo Da Silva Torres (University of Campinas UNICAMP)
___-
153	Bags of Local Convolutional Features for Scalable Instance Search
	Eva Mohedano, Amaia Salvador', Kevin McGuinnes, Xavier Giro-I-Nieto', Noel O'Connor* and Ferran Marques' (*Insight Center for Data Analytics, 'Universitat Politecnica de Catalunya)
___-
158	Interactive Multimodal Learning on 100 Million Images
	Jan Zahálka', Stevan Rudinac', Björn Þór Jónsson, Dennis C. Koelma' and Marcel Worring' ('University of Amsterdam, Reykjavik University)
___-
162	Combining Holistic and Part-based Deep Representations for Computational Painting Categorization
	Rao Muhammad Anwer', Fahad Shahbaz Khan^, Joost van de Weijer* and Jorma Laaksonen' ('Aalto University, ^Linkoping University, *CVC Barcelona)
___-
173	Bidirectional Joint Representation Learning with Symmetrical Deep Neural Networks for Multimodal and Crossmodal Applications
	Vedran Vukotic', Christian Raymond' and Guillaume Gravier'^ (INRIA/IRISA Rennes', *INSA Rennes, ^CNRS)
___-
176	SSD Technology Enables Dynamic Maintenance of Persistent High-Dimensional Indexes
	Bjorn Thor Jonsson (Reykjavik University), Laurent Amsaleg (CNRS-IRISA) and Herwig Lejsek (Videntifier Technologies)
___-
177	Item-Based Video Recommendation: an Hybrid Approach considering Human Factors
	Andrea Ferracani, Daniele Pezzatini, Marco Bertini and Alberto Del Bimbo (Universita degli Studi di Firenze)
___-
178	Human’s Scene Sketch Understanding
	Yuxiang Ye', Yijuan Lu' and Hao Jiang* ('Texas State University, *Boston College)
___-
182	Retrieval of Multimedia objects by Fusing Multiple Modalities
	Ilias Gialampoukidis, Anastasia Moumtzidou, Theodora Tsikrika, Stefanos Vrochidis and Yiannis Kompatsiaris (CERTH - ITI)
___-
189	Incremental Learning for Fine-Grained Image Recognition
	Liangliang Cao', Jenhao Hsiao', Paloma de Juan', Yuncheng Li* and Bart Thomee' ('Yahoo Labs, *University of Rochester)
___-
194	Spatially Localized Visual Dictionary Learning
	Valentin Leveau, Alexis Joly', Olivier Buisson^ and Patrick Valduriez' (French National Institute of Audiovisual Contents, 'INRIA, ^INA)
___-
201	Semantic Binary Codes
	Sravanthi Bondugula and Larry Davis (University of Maryland College Park)
___-
209	On the Effects of Spam Filtering and Incremental Learning for Web-Supervised Visual Concept Classification
	Matthias Springstein (TIB Hannover) and Ralph Ewerth (TIB, Leibniz Universität Hannover)
___-
220	Semi-supervised Identiﬁcation of Rarely Appearing Persons in Video by Correcting Weak Labels
	Eric Müller'^, Christian Otto' and Ralph Ewerth^* ('Ernst-Abbe-Hochschule Jena, ^TIB Hannover *Leibniz Universität Hannover)
___-
230	Introducing Concept And Syntax Transition Networks for Image Captioning
	Philipp Blandfort', Tushar Karayil', Damian Borth* and Andreas Dengel('University of Kaiserslautern, German Institute of Artificial Intelligence-DFKI)
___-

DEMO SESSION

Wednesday June 8, 3:00pm @ Mudd - Carleton Lounge

Session Chair: Bart Thomee

32	SentiCart: Cartography and Geo-contextualization for Multilingual Visual Sentiment
	Brendan Jou, Margaret Yuying Qian and Shih-Fu Chang (Columbia University)
___-
74	Personalized Retrieval and Browsing of Classical Music and Supporting Multimedia Material
	Marko Tkalcic, Markus Schedl^, Cynthia Liem' and Mark Melenhorst' (Free University of Bolzano, ^Johannes Kepler University, 'TU Delft)
___-
90	The Social Picture
	Sebastiano Battiato', Giovanni Maria Farinella', Filippo Luigi Maria Milotta', Alessandro Ortis', Luca Addesso, Antonino Casella, Valeria D'Amico* and Giovanni Torrisi* ('University of Catania, *Telecom Italia)
___-
108	Watching What and How Politicians Discuss Various Topics - A Large-Scale Video Analytics UI
	Emily Song, Joseph Ellis, Hongzhi Li and Shih-Fu Chang (Columbia University)
___-
142	Multimodal Event Detection and Summarization in Large Scale Image Collections
	Manos Schinas', Symeon Papadopoulos', Georgios Petkos', Yiannis Kompatsiaris' and Pericles Mitkas* ('CERTH-ITI, *Aristotle University of Thessaloniki)
___-
143	Object-aware Deep Network for Commodity Image Retrieval
	Zhiwei Fang', Jin Liu', Yuhang Wang', Yong Li', Jinhui Tang, Hanqing Lu and Hang Song ('Institute of Automation Chinese Academy of Sciences, Nanjing University of Science and Technoligy)
___-
152	An Automated End-To-End Pipeline for Fine-Grained Video Annotation using Deep Neural Networks
	Baptist Vandersmissen, Lucas Sterckx, Thomas Demeester, Azarakhsh Jalalvand and Wesley De Neve (Ghent University - iMinds - Data Science Lab); Rik Van de Walle (Ghent University - iMinds - Data Science Lab)
___-
175	Serendipity-driven Celebrity Video Hyperlinking
	Shu-Jun Yang', Lei Pang', Chong-Wah Ngo' and Benoit Huet* ('City University of Hong Kong, *EURECOM)
___-
207	Complura: Exploring and Leveraging a Large-scale Multilingual Visual Sentiment Ontology
	Hongyi Liu', Brendan Jou', Tao Chen', Mercan Topkara+, Nikolaos Pappas^, Miriam Redi* and Shih-Fu Chang' ('Columbia University, +JW Player, ^Idiap Research Institute and EPFL, *Yahoo! Labs)
___-

INVITED INDUSTRY TALKS: Opening Remarks

Thursday June 9, 8:40am @ International Affairs - Altschul Auditorium

Rogerio Feris

INVITED INDUSTRY TALKS: Video and Social Media

Thursday June 9, 8:50am @ International Affairs - Altschul Auditorium

Session Chair: Paul Natsev

click on picture for abstract

Teaching Machines to Watch and Listen: Video Content Analysis at YouTube and Google

Tomas Izo - 08:50am

Google

I will give an overview of some of the work done by the Video Content Analysis team at Google Research in the context of YouTube and Google Photos. I will show examples of features and use cases enabled or assisted by image and video analysis, and discuss in more detail how we have approached the problem of extracting meaning out of audio-visual signals at massive scales. I'll conclude with a note on data sets, and suggestions of open problem areas with large potential for impact.

Tomáš Ižo leads the Video Content Analysis team in the Machine Perception group at Google Research. The team’s mission is to improve Google products like YouTube and Photos by making them more content-aware via machine learning and perception techniques. Tomáš came to Google by way of MIT, where he received a Ph.D. in computer science in 2007. His research focused on motion and scene analysis. At Google, he has contributed to many areas of video technology, from summarization, enhancement and creative tools to categorization, annotation and infrastructure for media processing.

Webpage
Distributed Deep Learning at Scale Distributed Deep Learning at Scale Distributed Deep Learning at Scale

Soumith Chintala - 09:25am

Facebook AI Research

This talk provides a brief overview of deep learning research, the challenges involved in scaling it up across multi-GPU and multi-machine clusters, while providing software that is flexible enough for research settings. We discuss the clear trends that are emerging in deep learning from a HPC perspective and discuss several examples from our work at Facebook AI Research.

Soumith Chintala is a Researcher at Facebook AI Research, where he works on deep learning, reinforcement learning, generative image models, agents for video games and large-scale high-performance deep learning. He holds a Masters in CS from NYU.

Webpage

INVITED INDUSTRY TALKS: Visual Recognition API and Services

Thursday June 9, 10:20am @ International Affairs - Altschul Auditorium

Session Chair: Rogerio Feris

click on picture for abstract

Clarifai Neural Networks Clarifai Neural Networks

Matthew Zeiler - 10:20am

Clarifai

This talk will cover a broad range of topics from how successful computer vision systems are built using neural networks, why there is an explosion in real world applications powered by these systems, to how Clarifai makes it easy for you to build a new generation of intelligent applications. We will discuss the algorithms that were created in the 1980s and have evolved to power real world applications across every industry. Typically thought of as black box algorithms we will dive into a visualization technique which demonstrates what these models see at their various levels of abstraction. Then multiple applications enabled by this technology in Clarifai's API will be demonstrated, including a free photo organization product for consumers called Forevery, developer projects and enterprise solutions.
Matthew Zeiler is an expert in the field of neural networks and Founder and CEO of Clarifai. After having learned from pioneers of neural networks including Geoff Hinton and Yann LeCun he started Clarifai in November 2013 upon completion of his PhD from New York University. He set out with the mission for Clarifai to understand every image and video to improve life; bringing the power of AI to everyone.

Webpage
Microsoft Cognitive Service : Building Intelligent and Engaging Applications

Jin Li - 10:55am

Microsoft Research

Microsoft Cognitive Services is a collection of cloud APIs that are available to developers to make their applications more intelligent and engaging. In this talk, we will examine the Microsoft Cognitive Service, with particular attention to the celebrity API, which identify the celebrities in the image, and the image caption API, which provides a natural language description of the content of the image. We will describe the research work that powers the services, and the related challenges and the quests to expand the coverage and improve the quality of the services.

Dr. Jin Li is a Partner Research Manager of the Cloud Computing and Storage group at MSR Technologies. His team has made great contributions to Microsoft in the order of hundreds of millions dollars per annum. His contributions include the local reconstruction code (LRC) in Azure and Windows Server, the erasure code used in Lync, Xbox and RemoteFX, the Data Deduplication feature in Windows Server 2012, the high performance SSD based key-value store in Bing, and the RemoteFX for WAN feature in Windows 8 and Windows Server 2012. He has won a Best Paper Award at USENIX ATC 2012 and a 2013 Microsoft Technical Community Network Storage Technical Achievement Award. He has served as the lead Program Chair of ICME 2011, ICME Steering Committee Chair and a Program Co-Chair of ACM Multimedia 2016. He is an IEEE Fellow.

Webpage
Watson Vision - Disrupting Businesses Watson Vision - Disrupting Businesses

Rahul Singhal - 11:30am

IBM Watson

Watson vision is a set of APIs that are harnessing the power of sophisticated deep learning to help businesses use visual data to drive new businesses. We will illustrate how companies have created signifcant shareholder value by use of these APIs will small engineering teams

Rahul Singhal is a Watson Product Leader for its suite of Image, Speech and Text products. He has over 20 years of experience in the software industry. He frequently speaks at variety of conferences and is a sought after expert on AI and Machine Learning. In his spare time, he is found spending time with startups helping and learning from them

Webpage

ICMR 2016

PROGRAM OVERVIEW

Conference Program PDF

KEYNOTE: New Frontiers of Large Scale Multimedia Information Retrieval

Tuesday June 7, 9:00am @ International Affairs - Altschul Auditorium

KEYNOTE SPEAKER

Shih-Fu Chang

ORAL SESSION 1: Deep Learning and Applications

Tuesday June 7, 10:20am @ International Affairs - Altschul Auditorium

Session Chair: Miriam Redi

ORAL SESSION 2: Image and Video Content Analysis

Tuesday June 7, 1:00pm @ International Affairs - Altschul Auditorium

Session Chair: Chua Tat Seng

ORAL SESSION: Brave New Ideas

Tuesday June 7, 3:00pm @ International Affairs - Altschul Auditorium

Session Chair: Hui Wu

ORAL SESSION 3: Multimedia Datasets and Applications

Tuesday June 7, 5:00pm @ International Affairs - Altschul Auditorium

Session Chair: Shin'ichi Satoh

ORAL SESSION: Best Paper Candidates

Wednesday June 8, 8:40am @ International Affairs - Altschul Auditorium

Session Chairs: Susanne Boll and Winston Hsu

SPECIAL SESSION: Learning with Semantic Information for Large Scale Multimedia Understanding

Wednesday June 8, 10:40am @ International Affairs - Altschul Auditorium

Session Chair: Nicu Sebe

ORAL SESSION 4: Image and Video Search

Wednesday June 8, 1:00pm @ International Affairs - Altschul Auditorium

Session Chair: Liangliang Cao

ORAL SESSION: Student Symposium

Wednesday June 8, 3:00pm @ International Affairs - Altschul Auditorium

Session Chair: Yao Wang

POSTER SESSION

Wednesday June 8, 3:00pm @ Mudd - Carleton Lounge

Session Chair: Mei-Chen Yeh

DEMO SESSION

Wednesday June 8, 3:00pm @ Mudd - Carleton Lounge

Session Chair: Bart Thomee

INVITED INDUSTRY TALKS: Opening Remarks

Thursday June 9, 8:40am @ International Affairs - Altschul Auditorium

Rogerio Feris

INVITED INDUSTRY TALKS: Video and Social Media

Thursday June 9, 8:50am @ International Affairs - Altschul Auditorium

Session Chair: Paul Natsev

click on picture for abstract

Teaching Machines to Watch and Listen: Video Content Analysis at YouTube and Google

Tomas Izo - 08:50am

Distributed Deep Learning at Scale Distributed Deep Learning at Scale Distributed Deep Learning at Scale

Soumith Chintala - 09:25am

INVITED INDUSTRY TALKS: Visual Recognition API and Services

Thursday June 9, 10:20am @ International Affairs - Altschul Auditorium

Session Chair: Rogerio Feris

click on picture for abstract

Clarifai Neural Networks Clarifai Neural Networks

Matthew Zeiler - 10:20am

Microsoft Cognitive Service : Building Intelligent and Engaging Applications

Jin Li - 10:55am

Watson Vision - Disrupting Businesses Watson Vision - Disrupting Businesses

Rahul Singhal - 11:30am

INDUSTRY PANEL: How is Deep Learning Changing the Multimedia World?

Thursday June 9, 1:00pm @ International Affairs - Altschul Auditorium

Session Chairs: Alejandro Jaimes and Vincent Oria

Tomas Izo (Google)

Soumith Chintala (Facebook)

Matthew Zeiler (Clarifai)

Jin Li (Microsoft)

Rahul Singhal (IBM)