ICMR 2016


Program at a glance
Oral Session 1: Deep Learning and Applications
Oral Session 2: Image and Video Content Analysis
Oral Session: Brave New Ideas
Oral Session 3: Multimedia Datasets and Applications
Oral Session: Best Paper Candidates
Special Oral Session: Learning with Semantic Information for Large Scale Multimedia Understanding
Oral Session 4: Image/Video Search
Oral Session: Student Symposium
Poster Session
Demo Session
Invited Industry Talks
Industry Panel



Venues: Schapiro CEPSR Room 750; Schapiro CEPSR Room 415; Mudd Carleton Lounge; International Affairs - Altschul Auditorium
Dates: June 6 Monday (Tutorial & Workshop); June 7 Tuesday (Main Conference); June 8 Wednesday (Main Conference); June 9 Thursday (Industry Day)

08:20am Registration (all four days)
08:40am Welcome and Introduction; Oral: Best Paper Candidates; Invited Industry Talks: Video and Social Media
09:00am 1st International Workshop on Multimedia Analysis and Retrieval for Multimodal Interaction (MARMI), part 1; Keynote Talk: Professor Shih-Fu Chang
10:00am Coffee Break (two days)
10:20am Oral 1: Deep Learning and Applications; Coffee Break; Invited Industry Talks: Visual Recognition API and Services
10:40am Coffee Break; Special Oral Session
11:00am 1st International Workshop on Multimedia Analysis and Retrieval for Multimodal Interaction (MARMI), part 2
12:00pm Lunch (Mudd Carleton Lounge), three days
01:00pm Lunch (Schapiro CEPSR Ground Floor); Oral 2: Image and Video Content Analysis; Oral 4: Image and Video Search; Industry Panel
02:00pm Tutorial 1, part 1; Tutorial 2, part 1; Conference Close
02:40pm Coffee Break (two days); ACM Multimedia 2016 TPC Workshop
03:00pm Oral: Brave New Ideas; Posters and Demos (optional setup, if needed); Student Symposium
03:40pm Coffee Break
04:00pm Tutorial 1, part 2; Tutorial 2, part 2; Posters and Demos
04:40pm Coffee Break
05:00pm Oral 3: Multimedia Datasets and Applications
06:00pm Welcome Reception (Avery Plaza); Social Dinner @ Calle Ocho

KEYNOTE: New Frontiers of Large Scale Multimedia Information Retrieval

Tuesday June 7, 9:00am @ International Affairs - Altschul Auditorium

  • Shih-Fu Chang

    Columbia University
    Shih-Fu Chang is the Sr. Executive Vice Dean and the Richard Dicker Professor of Columbia Engineering. His research focuses on multimedia information retrieval, computer vision, machine learning, and signal processing, with the goal of turning unstructured multimedia data into searchable information. His work on content-based visual search in the early 1990s, VisualSEEk and VideoQ, laid the foundation of this vibrant area. Over the years, he has continued to create innovative techniques for image/video recognition, multimodal analysis, visual information ontology, image authentication, and compact hashing for large-scale image databases. For his long-term pioneering contributions, he has received the IEEE Signal Processing Society Technical Achievement Award, the ACM Multimedia SIG Technical Achievement Award, an Honorary Doctorate from the University of Amsterdam, the IEEE Kiyo Tomiyasu Award, and an IBM Faculty Award. For his dedicated contributions to education, he received the Great Teacher Award from the Society of Columbia Graduates. He served as Chair of the Columbia Electrical Engineering Department (2007-2010), Editor-in-Chief of the IEEE Signal Processing Magazine (2006-2008), and an advisor to several research institutions and companies. In his current position at Columbia Engineering, he plays a key role in the School's strategic planning, special research initiatives, international collaboration, and faculty development. He is a Fellow of the American Association for the Advancement of Science (AAAS) and of the IEEE.
  • Multimedia information retrieval aims to automatically extract useful information from large collections of images, videos, and their combinations with other data such as text and speech. As reported in recent news, it is now possible to search over millions of products or more using just an example image taken on a mobile phone. Major companies are deploying intelligent apps that automatically generate keywords or even captions for an image at a level of sophistication that could not have been imagined before. In this talk, I will review the core technologies involved and discuss the challenges and opportunities ahead. First, to address the complexity bottleneck when scaling up data size, I will present extremely compact hash codes and deep learning image classification models that can reduce complexity by orders of magnitude while approximately preserving accuracy. Second, to support easy extension of recognition systems to new domains, instead of relying on fixed image categories, we introduce a new paradigm that automatically discovers unique multimodal concepts and structures from the large amounts of multimedia data available. Last, to support emerging applications beyond basic image categorization, I will discuss ongoing efforts to understand how images are used to express sentiments and emotions in online social media, and how languages and cultures may influence such online multimedia communication.

ORAL SESSION 1: Deep Learning and Applications

Tuesday June 7, 10:20am @ International Affairs - Altschul Auditorium

Session Chair: Miriam Redi

45 Matching User Photos to Online Products with Robust Deep Features
Xi Wang, Zhenfeng Sun, Wenqiang Zhang and Yu-Gang Jiang (Fudan University)
72 Video Emotion Recognition with Transferred Deep Feature Encodings
Baohan Xu*, Yanwei Fu', Yu-Gang Jiang*, Boyang Li' and Leonid Sigal' (*Fudan University, 'Disney Research Pittsburgh)
137 Scene-driven Retrieval in Edited Videos using Aesthetic and Semantic Deep Features
Lorenzo Baraldi, Costantino Grana and Rita Cucchiara (University of Modena and Reggio Emilia)
151 ACD: Action Concept Discovery from Image-Sentence Corpora
Jiyang Gao, Chen Sun and Ram Nevatia (University of Southern California)

ORAL SESSION 2: Image and Video Content Analysis

Tuesday June 7, 1:00pm @ International Affairs - Altschul Auditorium

Session Chair: Chua Tat Seng

8 GPU-FV: Realtime Fisher Vector and Its Applications in Video Monitoring
Wenjing Ma*, Liangliang Cao' and Lei Yu* (*Institute of Software Chinese Academy of Sciences, 'Yahoo! Labs)
61 Mouse Activity as an Indicator of Interestingness in Video
Gloria Zen*, Yale Song^, Paloma de Juan^ and Alejandro Jaimes' (*University of Trento, ^Yahoo! Labs, 'AiCure)
132 Automatic Identification of Sports Video Highlights using Viewer Interest Features
Prithwi Raj Chakraborty', Ligang Zhang*, Dian Tjondronegoro' and Vinod Chandran' ('Queensland University of Technology, *Xi'an University of Technology)
148 Diverse Concept-Level Features for Multi-Object Classification
Youssef Tamaazousti*, Hervé Le Borgne* and Céline Hudelot' (*CEA List, 'Centrale-Supélec)

ORAL SESSION: Brave New Ideas

Tuesday June 7, 3:00pm @ International Affairs - Altschul Auditorium

Session Chair: Hui Wu

47 Personalized Privacy-aware Image Classification
Eleftherios Spyromitros-Xioufis*, Symeon Papadopoulos*, Adrian Popescu' and Yiannis Kompatsiaris* (*CERTH ITI, 'CEA LIST)
222 The science and detection of tilting
Xingjie Wei (University of Cambridge), Jussi Palomaki (Newcastle University) and Jeff Yan (University of Lancaster)
224 Using Photos as Micro-Reports of Events
Siripen Pongpaichet, Mengfan Tang, Laleh Jalali and Ramesh Jain (University of California, Irvine)
227 Searching for Audio by Sketching Mental Images of Sound – A Brave New Idea for Audio Retrieval in Creative Music Production
Peter Knees (Johannes Kepler University) and Kristina Andersen (STEIM)

ORAL SESSION 3: Multimedia Datasets and Applications

Tuesday June 7, 5:00pm @ International Affairs - Altschul Auditorium

Session Chair: Shin'ichi Satoh

56 The LFM-1b Dataset for Music Retrieval and Recommendation
Markus Schedl (Johannes Kepler University Linz)
101 Foreground Object Sensing for Saliency Detection
Hengliang Zhu*, Jiao Jiang*, Xiao Lin', Yangyang Hao* and Lizhuang Ma* (*Shanghai Jiao Tong University, 'Shanghai Normal University)
102 Constrained Local Enhancement of Semantic Features by Content-Based Sparsity
Youssef Tamaazousti, Hervé Le Borgne and Adrian Popescu (CEA List)
167 Event Detection with Zero Example: Select the Right and Suppress the Wrong Concepts
Yi-Jie Lu*, Hao Zhang*, Maaike de Boer' and Chong-Wah Ngo* (*City University of Hong Kong, 'TNO/Radboud University Nijmegen)

ORAL SESSION: Best Paper Candidates

Wednesday June 8, 8:40am @ International Affairs - Altschul Auditorium

Session Chairs: Susanne Boll and Winston Hsu

25 Homemade TS-Net for Automatic Face Recognition
Shilun Lin, Zhicheng Zhao and Fei Su (Beijing University of Posts and Telecommunications)
38 Action Recognition by Learning Deep Multi-Granular Spatio-Temporal Video Representation
Qing Li*, Zhaofan Qiu*, Ting Yao', Tao Mei', Yong Rui' and Jiebo Luo^ (*University of Science and Technology of China, 'Microsoft Research, ^University of Rochester)
73 Pooling Objects for Recognizing Scenes without Examples
Svetlana Kordumova, Thomas Mensink and Cees G.M. Snoek (University of Amsterdam)
193 Multilingual Visual Sentiment Concept Matching
Nikolaos Pappas*, Miriam Redi', Mercan Topkara+, Brendan Jou^, Hongyi Liu^, Tao Chen^ and Shih-Fu Chang^ (*Idiap Research Institute, 'Yahoo! Labs London, +JW Player, ^Columbia University)

SPECIAL SESSION: Learning with Semantic Information for Large Scale Multimedia Understanding

Wednesday June 8, 10:40am @ International Affairs - Altschul Auditorium

Session Chair: Nicu Sebe

80 A Short Survey of Recent Advances in Graph Matching
Junchi Yan*, Xucheng Yin', Weiyao Lin^, Cheng Deng+, Hongyuan Zha- and Xiaokang Yang^ (*East China Normal University, 'University of Science and Technology Beijing, ^Shanghai Jiao Tong University, +Xidian University, -Georgia Institute of Technology)
107 The ImageNet Shuffle: Reorganized Pre-training for Video Event Detection
Pascal Mettes, Dennis Koelma and Cees Snoek (University of Amsterdam)
198 Learning for Traffic State Estimation on Large Scale of Incomplete Data
Yiyang Yao (Information & Communication Branch, State Grid Zhejiang Company); Yingjie Xia (Zhejiang University); Zhenyu Shan (Hangzhou Normal University); Zhengguang Liu (National University of Singapore)

ORAL SESSION 4: Image and Video Search

Wednesday June 8, 1:00pm @ International Affairs - Altschul Auditorium

Session Chair: Liangliang Cao

12 Diverse Yet Efficient Retrieval using Locality Sensitive Hashing
Vidyadhar Rao', Prateek Jain* and C V Jawahar' ('IIIT Hyderabad, *Microsoft Research)
29 Correlation Autoencoder Hashing for Supervised Cross-Modal Search
Yue Cao, Mingsheng Long, Jianmin Wang and Han Zhu (Tsinghua University)
49 Regional Subspace Projection Coding for Image Retrieval
Mingmin Zhen, Wenmin Wang and Ronggang Wang (Peking University)
109 Scaling Group Testing Similarity Search
Ahmet Iscen', Laurent Amsaleg* and Teddy Furon' ('INRIA, *CNRS-IRISA)

ORAL SESSION: Student Symposium

Wednesday June 8, 3:00pm @ International Affairs - Altschul Auditorium

Session Chair: Yao Wang

221 Multimodal Analysis of User-Generated Content in Support of Social Media Applications
Rajiv Shah (National University of Singapore)
228 Multimodal Visual Pattern Mining with Convolutional Neural Network
Hongzhi Li (Columbia University)
229 Facial Landmark Detection and Tracking for Facial Behavior Analysis
Yue Wu (Rensselaer Polytechnic Institute)


POSTER SESSION

Wednesday June 8, 3:00pm @ Mudd - Carleton Lounge

Session Chair: Mei-Chen Yeh

19 Vinereactor: Crowdsourced Spontaneous Facial Expression Data
Edward Kim and Shruthika Vangala (Villanova University)
23 Mirroring Facial Expressions: Evidence from Visual Analysis of Dyadic Interactions
Yuchi Huang and Saad Khan (Educational Testing Service)
42 Sequential Correspondence Hierarchical Dirichlet Processes for Video Data Analysis
Jianfei Xue and Koji Eguchi (Kobe University)
46 A Computational Approach to Finding Facial Patterns of a Babyface
Zi-Yi Ke and Mei-Chen Yeh (National Taiwan Normal University)
51 Video Description Generation Using Audio and Visual Cues
Qin Jin (Renmin University of China) and Junwei Liang (Carnegie Mellon University)
52 Contextual Media Retrieval Using Natural Language Queries
Sreyasi Nag Chowdhury, Mateusz Malinowski, Andreas Bulling and Mario Fritz (Max Planck Institute for Informatics)
60 Learning Music Embedding with Metadata for Context Aware Recommendation
Dongjing Wang', Shuiguang Deng', Xin Zhang* and Guandong Xu^ ('Zhejiang University, *Shandong University, ^University of Technology Sydney)
62 Region Trajectories for Video Semantic Concept Detection
Yuancheng Ye', Xuejian Rong*, Xiaodong Yang^ and Yingli Tian* ('The Graduate Center CUNY, *The City College CUNY, ^NVIDIA Research)
63 Audiovisual Summarization of Lectures and Meetings Using a Segment Similarity Graph
Chidansh Bhatt', Andrei Popescu-Belis* and Matthew Cooper' ('FX Palo Alto Laboratory Inc., *Idiap Research Institute)
68 Recurrent Support Vector Machines for Audio-Based Multimedia Event Detection
Yun Wang and Florian Metze (Carnegie Mellon University)
76 Adding Chinese Captions to Images
Xirong Li', Weiyu Lan', Jianfeng Dong* and Hailong Liu^ ('Renmin University of China, *Zhejiang University, ^Tencent)
85 Emotion Recognition from EEG Signals Enhanced by User's Profile
Tanfang Chen, Shangfei Wang, Zhen Gao and Chongliang Wu (USTC)
88 Multimodal Deep Convolutional Neural Network for Audio-Visual Emotion Recognition
Shiqing Zhang*, Shiliang Zhang', Tiejun Huang' and Wen Gao' (*Taizhou University, 'Peking University)
92 Large-Scale E-Commerce Image Retrieval with Top-Weighted Convolutional Neural Networks
Shichao Zhao, Youjiang Xu and Yahong Han (Tianjin University)
94 Web Video Popularity Prediction using Sentiment and Content Visual Features
Giulia Fontanini, Marco Bertini and Alberto Del Bimbo (Universita degli Studi di Firenze)
111 Accurate Aggregation of Local Features by using K-sparse Autoencoder for 3D Model Retrieval
Takahiko Furuya and Ryutarou Ohbuchi (University of Yamanashi)
114 Image Annotation using Multi-scale Hypergraph Heat Diffusion Framework
Venkatesh N Murthy', Avinash Sharma*, Visesh Chari* and R. Manmatha' ('University of Massachusetts Amherst, *International Institute of Information Technology Hyderabad)
118 Discriminant Cross-modal Hashing
Xing Xu', Fumin Shen', Yang Yang' and Heng Tao Shen* ('University of Electronic Science and Technology of China, *The University of Queensland)
123 CNN-based Style Vector for Style Image Retrieval
Shin Matsuo and Keiji Yanai (University of Electro-Communications)
128 MVC: A Dataset for View-Invariant Clothing Retrieval and Attribute Prediction
Kuan-Hsien Liu, Ting-Yen Chen and Chu-Song Chen (Academia Sinica)
135 A Quality Adaptive Multimodal Affect Recognition System for User-Centric Multimedia Indexing
Rishabh Gupta', Mojtaba Khomami Abadi^*, Fabio Morreale*, Jesus Cardenes Cabre^, Tiago H. Falk' and Nicu Sebe* ('INRS-EMT University of Quebec, ^SensAura Tech, *University of Trento)
140 Rank Diffusion for Context-Based Image Retrieval
Daniel Carlos Guimarães Pedronette (State University of São Paulo UNESP) and Ricardo Da Silva Torres (University of Campinas UNICAMP)
153 Bags of Local Convolutional Features for Scalable Instance Search
Eva Mohedano*, Amaia Salvador', Kevin McGuinness*, Xavier Giro-I-Nieto', Noel O'Connor* and Ferran Marques' (*Insight Centre for Data Analytics, 'Universitat Politecnica de Catalunya)
158 Interactive Multimodal Learning on 100 Million Images
Jan Zahálka', Stevan Rudinac', Björn Þór Jónsson*, Dennis C. Koelma' and Marcel Worring' ('University of Amsterdam, *Reykjavik University)
162 Combining Holistic and Part-based Deep Representations for Computational Painting Categorization
Rao Muhammad Anwer', Fahad Shahbaz Khan^, Joost van de Weijer* and Jorma Laaksonen' ('Aalto University, ^Linkoping University, *CVC Barcelona)
173 Bidirectional Joint Representation Learning with Symmetrical Deep Neural Networks for Multimodal and Crossmodal Applications
Vedran Vukotic'*, Christian Raymond'* and Guillaume Gravier'^ ('INRIA/IRISA Rennes, *INSA Rennes, ^CNRS)
176 SSD Technology Enables Dynamic Maintenance of Persistent High-Dimensional Indexes
Bjorn Thor Jonsson (Reykjavik University), Laurent Amsaleg (CNRS-IRISA) and Herwig Lejsek (Videntifier Technologies)
177 Item-Based Video Recommendation: an Hybrid Approach considering Human Factors
Andrea Ferracani, Daniele Pezzatini, Marco Bertini and Alberto Del Bimbo (Universita degli Studi di Firenze)
178 Human’s Scene Sketch Understanding
Yuxiang Ye', Yijuan Lu' and Hao Jiang* ('Texas State University, *Boston College)
182 Retrieval of Multimedia objects by Fusing Multiple Modalities
Ilias Gialampoukidis, Anastasia Moumtzidou, Theodora Tsikrika, Stefanos Vrochidis and Yiannis Kompatsiaris (CERTH - ITI)
189 Incremental Learning for Fine-Grained Image Recognition
Liangliang Cao', Jenhao Hsiao', Paloma de Juan', Yuncheng Li* and Bart Thomee' ('Yahoo Labs, *University of Rochester)
194 Spatially Localized Visual Dictionary Learning
Valentin Leveau*, Alexis Joly', Olivier Buisson^ and Patrick Valduriez' (*French National Institute of Audiovisual Contents, 'INRIA, ^INA)
201 Semantic Binary Codes
Sravanthi Bondugula and Larry Davis (University of Maryland College Park)
209 On the Effects of Spam Filtering and Incremental Learning for Web-Supervised Visual Concept Classification
Matthias Springstein (TIB Hannover) and Ralph Ewerth (TIB, Leibniz Universität Hannover)
220 Semi-supervised Identification of Rarely Appearing Persons in Video by Correcting Weak Labels
Eric Müller'^, Christian Otto' and Ralph Ewerth^* ('Ernst-Abbe-Hochschule Jena, ^TIB Hannover, *Leibniz Universität Hannover)
230 Introducing Concept And Syntax Transition Networks for Image Captioning
Philipp Blandfort', Tushar Karayil', Damian Borth* and Andreas Dengel* ('University of Kaiserslautern, *German Institute of Artificial Intelligence-DFKI)


DEMO SESSION

Wednesday June 8, 3:00pm @ Mudd - Carleton Lounge

Session Chair: Bart Thomee

32 SentiCart: Cartography and Geo-contextualization for Multilingual Visual Sentiment
Brendan Jou, Margaret Yuying Qian and Shih-Fu Chang (Columbia University)
74 Personalized Retrieval and Browsing of Classical Music and Supporting Multimedia Material
Marko Tkalcic*, Markus Schedl^, Cynthia Liem' and Mark Melenhorst' (*Free University of Bolzano, ^Johannes Kepler University, 'TU Delft)
90 The Social Picture
Sebastiano Battiato', Giovanni Maria Farinella', Filippo Luigi Maria Milotta', Alessandro Ortis', Luca Addesso*, Antonino Casella*, Valeria D'Amico* and Giovanni Torrisi* ('University of Catania, *Telecom Italia)
108 Watching What and How Politicians Discuss Various Topics - A Large-Scale Video Analytics UI
Emily Song, Joseph Ellis, Hongzhi Li and Shih-Fu Chang (Columbia University)
142 Multimodal Event Detection and Summarization in Large Scale Image Collections
Manos Schinas', Symeon Papadopoulos', Georgios Petkos', Yiannis Kompatsiaris' and Pericles Mitkas* ('CERTH-ITI, *Aristotle University of Thessaloniki)
143 Object-aware Deep Network for Commodity Image Retrieval
Zhiwei Fang', Jin Liu', Yuhang Wang', Yong Li', Jinhui Tang*, Hanqing Lu and Hang Song ('Institute of Automation Chinese Academy of Sciences, *Nanjing University of Science and Technology)
152 An Automated End-To-End Pipeline for Fine-Grained Video Annotation using Deep Neural Networks
Baptist Vandersmissen, Lucas Sterckx, Thomas Demeester, Azarakhsh Jalalvand, Wesley De Neve and Rik Van de Walle (Ghent University - iMinds - Data Science Lab)
175 Serendipity-driven Celebrity Video Hyperlinking
Shu-Jun Yang', Lei Pang', Chong-Wah Ngo' and Benoit Huet* ('City University of Hong Kong, *EURECOM)
207 Complura: Exploring and Leveraging a Large-scale Multilingual Visual Sentiment Ontology
Hongyi Liu', Brendan Jou', Tao Chen', Mercan Topkara+, Nikolaos Pappas^, Miriam Redi* and Shih-Fu Chang' ('Columbia University, +JW Player, ^Idiap Research Institute and EPFL, *Yahoo! Labs)


Thursday June 9, 8:40am @ International Affairs - Altschul Auditorium

Rogerio Feris

INVITED INDUSTRY TALKS: Video and Social Media

Thursday June 9, 8:50am @ International Affairs - Altschul Auditorium

Session Chair: Paul Natsev

  • Tomas Izo

    Teaching Machines to Watch and Listen: Video Content Analysis at YouTube and Google

    Tomas Izo - 08:50am


    I will give an overview of some of the work done by the Video Content Analysis team at Google Research in the context of YouTube and Google Photos. I will show examples of features and use cases enabled or assisted by image and video analysis, and discuss in more detail how we have approached the problem of extracting meaning out of audio-visual signals at massive scales. I'll conclude with a note on data sets, and suggestions of open problem areas with large potential for impact.

    Tomáš Ižo leads the Video Content Analysis team in the Machine Perception group at Google Research. The team’s mission is to improve Google products like YouTube and Photos by making them more content-aware via machine learning and perception techniques. Tomáš came to Google by way of MIT, where he received a Ph.D. in computer science in 2007. His research focused on motion and scene analysis. At Google, he has contributed to many areas of video technology, from summarization, enhancement and creative tools to categorization, annotation and infrastructure for media processing.

  • Soumith Chintala

    Distributed Deep Learning at Scale

    Soumith Chintala - 09:25am

    Facebook AI Research

    This talk provides a brief overview of deep learning research and the challenges involved in scaling it up across multi-GPU and multi-machine clusters while keeping the software flexible enough for research settings. We discuss clear trends emerging in deep learning from an HPC perspective and present several examples from our work at Facebook AI Research.

    Soumith Chintala is a Researcher at Facebook AI Research, where he works on deep learning, reinforcement learning, generative image models, agents for video games, and large-scale high-performance deep learning. He holds a Master's in CS from NYU.

INVITED INDUSTRY TALKS: Visual Recognition API and Services

Thursday June 9, 10:20am @ International Affairs - Altschul Auditorium

Session Chair: Rogerio Feris

  • Matthew Zeiler

    Clarifai Neural Networks

    Matthew Zeiler - 10:20am


    This talk will cover a broad range of topics, from how successful computer vision systems are built using neural networks and why real-world applications powered by these systems are exploding, to how Clarifai makes it easy for you to build a new generation of intelligent applications. We will discuss algorithms that were created in the 1980s and have since evolved to power real-world applications across every industry. Because these models are typically thought of as black boxes, we will dive into a visualization technique that shows what they see at their various levels of abstraction. We will then demonstrate multiple applications enabled by this technology in Clarifai's API, including Forevery, a free photo organization product for consumers, as well as developer projects and enterprise solutions.

    Matthew Zeiler is an expert in neural networks and the Founder and CEO of Clarifai. After learning from neural network pioneers including Geoff Hinton and Yann LeCun, he started Clarifai in November 2013 upon completing his PhD at New York University. He set out with the mission for Clarifai to understand every image and video to improve life, bringing the power of AI to everyone.

  • Jin Li

    Microsoft Cognitive Services: Building Intelligent and Engaging Applications

    Jin Li - 10:55am

    Microsoft Research

    Microsoft Cognitive Services is a collection of cloud APIs that developers can use to make their applications more intelligent and engaging. In this talk, we will examine Microsoft Cognitive Services, with particular attention to the celebrity API, which identifies celebrities in an image, and the image caption API, which provides a natural language description of an image's content. We will describe the research work that powers these services, the related challenges, and the ongoing efforts to expand their coverage and improve their quality.

    Dr. Jin Li is a Partner Research Manager of the Cloud Computing and Storage group at MSR Technologies. His team has made contributions to Microsoft worth on the order of hundreds of millions of dollars per annum. His contributions include the local reconstruction code (LRC) in Azure and Windows Server, the erasure code used in Lync, Xbox and RemoteFX, the Data Deduplication feature in Windows Server 2012, the high-performance SSD-based key-value store in Bing, and the RemoteFX for WAN feature in Windows 8 and Windows Server 2012. He has won a Best Paper Award at USENIX ATC 2012 and a 2013 Microsoft Technical Community Network Storage Technical Achievement Award. He has served as the lead Program Chair of ICME 2011, ICME Steering Committee Chair and a Program Co-Chair of ACM Multimedia 2016. He is an IEEE Fellow.

  • Rahul Singhal

    Watson Vision - Disrupting Businesses

    Rahul Singhal - 11:30am

    IBM Watson

    Watson Vision is a set of APIs that harness the power of sophisticated deep learning to help businesses use visual data to drive new business. We will illustrate how companies have created significant shareholder value by using these APIs with small engineering teams.

    Rahul Singhal is a Watson Product Leader for its suite of Image, Speech and Text products. He has over 20 years of experience in the software industry. He speaks frequently at a variety of conferences and is a sought-after expert on AI and machine learning. In his spare time, he works with startups, helping them and learning from them.

INDUSTRY PANEL: How is Deep Learning Changing the Multimedia World?

Thursday June 9, 1:00pm @ International Affairs - Altschul Auditorium

Session Chairs: Alejandro Jaimes and Vincent Oria

Tomas Izo (Google)

Soumith Chintala (Facebook)

Matthew Zeiler (Clarifai)

Jin Li (Microsoft)

Rahul Singhal (IBM)
