
EIE4105

Multimodal Human Computer Interaction Technology

Introduction to HCI

M.W. Mak
enmwmak@polyu.edu.hk
http://www.eie.polyu.edu.hk/~mwmak

1
Contents
1. Definition of HCI
2. Types of HCI
3. Multimodal HCI
4. Applications of HCI
5. Future of HCI

2
What is HCI?
• Wiki:
“Human–computer interaction (HCI) researches the design and use
of computer technology, focusing particularly on the interfaces
between people (users) and computers. Researchers in the field of
HCI both observe the ways in which humans interact with computers
and design technologies that let humans interact with computers in
novel ways.”
• Webopedia.com:
“A discipline concerned with the study, design, construction and
implementation of human-centric interactive computer systems. A
user interface, such as a GUI, is how a human interacts with a
computer, and HCI goes beyond designing screens and menus that
are easier to use.”

3
What is HCI?

4
What is HCI?

5
History of HCI

6
History of HCI

OS X Yosemite and Windows 10 (2015)
7
Today’s HCI

8
Types of HCI

• Graphical user interface (GUI)


• Speech
• Image and Video
• Brain computer interface
• Hand gesture
• Handwriting
• Augmented Reality
9
Graphical User Interface
• “GUI allows users to interact with electronic
devices through graphical icons and visual
indicators such as secondary notation, as
opposed to text-based interfaces, typed
command labels or text navigation.” (Wiki)
• It is now considered a traditional type of
interface.
• Not the focus of this course.

10
Speech
• Use speech recognition and speech synthesis
technologies to interact with computers
• Computers can not only convert speech to text
(and vice versa) but also understand the
meaning of the spoken sentences.
• Speech recognition systems: convert speech
to text, e.g., Google voice search
• Spoken dialog systems: interact with
computers through speech, e.g. Apple Siri
11
Demo from Microsoft
http://www.youtube.com/watch?feature=player_embedded&v=Nu-nlQqFCKg

12
Demo from Nuance

https://www.youtube.com/watch?v=GQ3Glr5Ff28&list=PL8064BF519AAEDB8B&feature=bf_prev
13
Demo from Google
https://www.youtube.com/watch?v=oNc2f2BhZ50

https://www.youtube.com/watch?v=_ouYfFHvbC8

14
Detection of Obstructive Sleep Apnoea
(the app emits inaudible sound above 18 kHz)

http://www.washington.edu/news/2015/04/27/new-uw-app-can-detect-sleep-apnea-events-via-smartphone/

15
Image and Video
• This type of HCI uses the camera of a computer as
the input
• Face processing: face recognition, face detection,
face tracking, and facial expression recognition.
• Microsoft’s How old do I look (www.how-old.net)

16
Face Recognition and Detection
https://www.youtube.com/watch?v=mdhvRNYX0PI

17
https://www.youtube.com/watch?v=mnfWZvU_Jqo
Emotion Recognition
http://www.affectiva.com/solutions/affdex/

https://vimeo.com/79306627

18
Drowsiness Detection
https://www.youtube.com/watch?v=gvN1elQ8NLY

19
Brain Computer Interface
• Use brainwaves to communicate with computers
• E.g., Phylter screens out low-priority text and email
messages by sensing the activity of your brain.
• The device uses near-infrared spectroscopy to read
brain activity.

20
Brain Computer Interface

21
Mercedes-Benz Attention Assist

https://www.youtube.com/watch?v=weeM9FZlQig

22
Brainwaves for Robotic Control

http://spectrum.ieee.org/biomedical/bionics/how-to-catch-brain-waves-in-a-net

23
Gesture Recognition
• Gesture recognition is a topic in computer science and
language technology with the goal of interpreting human
gestures via mathematical algorithms.
• Current focuses include emotion recognition from the face,
and hand gesture recognition.
• Many products use stereo video cameras illuminated by
near-IR LEDs, infrared projectors, and computer vision
algorithms to interpret sign language.
• Gesture recognition enables humans to communicate with
the machine (HMI) and interact naturally.
• This could potentially make conventional input devices such
as mice, keyboards, and even touch-screens redundant.
https://en.wikipedia.org/wiki/Gesture_recognition
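One classical "mathematical algorithm" for interpreting gestures is to match an observed trajectory against stored templates with dynamic time warping (DTW). A minimal sketch follows; the gesture names and trajectories are made-up illustrations, not taken from any product mentioned above.

```python
# Match a 2-D gesture trajectory against stored templates with
# dynamic time warping (DTW). Templates and gestures are toy data.
import math

def dtw_distance(a, b):
    """DTW distance between two sequences of (x, y) points."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def classify_gesture(trajectory, templates):
    """Return the name of the template with the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(trajectory, templates[name]))

# Toy templates: a horizontal "swipe" and a vertical "flick".
templates = {
    "swipe": [(0, 0), (1, 0), (2, 0), (3, 0)],
    "flick": [(0, 0), (0, 1), (0, 2), (0, 3)],
}
observed = [(0, 0), (1.1, 0.1), (2.0, -0.1), (2.9, 0.0)]
print(classify_gesture(observed, templates))  # swipe
```

Because DTW allows the observed trajectory to be locally stretched or compressed in time, the same gesture performed at different speeds still matches its template.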
24
Leap Motion for Gesture Recognition

https://www.leapmotion.com/

25
Kinect for Gesture Recognition
https://www.youtube.com/watch?v=MwZMNMmODJA

26
Multimodal HCI
• The increasing availability of multimedia data, broadband
access, voice over IP, and powerful mobile devices is
fostering a new wave of human–computer interfaces that
support multiple modalities.
• A modality is a natural way of interaction: speech, vision,
facial expression, handwriting, gesture, etc.
• Some systems use the correlation between multiple
modalities, e.g., audio-visual speech recognition and
audio-visual speaker detection.

27
Multimodal HCI
• Multimodal biometric systems use more than one modality
(face + voice) to enhance recognition accuracy. These
systems make use of the independence between the
modalities, e.g., there is no correlation between face and
fingerprint.
• Multimodal outputs, e.g., multimodal speech synthesis
involving speech and talking head.
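A common way to combine independent modalities in a biometric system is score-level fusion: each matcher produces a similarity score, and a weighted sum decides acceptance. The sketch below is illustrative; the scores, weights, and threshold are made-up values, not from any real system.

```python
# Score-level fusion for a two-modality (face + voice) biometric
# system. All scores, weights, and thresholds are toy values.
def fuse_scores(face_score, voice_score, w_face=0.5):
    """Weighted-sum fusion of two matcher scores, each in [0, 1]."""
    return w_face * face_score + (1.0 - w_face) * voice_score

def accept(face_score, voice_score, threshold=0.6):
    """Accept the identity claim if the fused score reaches the threshold."""
    return fuse_scores(face_score, voice_score) >= threshold

# The face matcher is unsure (0.55) but the voice matcher is
# confident (0.90): the fused score 0.725 passes the threshold.
print(accept(0.55, 0.90))  # True
# Both matchers weakly reject: fused score 0.40 is below threshold.
print(accept(0.40, 0.40))  # False
```

Because the modalities are (nearly) independent, an error in one matcher is unlikely to coincide with an error in the other, which is why fusion improves accuracy.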

28
Why Multimodal HCI?
• Natural: making use of more (appropriate) senses
• Flexible: different modalities excel at different tasks
• Helps the visually/physically impaired
• Faster and more efficient
• Robust: mutual disambiguation of recognition
errors
• Multimodal interfaces are more attractive to users
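Mutual disambiguation can be sketched very simply: each modality outputs a scored n-best list over the same commands, and combining the scores can recover the right answer even when each modality's own top choice is wrong or ambiguous. The commands and scores below are invented for illustration.

```python
# Mutual disambiguation of recognition errors: multiply per-modality
# scores over a shared hypothesis space. All data here is toy data.
def combine(nbest_a, nbest_b):
    """Multiply per-modality scores; hypotheses missing from a list score 0."""
    joint = {}
    for hyp in set(nbest_a) | set(nbest_b):
        joint[hyp] = nbest_a.get(hyp, 0.0) * nbest_b.get(hyp, 0.0)
    return max(joint, key=joint.get)

# Speech mishears "open" as "over"; the gesture alone cannot tell
# "open" from "close". Jointly, "open" wins.
speech  = {"over": 0.5, "open": 0.4, "close": 0.1}
gesture = {"open": 0.5, "close": 0.5}
print(combine(speech, gesture))  # open
```

Neither modality ranks "open" first on its own, yet the combination does, which is the robustness benefit the bullet above refers to.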

29
Example of Multimodal HCI

• Pepper: https://www.aldebaran.com/en/a-robots/who-is-pepper
• Pepper is a companion robot able to communicate with
humans through voice, touch, and emotions.
30
Augmented Reality (AR)
• Augmented reality (AR) is a live direct or indirect
view of a physical, real-world environment whose
elements are augmented (or supplemented) by
computer-generated sensory input such as sound,
video, graphics or GPS data
• With the help of advanced AR technology (e.g.
adding computer vision and object recognition) the
information about the surrounding real world of
the user becomes interactive and digitally
manipulable.

31
Microsoft HoloLens
• Microsoft HoloLens is the first fully untethered,
see-through holographic computer.
• It enables high-definition holograms to come to life
in your world, seamlessly integrating with your
physical places, spaces, and things.

32
L’Oreal Paris Makeup Genius
• L'Oréal revolutionizes the makeup shopping
experience by allowing consumers to scan their
own features, browse a catalog and then virtually
apply makeup before making a purchase.
• Consumers can also scan product barcodes in-store
to test those items with the app.

33
Augmented Reality Apps

34
(Not so distant) Future of HCI

35
Beware of your shoulder!
(Not so distant) Future of HCI
https://www.youtube.com/watch?v=xiLNCT6lUsU

More future stuff:


http://www.futuretimeline.net/blog/ai-robots-blog.htm#.Vdg899Oqqko

36
Brief Outline of This Course
• Many of the examples shown earlier are based on
the following architecture
Digital Signals → Feature Extraction → Feature Vectors → Pattern Classification → Actions/Responses

• E.g., in speech recognition, the digital signals will be
time-domain speech samples, the feature vectors
will be spectral vectors, and the responses will be
text.
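The feature-extraction stage for speech can be sketched in a few lines: split the waveform into overlapping frames and take the magnitude spectrum of each frame. The frame and hop sizes below are illustrative choices, and a pure tone stands in for real speech samples.

```python
# Feature extraction for the architecture above: time-domain samples
# in, one spectral feature vector per frame out. Frame/hop sizes
# are illustrative.
import numpy as np

def extract_spectral_features(samples, frame_len=256, hop=128):
    """Split a waveform into windowed frames; return magnitude spectra."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len] * np.hanning(frame_len)
        frames.append(np.abs(np.fft.rfft(frame)))  # magnitude spectrum
    return np.array(frames)  # shape: (num_frames, frame_len // 2 + 1)

# A 1 kHz tone sampled at 8 kHz stands in for digital speech samples.
fs = 8000
t = np.arange(fs) / fs
samples = np.sin(2 * np.pi * 1000 * t)
features = extract_spectral_features(samples)
print(features.shape)  # one spectral vector per 16 ms hop
```

Each row of `features` is one of the "spectral vectors" in the diagram; the pattern classification stage then maps these vectors to text.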

37
Brief Outline of This Course
• This course will focus on the pattern classification
part and assumes that the feature vectors can be
easily obtained from software libraries.
• To understand the pattern classification part of HCI,
we need to understand machine learning.
• To understand machine learning, we need to
understand some fundamental math such as
probability and statistics.
• Much of this math was covered in Year 1 or even
in high school, so don’t worry.
• In most cases, we just extend from 1-D to
higher dimensions.

38
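The "1-D to higher dimensions" point can be made concrete with a nearest-centroid classifier: written once for vectors, the same function handles scalar and high-dimensional features unchanged. The class labels and centroid values below are invented for illustration.

```python
# A nearest-centroid classifier works unchanged whether the feature
# vectors are 1-D or high-dimensional. Labels and centroids are toy data.
import numpy as np

def nearest_centroid(x, centroids):
    """Return the label of the class centroid closest to x (Euclidean)."""
    return min(centroids, key=lambda label: np.linalg.norm(x - centroids[label]))

# 1-D features (e.g., a single frame-energy measurement):
c1 = {"speech": np.array([0.8]), "silence": np.array([0.1])}
print(nearest_centroid(np.array([0.7]), c1))  # speech

# 3-D features: the very same function, just longer vectors.
c3 = {"speech": np.array([0.8, 0.5, 0.2]), "silence": np.array([0.1, 0.0, 0.0])}
print(nearest_centroid(np.array([0.7, 0.4, 0.1]), c3))  # speech
```

Only the length of the vectors changes between the two calls; the distance computation and the decision rule stay the same, which is exactly the 1-D-to-high-dimension extension the slide refers to.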
