
Informatics in Food Technology (Tin học trong CNTP)

Nguyễn Hoàng Dũng, PhD.
Ho Chi Minh City University of Technology (Trường Đại học Bách khoa Tp. HCM)

Program & Bibliography

- 3(1,2): ~5 theory (301, B2) + 10 practice (Comp. Chem. Lab, by group)

- Website: www2.hcmut.edu.vn/~dzung/ (available from Sep 15)

- R: www.r-project.org

NHDzung – Lesson 1, slide 2

Problem
Foreign and Vietnamese cheeses: quality and preference?

How to conduct a research study

1. Sampling

2. Measurement

3. Collect data *

4. Analysis and present your results

* Sensory practices


1-1. Samples and Populations

A population consists of the set of all measurements in which the investigator is interested.
A sample is a subset of the measurements selected from the population.
A census is a complete enumeration of every item in a population.

Simple Random Sample

Sampling from the population is often done randomly, such that every possible sample of equal size (n) will have an equal chance of being selected.

A sample selected in this way is called a simple random sample or just a random sample.
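As a minimal sketch of simple random sampling, the snippet below draws n items without replacement from a small population, so that every subset of size n is equally likely. The population of 20 numbered items, the helper name `simple_random_sample`, and the seed are illustrative assumptions, not part of the lesson.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every subset of size n has the same chance."""
    rng = random.Random(seed)          # seeded only to make the demo repeatable
    return rng.sample(population, n)   # sampling without replacement


# Hypothetical population: 20 items labelled 1..20 (N = 20)
population = list(range(1, 21))

# A sample of size n = 5; a census would simply be the whole population
sample = simple_random_sample(population, n=5, seed=1)
print(sample)
```

Using `random.sample` rather than repeated `random.choice` is what keeps the draw without replacement, matching the definition of a sample as a subset.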

Samples and Populations

[Figure: a sample of size n drawn from a population of size N.]

Measurements

• The assigning of numbers to the values of a variable (S. S. Stevens, Science 1946;103:677-80)
• Rules specify procedures to assign numbers to values

The criteria of “science”

Science                             Pseudoscience
Logic, experimental evidence        Belief, loyalty
Results are repeatable              Results are not repeatable
Falsifiability*                     Not falsifiable
Peer-reviewed journals              Not in peer-reviewed journals
Evolution / learning from mistakes  Constant, unchanged belief

* Capable of being tested (verified or falsified) by experiment or observation


Criteria of measurements

• Validity: measures what it purports to measure
• Accuracy: the degree of “truthfulness” of an attribute that is being measured
• Reliability (consistency and repeatability)
• Sensitivity to important variation

Accuracy vs reliability (precision)

[Figure: precision versus accuracy.]

Measurement error decreases the accuracy of measurement.


Some important concepts: Data - Variables - Scales

Qualitative - Categorical, Frequency or Nominal. Examples are:
• Color
• Gender
• Nationality

Quantitative - Measurable or Countable. Examples are:
• Temperatures
• Humidity
• Gross compounds
• Preference points scored on a 100 point scale

GENERAL INFORMATION
1.1 Description of the respondent
1.1.1 Sex of the respondent? 1. Male 2. Female
Marital status: 1. Single 2. Married
1.1.2 Age of the respondent?
  Under 25
  25 – 30
  31 – 54
  > 55
1.1.3 What is your current occupation?
  Student
  Doctor/teacher
  Worker/hired laborer/salesperson
  Retired
1.1.4 Which of the following levels is your household income?
  1. Low (≥ 2 million VND and < 5 million)
  2. Medium (≥ 5 million and < 8 million)
  3. High (≥ 8 million)

• 8 cheeses (EdamF, EdamH, GoudaH, m1, m2, m3, m4, m5)
• 11 panelists (experts)
• 3 replicates
• 15 descriptors: sour, bitterness, umami, salty, greasiness, butter_odor, milk_odor, acrid, rancid, lactic, cheese_flavor, acetic, full flavor, yellow, hard
• Unstructured scale from 0 to 100 mm

Some important concepts: Data - Variables - Scales

Variable
• Discrete variables
• Continuous variables
• Independent variables
• Dependent variables

Measurement scales
• Nominal scales (labels)
• Ordinal scales (ranks in the Army)
• Interval scales (Celsius, Fahrenheit)
• Ratio scales (true zero point, ratio)


Types of measurement

Qualitative
• Nominal
• Ordinal

Quantitative
• Interval
• Ratio

Qualitative measurements

Nominal level
• Classification
• A set of objects can be classified into exhaustive, mutually exclusive categories, each with a unique symbol
• Ex: religion, sex, location, etc.

Ordinal level
• Classification + Ordering
• A set of objects can be assigned rank values and nothing more
• Ex: socio-economic status, education, levels of satisfaction, etc.

Quantitative measurements

Interval level
• Classification + Ordering + Standard distance
• A set of objects can be described by units that indicate how far one case is from another case
• Ex: temperature

Ratio level
• Classification + Ordering + Standard distance + Natural zero
• Quantitative variable with natural zero
• Ex: income, age, weight, bone mineral density
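The difference between the interval and ratio levels can be illustrated with temperature. The sketch below assumes two hypothetical readings (10 °C and 20 °C) and uses the Kelvin scale, which is not mentioned in the slides, as an example of a scale with a natural zero. Differences agree on both scales, but the ratio "twice as hot" only appears on the Celsius scale and is therefore not meaningful.

```python
def celsius_to_kelvin(t_c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale, natural zero)."""
    return t_c + 273.15


# Hypothetical readings
t1_c, t2_c = 10.0, 20.0
t1_k, t2_k = celsius_to_kelvin(t1_c), celsius_to_kelvin(t2_c)

# Standard distance: the difference is the same on both scales
diff_c = t2_c - t1_c
diff_k = t2_k - t1_k

# Ratios: 2.0 in Celsius, but only about 1.035 in Kelvin
ratio_c = t2_c / t1_c
ratio_k = t2_k / t1_k

print(diff_c, round(diff_k, 6), ratio_c, round(ratio_k, 3))
```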


1.2.2. Which type of hard cheese do you usually use?
  Cheddar
  Gouda
  Edam
  Emmental
  Other (please specify) ……………………..
1.2.4. Please indicate your overall liking for semi-hard cheese products.
  1 2 3 4 5 6 7 8 9
1.2.5. Please indicate how often you use semi-hard cheese products.
  > 3 times/week
  1 – 2 times/week
  1-3 times/month
1.2.6. Please indicate the amount of semi-hard cheese you use per week.
  < 100g
  100 – 300g
  > 300g
1.2.7. In your opinion, which products is hard cheese eaten with?
  Bread
  Sandwich
  Salad
  Biscuits
  Wine
  Other (please name) ………………………………
1.2.8. When choosing hard cheese products to buy, please indicate how much each of the following factors matters to you (1 = not at all concerned, 2 = not concerned, 3 = no opinion, 4 = concerned, 5 = very concerned).
  Price                               1 2 3 4 5
  Sensory properties of the product   1 2 3 4 5
  Familiarity                         1 2 3 4 5
  Convenience of use                  1 2 3 4 5
  Health benefits                     1 2 3 4 5
  Product weight                      1 2 3 4 5

judge  session  product  sour  bitterness  umami  salty
S1     1        m1       50    18          0      40
S2     1        m1       100   65          40     100
S3     1        m1       32    11          35     4
S4     1        m1       30    10          25     1
S5     1        m1       60    23          30     29
S6     1        m1       30    35          25     50
S7     1        m1       50    32          45     64
S8     1        m1       32    23          40     40
S9     1        m1       78    27          45     21
S10    1        m1       55    30          34     18
S11    1        m1       62    21          43     32

Summary Measures

Population Parameters / Sample Statistics

Measures of Central Tendency
• Median
• Mode
• Mean

Measures of Variability
• Range
• Variance
• Standard Deviation

Other summary measures:
– Skewness
– Kurtosis
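As a rough illustration of these summary measures, the sketch below applies Python's standard `statistics` module to the sour scores given by the eleven judges for product m1 in the table above. The variable names are illustrative; the course itself points to R, so this is only one possible way to compute them.

```python
import statistics

# Sour scores of judges S1..S11 for product m1, session 1
sour = [50, 100, 32, 30, 60, 30, 50, 32, 78, 55, 62]

# Measures of central tendency
mean_sour = statistics.mean(sour)
median_sour = statistics.median(sour)
modes_sour = statistics.multimode(sour)   # all values tied for most frequent

# Measures of variability (sample versions, denominator n - 1)
range_sour = max(sour) - min(sour)
var_sour = statistics.variance(sour)
sd_sour = statistics.stdev(sour)

print(round(mean_sour, 2), median_sour, sorted(modes_sour), range_sour)
print(round(var_sour, 2), round(sd_sour, 2))
```

`multimode` is used instead of `mode` because several scores are tied for the highest frequency in this column.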

1-3. Measures of Central Tendency or Location

• Median: middle value when sorted in order of magnitude; the 50th percentile
• Mode: most frequently occurring value
• Mean: average

Arithmetic Mean or Average

The mean of a set of observations is their average: the sum of the observed values divided by the number of observations.

Population Mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$


Arithmetic Mean or Average

Affected by outliers.

Median

Robust parameter of central tendency. Not affected by outliers.

[Figure: data plotted on a 0-10 axis and on a 0-14 axis. Median = 5 in both cases; Mean = 5 in the first and Mean = 6 in the second.]
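The contrast between the two measures can be reproduced with a small sketch. The data values below are hypothetical (the points plotted on the slide are not recoverable); they were chosen so that the mean moves from 5 to 6 when one value becomes an outlier, while the median stays at 5.

```python
import statistics

# Hypothetical data without an outlier
data = [3, 4, 5, 6, 7]

# Same data with the largest value replaced by an outlier
data_with_outlier = [3, 4, 5, 6, 12]

mean_before = statistics.mean(data)
mean_after = statistics.mean(data_with_outlier)
median_before = statistics.median(data)
median_after = statistics.median(data_with_outlier)

print("mean:", mean_before, "->", mean_after)        # the mean shifts
print("median:", median_before, "->", median_after)  # the median does not
```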

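Mode

[Figure: two dot plots. On a 0-14 axis, Mode = 9; on a 0-6 axis, the data are without mode.]

Measures of Central Tendency or Location

Mean:
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$

For grouped values, where value $x_i$ occurs $n_i$ times and $n$ is the sample size:
$\bar{x} = \frac{n_1 x_1 + n_2 x_2 + \dots + n_k x_k}{n} = \frac{1}{n}\sum_{i=1}^{k} n_i x_i$

Median:
$\mathrm{med}(x) = x_{(p+1)}$ if $n = 2p + 1$
$\mathrm{med}(x) = \frac{x_{(p)} + x_{(p+1)}}{2}$ if $n = 2p$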
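The median and grouped-mean formulas above can be sketched as follows. The function names are illustrative assumptions, and the inputs are assumed to be non-empty. The check values reuse the six color scores 6, 7, 8, 4, 5, 6 and the nine ordered values 11 ... 22 that appear in this lesson.

```python
def median_order_stat(values):
    """Median from order statistics x_(1) <= ... <= x_(n)."""
    x = sorted(values)
    n = len(x)
    p = n // 2
    if n % 2 == 1:
        # n = 2p + 1: the median is x_(p+1), i.e. index p when counting from 0
        return x[p]
    # n = 2p: the median is the average of x_(p) and x_(p+1)
    return (x[p - 1] + x[p]) / 2


def grouped_mean(distinct_values, counts):
    """Mean of grouped data: sum(n_i * x_i) / n, with n = sum(n_i)."""
    n = sum(counts)
    return sum(c * v for v, c in zip(distinct_values, counts)) / n


print(median_order_stat([11, 12, 13, 16, 16, 17, 18, 21, 22]))  # odd n
print(median_order_stat([6, 7, 8, 4, 5, 6]))                    # even n
print(grouped_mean([4, 5, 6, 7, 8], [1, 1, 2, 1, 1]))           # same six scores, grouped
```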

Mean or Median?

• Outliers: median
• Many ties (“ex aequo”, discrete variable): mean

Quartiles

The value of the boundary at the 25th, 50th, or 75th percentiles of a frequency distribution divided into four parts, each containing a quarter of the population.

25% | (Q1) | 25% | (Q2) | 25% | (Q3) | 25%

Position of the ith quartile:
$\text{Position of } Q_i = \frac{i(n+1)}{4}$

Data classified in increasing order: 11 12 13 16 16 17 18 21 22

Position of $Q_1 = \frac{1(9+1)}{4} = 2.5$, so $Q_1 = \frac{12 + 13}{2} = 12.5$
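The position rule i(n + 1)/4 can be sketched in a few lines. The helper `quartile` is an illustrative assumption: it interpolates linearly between neighbouring ordered values when the position is fractional and assumes at least three observations. Python's `statistics.quantiles` with `method="exclusive"` uses the same (n + 1) position rule, so it serves as a cross-check.

```python
import math
import statistics


def quartile(values, i):
    """i-th quartile (i = 1, 2, 3) using position i * (n + 1) / 4."""
    x = sorted(values)
    n = len(x)                      # assumes n >= 3
    pos = i * (n + 1) / 4           # 1-based position in the ordered data
    lo = math.floor(pos)
    frac = pos - lo
    if frac == 0:
        return float(x[lo - 1])
    # Fractional position: interpolate between x_(lo) and x_(lo+1)
    return x[lo - 1] + frac * (x[lo] - x[lo - 1])


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
quartiles = [quartile(data, i) for i in (1, 2, 3)]
print(quartiles)   # Q1 is 12.5, as in the worked example
print(statistics.quantiles(data, n=4, method="exclusive"))
```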



1-4. Measures of Variability or Dispersion

Range
• Difference between maximum and minimum values
Variance
• Mean* squared deviation from the mean
Standard Deviation
• Square root of the variance

* Definitions of population variance and sample variance differ slightly.

Dispersion

Range: $\mathrm{Range}(x) = x_{(n)} - x_{(1)}$

[Figure: two data sets on a 7-12 axis, each with Range = 12 - 7 = 5.]

Interquartile range: $q_{0.75} - q_{0.25}$
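A short sketch of the range and the interquartile range, using the nine ordered values from the quartile example in this lesson. The quartiles are taken from `statistics.quantiles` with `method="exclusive"`; that choice of quartile convention is an assumption, and other conventions give slightly different values.

```python
import statistics

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]

# Range: largest minus smallest ordered value, x_(n) - x_(1)
data_range = max(data) - min(data)

# Interquartile range: q_0.75 - q_0.25
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```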
Mean (average)

Given a series of values $x_i$ ($i = 1, \dots, n$): $x_1, x_2, \dots, x_n$, the mean is:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Study 1: the color scores of 6 consumers are: 6, 7, 8, 4, 5, and 6. The mean is:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{6 + 7 + 8 + 4 + 5 + 6}{6} = \frac{36}{6} = 6$

Study 2: the color scores of 4 consumers are: 10, 2, 3, and 9. The mean is:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{10 + 2 + 3 + 9}{4} = \frac{24}{4} = 6$

Variation

The mean does not adequately describe the data. We need to know the variation in the data.

An obvious measure is the sum of differences from the mean.

For study 1, the scores 6, 7, 8, 4, 5, and 6, we have:
(6-6) + (7-6) + (8-6) + (4-6) + (5-6) + (6-6)
= 0 + 1 + 2 - 2 - 1 + 0
= 0

NOT SATISFACTORY!
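The reason this measure is unsatisfactory is that deviations from the mean always cancel out. The sketch below checks this for both studies; the helper name `deviations` is an illustrative assumption.

```python
def deviations(scores):
    """Differences between each score and the mean of the scores."""
    mean = sum(scores) / len(scores)
    return [x - mean for x in scores]


study1 = [6, 7, 8, 4, 5, 6]
study2 = [10, 2, 3, 9]

dev1 = deviations(study1)
dev2 = deviations(study2)

print(dev1, sum(dev1))   # positive and negative deviations cancel
print(dev2, sum(dev2))   # same result, although study 2 is far more spread out
```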

Sum of squares

We need to make the differences positive by squaring them. This is called the “Sum of squares” (SS).

For study 1: 6, 7, 8, 4, 5, 6, we have:
$SS = (6-6)^2 + (7-6)^2 + (8-6)^2 + (4-6)^2 + (5-6)^2 + (6-6)^2 = 10$

For study 2: 10, 2, 3, 9, we have:
$SS = (10-6)^2 + (2-6)^2 + (3-6)^2 + (9-6)^2 = 50$

This is better!
But it does not take into account sample size n.

Variance

We have to divide the SS by sample size n. But in each square we use the mean to calculate the square, so we lose 1 degree of freedom. Therefore the correct denominator is n-1. This is called variance (denoted by $s^2$).

$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$

Or, in the sum notation:
$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
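The sum of squares for the two studies can be verified with a few lines. The function name `sum_of_squares` is an illustrative assumption.

```python
def sum_of_squares(scores):
    """Sum of squared deviations from the mean (SS)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


study1 = [6, 7, 8, 4, 5, 6]
study2 = [10, 2, 3, 9]

ss1 = sum_of_squares(study1)
ss2 = sum_of_squares(study2)

# SS grows with the number of observations, so it is not yet
# comparable between studies of different sizes.
print(ss1, ss2)
```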


1-5. Variance and Standard Deviation

Population Variance

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}$

$\sigma = \sqrt{\sigma^2}$

Sample Variance

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$

$s = \sqrt{s^2}$

Variance - example

For study 1: 6, 7, 8, 4, 5, and 6, the variance is:
$s^2 = \frac{(6-6)^2 + (7-6)^2 + (8-6)^2 + (4-6)^2 + (5-6)^2 + (6-6)^2}{6 - 1} = \frac{10}{5} = 2$

For study 2: 10, 2, 3, 9, the variance is:
$s^2 = \frac{(10-6)^2 + (2-6)^2 + (3-6)^2 + (9-6)^2}{4 - 1} = \frac{50}{3} = 16.7$

The scores in study 2 were much more variable than those in study 1.
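Both forms of the sample variance (the defining formula and the shortcut based on the sum of squared values) can be checked against the two studies. The function names are illustrative assumptions; `statistics.variance` from the Python standard library also uses the n - 1 denominator and is included as a cross-check.

```python
import statistics


def sample_variance(scores):
    """Defining formula: sum of squared deviations divided by n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / (n - 1)


def sample_variance_shortcut(scores):
    """Shortcut: (sum of x^2 - (sum of x)^2 / n) divided by n - 1."""
    n = len(scores)
    return (sum(x * x for x in scores) - sum(scores) ** 2 / n) / (n - 1)


study1 = [6, 7, 8, 4, 5, 6]
study2 = [10, 2, 3, 9]

var1 = sample_variance(study1)
var2 = sample_variance(study2)

print(var1, round(var2, 1))
print(sample_variance_shortcut(study1), round(sample_variance_shortcut(study2), 1))
print(statistics.variance(study1), statistics.pvariance(study1))  # n - 1 versus N
```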


Standard deviation

The problem with variance is that it is expressed in squared units, whereas the mean is in the actual unit. We need a way to convert variance back to the actual unit of measurement.

We take the square root of variance: this is called “standard deviation” (denoted by s).

For study 1, s = sqrt(2) = 1.41
For study 2, s = sqrt(16.7) = 4.1

Standard Deviation

[Figure: three data sets plotted on an 11-21 axis.]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = .9258
Data C: Mean = 15.5, s = 4.57
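The square-root step for the two studies can be reproduced with the standard library; `statistics.stdev` returns the sample standard deviation (n - 1 denominator). The variable names are illustrative.

```python
import math
import statistics

study1 = [6, 7, 8, 4, 5, 6]
study2 = [10, 2, 3, 9]

# Standard deviation as the square root of the sample variance
s1 = math.sqrt(statistics.variance(study1))
s2 = math.sqrt(statistics.variance(study2))

print(round(s1, 2), round(s2, 1))

# statistics.stdev performs both steps at once
print(round(statistics.stdev(study1), 2), round(statistics.stdev(study2), 1))
```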


1-6 Form indicators: Skewness & Kurtosis

Skewness
• Measure of asymmetry of a frequency distribution
  • Skewed to left
  • Symmetric or unskewed
  • Skewed to right
Kurtosis
• Measure of flatness or peakedness of a frequency distribution
  • Platykurtic (relatively flat)
  • Mesokurtic (normal)
  • Leptokurtic (relatively peaked)

Skewness

Skewed to left: Mean < median < mode

[Figure: histogram of Frequency versus x for a left-skewed distribution.]
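The slides give no formulas for these two indicators, so the sketch below assumes the common moment-based coefficients: skewness g1 = m3 / m2^(3/2) and excess kurtosis g2 = m4 / m2^2 - 3, where m_k is the k-th central moment. The data sets are hypothetical. A negative g1 indicates a distribution skewed to the left, and a negative g2 a relatively flat (platykurtic) one.

```python
import statistics


def central_moment(values, k):
    """k-th central moment: mean of (x - mean)^k."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def skewness(values):
    """Moment coefficient of skewness (assumed definition)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment coefficient of kurtosis minus 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Hypothetical left-skewed data: mean < median < mode
left_skewed = [1, 5, 6, 6, 7, 7, 7]
# Hypothetical symmetric data
symmetric = [1, 2, 3, 4, 5]

print(statistics.mean(left_skewed), statistics.median(left_skewed),
      statistics.mode(left_skewed))
print(skewness(left_skewed), skewness(symmetric), excess_kurtosis(symmetric))
```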


Kurtosis

Platykurtic - flat distribution

[Figure: histogram of Frequency versus X.]

Kurtosis

Mesokurtic - not too flat and not too peaked

[Figure: histogram of Frequency versus X.]

Diagram

Quantitative variable

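Quantitative variable

If we want to see more detail: the 21 observations between 1.65 m and 1.70 m are distributed as 8 in [1.65 ; 1.675] and 13 in [1.675 ; 1.70].

Quantitative variable: boxplot (box-and-whisker plot)

Reading the plot from top to bottom:
• x: individual values beyond the upper whisker
• Upper whisker: largest value less than q0.75 + 1.5(q0.75 - q0.25)
• q0.75
• Median
• q0.25
• Lower whisker: smallest value greater than q0.25 - 1.5(q0.75 - q0.25)
• x: individual values beyond the lower whisker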
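The elements of the box-and-whisker plot can be computed directly from the quartiles. The sketch below is one possible reading of the rule: the helper `boxplot_stats` is an illustrative assumption, the quartiles come from `statistics.quantiles` with `method="exclusive"` (one of several conventions), values lying exactly on a fence are treated as inside, and the data are the nine values of the quartile example plus one hypothetical extreme value (40).

```python
import statistics


def boxplot_stats(values):
    """Quartiles, whisker ends and outlying points of a box-and-whisker plot."""
    x = sorted(values)
    q1, median, q3 = statistics.quantiles(x, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in x if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": min(inside),   # smallest value above the lower fence
        "upper_whisker": max(inside),   # largest value below the upper fence
        "outliers": [v for v in x if v < lower_fence or v > upper_fence],
    }


data = [11, 12, 13, 16, 16, 17, 18, 21, 22, 40]   # 40 is a hypothetical outlier
stats = boxplot_stats(data)
print(stats)
```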


Form indicators

[Figure: three distributions with quartiles Q1, Q2, Q3 marked. Left: asymmetry (γ1 < 0). Center: symmetry. Right: asymmetry (γ1 > 0).]

Principles of a good “figure”

§ Present complex results clearly, accurately and efficiently
§ Present many ideas in the most efficient way
§ Do not lie!

A BAD figure

[Figure: “Fig. Digestion interactions of coral”. Bar chart of Freq. (axis 0-120) for Wins and Losses, with rotated taxon labels.]

A GOOD figure

[Figure: “Figure 3. Digestion interactions for coral taxa sampled at Pioneer Bay, Orpheus Island”. Bar chart of Frequency (axis 0-60) for Wins and Losses by Taxon.]
