
Questions & Answers in Statistics

What is a p-value in statistics?
Answer: The p-value is the probability, computed on the assumption that the null hypothesis is true, of obtaining results at least as extreme as those actually observed. So a p-value of .01, for example, means that if the null hypothesis were true, results this extreme would occur only 1% of the time. The p-value is compared with a pre-chosen cut-off, the significance level (alpha), to decide whether a test result is statistically significant. In the social sciences, significance is assumed (and therefore the hypothesis supported) when p < .05. Other fields use different cut-offs, and there are times when a researcher may argue for a higher or lower alpha. It is important to note that p is only relevant to statistical significance; it has no bearing on the size or practical importance of results. A common criticism of statistics is that a p-value of .049 is "significant" while .051 is not, so most researchers rely as much if not more on measures of effect size and practical application than on statistical significance. However, p remains the accepted convention for hypothesis testing.

What does mu mean in statistics?
Answer: Usually mu (μ) is the symbol for the mean of a probability distribution. It is sometimes used for the average of a dataset (also called the mean of the dataset), although I prefer to use "x bar" (x̄) for that.

1. What is the difference between Bayesian and Frequentist?
Bayesians condition on the data actually observed and consider the probability distribution over the hypotheses; Frequentists condition on a hypothesis of choice and consider the probability distribution over the data, whether observed or not.

2. What is likelihood?
The probability of some observed outcomes given a set of parameter values is regarded as the likelihood of that set of parameter values given the observed outcomes.

3. What is a p-value? Give an example.
In statistical significance testing, the p-value is the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true. Rejecting the null hypothesis when the p-value falls below 0.05 or 0.01 corresponds respectively to a 5% or 1% chance of rejecting it when it is in fact true (a Type I error).
Example: Suppose the experimental results show the coin turning up heads 14 times out of 20 total flips:
* null hypothesis (H0): fair coin;
* observation O: 14 heads out of 20 flips;
* p-value of observation O given H0 = Prob(14 or more heads, or 14 or more tails) = 0.115.
The calculated p-value exceeds 0.05, so the observation is consistent with the null hypothesis: the observed result of 14 heads out of 20 flips can be ascribed to chance alone, as it falls within the range of what would happen 95% of the time were the coin in fact fair. We therefore fail to reject the null hypothesis at the 5% level. Although the coin did not fall evenly, the deviation from the expected outcome is small enough to be reported as "not statistically significant at the 5% level".
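The coin-flip p-value can be reproduced numerically. A minimal MATLAB sketch, assuming the Statistics Toolbox function binocdf is available:
p = 2 * (1 - binocdf(13, 20, 0.5)) % P(14 or more heads), doubled for the two tails; returns about 0.1153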

4. What is sampling? How many sampling methods are there?
Sampling is the part of statistical practice concerned with selecting an unbiased or random subset of individual observations from a population, intended to yield some knowledge about the population of concern. There are four common sampling methods: simple random (purely random), systematic (every kth member of the population), cluster (population divided into groups, or clusters) and stratified (population divided into exclusive groups, or strata, with a sample drawn from each group).

5. What is the probability of winning the lottery game 6/49?
Pick 6 numbers out of 49 possible. The number of 6-number combinations from a pool of 49 numbers is 49!/[(49-6)! 6!] = 13,983,816, i.e. we have only about a one in 14 million chance; the count can be verified directly, as shown below.
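In MATLAB, the built-in binomial-coefficient function gives the same count:
nchoosek(49, 6) % returns 13983816, the number of 6-number combinations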

6. What are the mode, mean, median, skewness, quartiles, variance and standard deviation? Give simple code in MATLAB.

1) Mode: the mode of a data sample is the element that occurs most often in the collection.
x = [1 2 3 3 3 4 4]
mode(x) % returns 3, the most frequent value

2) Median: the median is the numeric value separating the higher half of a sample, a population, or a probability distribution from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest to highest value and picking the middle one.
median(x) % returns 3

3) Mean: the total sum divided by the number of values.
mean(x) % returns 2.8571

4) Quartiles: the first quartile is the 25th percentile, the second quartile the 50th percentile, and the third quartile the 75th percentile; prctile gives the kth percentile in general.
prctile(x, 25) % 25th percentile, returns 2.25
prctile(x, 50) % 50th percentile, returns 3, i.e. the median

5) Skewness: a measure of the asymmetry of the data around the sample mean. If skewness is negative, the data are spread out more to the left of the mean than to the right; if positive, more to the right.
skewness(x) % returns -0.5954

6) Kurtosis: a measure of how outlier-prone a distribution is.
kurtosis(x) % returns 2.3594

7) Standard deviation: the square root of an unbiased estimator of the variance of the population from which X is drawn, as long as X consists of independent, identically distributed samples.
std(x) % returns 1.0690

8) Variance: describes how far values lie from the mean.
var(x) % returns 1.1429

9) Moment: a quantitative measure of the shape of a set of points.
moment(x, 2) % returns the second central moment

10) Covariance: a measure of how much two variables change together.
y2 = [1 3 4 5 6 7 8]
cov(x, y2) % returns a 2x2 matrix whose diagonal holds the two variances

11) Linear regression: modeling the relationship between a scalar variable y and one or more variables denoted X. In linear regression, the unknown parameters are estimated from the data using linear functions of the predictors.
polyfit(x, y2, 1) % returns 2.1667 -1.3333, i.e. y = 2.1667x - 1.3333

12) One-sample t-test: a t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is true.
[h, p, ci] = ttest(y2, 0) % returns h = 1, p = 0.0018, ci = [2.6280, 7.0863]
This test rejects the null hypothesis at the default alpha = 0.05 significance level: the p-value has fallen below 0.05 and the 95% confidence interval on the mean does not contain 0. Because the p-value (0.0018) is also below 0.01, the test still rejects the null hypothesis when the significance level is lowered to alpha = 0.01.
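To confirm this, one can re-run the test at the stricter level; a sketch, assuming a MATLAB release in which ttest accepts the 'Alpha' name-value option:
[h, p] = ttest(y2, 0, 'Alpha', 0.01) % h = 1: still rejects, since p = 0.0018 < 0.01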

7. Hypothesis test
1) Null hypothesis: the null hypothesis (denoted by H0) is a statement about the value of a population parameter (such as the mean), and it must contain the condition of equality, i.e. it must be written with the symbol =, ≤, or ≥.

2) Alternative hypothesis: the alternative hypothesis (denoted by H1) is the statement that must be true if the null hypothesis is false.

3) Conclusion: either a) fail to reject the null hypothesis, or b) reject the null hypothesis.

4) Significance level: the probability of rejecting the null hypothesis when it is in fact true is called the significance level α; very common choices are α = 0.05 and α = 0.01.

5) Example: A sample of body-temperature data has n = 106, x̄ = 98.20, s = 0.62. At the 0.05 significance level, test the claim that the mean body temperature of healthy adults is equal to 98.6 °F.
Solution: We have H0: μ = 98.6 (the original claim) and H1: μ ≠ 98.6. As n > 30, the central limit theorem indicates that the distribution of sample means can be approximated by a normal distribution:
z = (x̄ - μ) / (s/√n) = (98.20 - 98.6) / (0.62/√106) = -6.64
The test is two-sided because a sample mean significantly less than or greater than 98.6 is strong evidence against the null hypothesis that μ = 98.6. With α/2 = 0.025, the left critical z value is -1.96 and the right critical z value is 1.96. For z between -1.96 and 1.96 we fail to reject H0. As z = -6.64, we reject H0, i.e. we reject the hypothesis that the mean body temperature of healthy adults is equal to 98.6 °F. A numerical check appears below.
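A minimal MATLAB sketch of the same z-test (normcdf requires the Statistics Toolbox):
z = (98.20 - 98.6) / (0.62 / sqrt(106)) % returns -6.6423, the -6.64 above
p = 2 * normcdf(z) % two-sided p-value, roughly 3e-11, far below 0.05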

8. What is the Central Limit Theorem?
As the sample size increases, the sampling distribution of sample means approaches a normal distribution. If all possible random samples of size n are selected from a population with mean μ and standard deviation σ, then the mean of the sample means, denoted μx̄, equals the population mean (μx̄ = μ), and the standard deviation of the sample means is σx̄ = σ/√n.
Example: Given that the population of men has normally distributed weights, with a mean of 173 lb and a standard deviation of 30 lb, find the probability that
a. if 1 man is randomly selected, his weight is greater than 180 lb;
b. if 36 different men are randomly selected, their mean weight is greater than 180 lb.
Solution:
a) z = (x - μ)/σ = (180 - 173)/30 = 0.23; for the normal distribution, P(Z > 0.23) = 0.4090.
b) σx̄ = σ/√n = 30/√36 = 5; z = (180 - 173)/5 = 1.40; P(Z > 1.4) = 0.0808.

9. Describe the binomial probability formula.
P(x) = [n!/((n-x)! x!)] p^x q^(n-x)
where
n = number of trials
x = number of successes among n trials
p = probability of success in any one trial
q = 1 - p
Example: Find the probability of getting 3 left-handed students in a class of 15 students, given that 10% of us are left-handed.
Solution: here n = 15, x = 3, p = 0.1, q = 0.9, and P(3) = 0.129, i.e. the probability is 12.9%.
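The left-handed example can be checked with the Statistics Toolbox's binomial density function:
binopdf(3, 15, 0.1) % returns about 0.1285, the 12.9% above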

What is the importance of statistics in nursing?
Statistics has applications in every field of study, as it is the science of data collection, analysis and presentation. In nursing, for example, it may be used to calculate the average number of patients examined per day, week, month or year; to determine the time interval at which a patient should be given a medicine; or to see what percentage of hospital patients are HIV carriers. In such cases we can use simple descriptive statistical techniques such as the mean, the mode and percentages. Furthermore, econometric statistical techniques may be applied to examine cause-effect relationships and the severity of diseases using regression and correlation.

What are the key terms used in statistics?
Null hypothesis. Interpretation of statistical information often involves the development of a null hypothesis, in that the assumption is that whatever is proposed as a cause has no effect on the variable being measured. The best illustration for a novice is the predicament encountered by a jury trial. The null hypothesis, H0, asserts that the defendant is innocent, whereas the alternative hypothesis, H1, asserts that the defendant is guilty. The indictment comes because of suspicion of guilt. H0 (the status quo) stands in opposition to H1 and is maintained unless H1 is supported by evidence "beyond a reasonable doubt". However, "failure to reject H0" in this case does not imply innocence, but merely that the evidence was insufficient to convict. So the jury does not necessarily accept H0, but fails to reject it. While one cannot "prove" a null hypothesis, one can test how close it is to being true with a power test, which tests for Type II errors.

Error. Working from a null hypothesis, two basic forms of error are recognized:
* Type I errors, where the null hypothesis is falsely rejected, giving a "false positive".
* Type II errors, where the null hypothesis fails to be rejected and an actual difference between populations is missed.
Error also refers to the extent to which individual observations in a sample differ from a central value, such as the sample or population mean. Many statistical methods seek to minimize the mean squared error, and these are called "methods of least squares". Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types (e.g. blunders, such as when an analyst reports incorrect units) can also be important.

Confidence intervals. Most studies sample only part of a population, so the results are not fully representative of the whole population, and any estimates obtained from the sample only approximate the population value. Confidence intervals allow statisticians to express how closely the sample estimate matches the true value in the whole population. Often they are expressed as 95% confidence intervals. Formally, a 95% confidence interval for a value is a range where, if the sampling and analysis were repeated under the same conditions (yielding a different dataset), the interval would include the true (population) value 95% of the time. This does not imply that the probability that the true value is in the confidence interval is 95%. From the frequentist perspective, such a claim does not even make sense, as the true value is not a random variable: either the true value is or is not within the given interval. However, it is true that, before any data are sampled and given a plan for how the confidence interval will be constructed, the probability is 95% that the yet-to-be-calculated interval will cover the true value; at this point, the limits of the interval are yet-to-be-observed random variables (a small simulation illustrating this coverage interpretation appears after this answer). One approach that does yield an interval that can be interpreted as having a given probability of containing the true value is the credible interval from Bayesian statistics: this approach depends on a different way of interpreting what is meant by "probability", namely as a Bayesian probability.

Significance. Statistics rarely give a simple yes/no answer to the question asked of them. Interpretation often comes down to the level of statistical significance applied to the numbers, and often refers to the probability of a value accurately rejecting the null hypothesis (sometimes referred to as the p-value). Statistical significance does not necessarily mean that the overall result is significant in real-world terms. For example, a large study of a drug may show that the drug has a statistically significant but very small beneficial effect, such that the drug is unlikely to help any patient in a noticeable way.
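To make the coverage interpretation concrete, here is a small Monte Carlo sketch in MATLAB, with a made-up population (mean 100, standard deviation 15) and sample size 30; it uses the known-sigma interval x̄ ± 1.96 σ/√n:
rng(1); % fix the random seed for reproducibility
mu = 100; sigma = 15; n = 30; trials = 10000;
covered = 0;
for k = 1:trials
    x = mu + sigma * randn(n, 1); % draw one random sample
    half = 1.96 * sigma / sqrt(n); % half-width of the 95% interval
    if abs(mean(x) - mu) < half % does the interval cover the true mean?
        covered = covered + 1;
    end
end
covered / trials % fraction of intervals covering the true mean; close to 0.95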

If you have a negative z-score it will be below the mean.
* True
* False
TRUE. A negative z-score means the value lies below the mean, since z = (x - mean)/standard deviation.

What does a correlation coefficient mean?
Correlation was once defined for me by one of my professors at UCLA as the "go-togetherness" of two sets of numbers. That definition has always made sense to me. The degree to which two sets of numbers go together can be calculated statistically as a correlation coefficient. Correlation coefficients (often symbolized by r, or rxy) can turn out to be as high as +1.00 in a positive direction if the relationship between the two sets of numbers is perfect and in the same direction. A correlation coefficient can also turn out to be as high as -1.00 in a negative direction if the relationship is perfect and in opposite directions. A correlation coefficient can also turn out to be zero if no relationship at all exists between the two sets of numbers (as would be the case if the two sets were, say, random numbers). Basically, this is what a correlation coefficient represents. However, this coverage of the topic has necessarily been brief because the focus here is on the coefficient of determination. [For much more on calculating and interpreting correlation coefficients, see Brown 1996, 1999, or any other good applied statistics or testing book.]

What is standard deviation?
Answer: It is defined as the positive square root of the mean of the squared deviations from the mean; the square of the S.D. is called the variance. The standard deviation is used as a measure of the spread of a measurement within a group of objects. In essence, it is the typical difference between the measurement of any one object and the mean measurement for the group. For example, if the average measured weight of brown bears is 140 kg (309 lb) and the standard deviation of weights among brown bears is 5 kg (11 lb), then any particular individual brown bear is likely to weigh between 135 and 145 kg (298-320 lb), and very likely to weigh between 130 and 150 kg (287-331 lb). It is impossible to know the weight of an individual bear just by looking at the mean weight for all bears, but the standard deviation tells you what range of weights the weight of an individual bear will fall in.

What does a coefficient of determination mean?
The coefficient of determination makes interpreting correlation coefficients easier. Notwithstanding its impressive name, calculating this coefficient is simple: it is just the squared value of the correlation coefficient, which is why its symbol is r²xy (or simply r²). The resulting coefficient of determination provides an estimate of the proportion of overlapping variance between two sets of numbers (i.e., the degree to which the two sets of numbers vary together). By simply moving the decimal point two places to the right, you can interpret a coefficient of determination as the percentage of variance shared by the two sets of numbers. So a coefficient of determination of .81 can be interpreted as a proportion, or as 81%. For example, if you have two sets of scores on Tests X and Y, and they correlate at .90, you could square that value to get a coefficient of determination of .81 and interpret that result as meaning that 81% of the variance in Test X is shared with Test Y, or, for that matter, that 81% of the variance on Test Y is shared with Test X. By extension, you should recognize that you don't know what the remaining 19% (100% - 81% = 19%) on each test is related to.
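Both statistics are easy to compute in MATLAB; a sketch using made-up scores for two hypothetical tests:
X = [55 60 65 70 75 80 85]; % hypothetical Test X scores
Y = [52 61 63 72 73 84 82]; % hypothetical Test Y scores
R = corrcoef(X, Y); % 2x2 correlation matrix; the off-diagonal entry is r
r = R(1, 2) % correlation coefficient, about 0.97 for these nearly linear scores
r^2 % coefficient of determination, the proportion of shared variance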

What are the different types of hypothesis?
a. Inductive: a generalization based on specific observations.
b. Deductive: derived from theory; provides evidence that supports, expands, or contradicts the theory.
c. Nondirectional: states that a relation or difference between variables exists.
d. Directional: states the expected direction of the relation or difference.
e. Null: states that there is no significant relation or difference between variables.

What is the scope of statistics?
Some consider statistics to be a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data,[5] while others consider it a branch of mathematics[6] concerned with collecting and interpreting data. Because of its empirical roots and its focus on applications, statistics is usually considered a distinct mathematical science rather than a branch of mathematics.[7][8] Statisticians improve the quality of data with the design of experiments and survey sampling. Statistics also provides tools for prediction and forecasting using data and statistical models. Statistics is applicable to a wide variety of academic disciplines, including the natural and social sciences, government, and business. Statistical consultants are available to provide help for organizations and companies without direct access to expertise relevant to their particular problems.

Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics. This is useful in research, when communicating the results of experiments. In addition, patterns in the data may be modeled in a way that accounts for randomness and uncertainty in the observations, and then used to draw inferences about the process or population being studied; this is called inferential statistics. Inference is a vital element of scientific advance, since it provides a prediction (based on data) for where a theory logically leads. To prove the guiding theory further, these predictions are tested as well, as part of the scientific method. If the inference holds true, then the descriptive statistics of the new data increase the soundness of that hypothesis. Descriptive statistics and inferential statistics (a.k.a. predictive statistics) together comprise applied statistics.[9]

Statistics is closely related to probability theory, with which it is often grouped; the difference, roughly, is that in probability theory one starts from the given parameters of a total population to deduce probabilities pertaining to samples, whereas statistical inference moves in the opposite direction, inductively inferring from samples the parameters of a larger or total population.

What are the statistical methods?
Experimental and observational studies. A common goal of a statistical research project is to investigate causality, and in particular to draw a conclusion about the effect of changes in the values of predictors (independent variables) on a response (dependent variable). There are two major types of causal statistical studies: experimental studies and observational studies. In both types, the effect of differences in an independent variable (or variables) on the behavior of the dependent variable is observed. The difference between the two types lies in how the study is actually conducted; each can be very effective. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine whether the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation: data are gathered and correlations between predictors and response are investigated.

Experiments. The basic steps of a statistical experiment are:
1. Planning the research, including determining the number of replicates of the study, using preliminary estimates of the size of treatment effects, alternative hypotheses, and the estimated experimental variability. Consideration of the selection of experimental subjects and the ethics of research is necessary. Statisticians recommend that experiments compare (at least) one new treatment with a standard treatment or control, to allow an unbiased estimate of the difference in treatment effects.
2. Designing the experiment, using blocking to reduce the influence of confounding variables, and randomized assignment of treatments to subjects to allow unbiased estimates of treatment effects and experimental error. At this stage, the experimenters and statisticians write the experimental protocol that will guide the performance of the experiment and that specifies the primary analysis of the experimental data.
3. Performing the experiment following the experimental protocol, and analyzing the data following the experimental protocol.
4. Further examining the data set in secondary analyses, to suggest new hypotheses for future study.
5. Documenting and presenting the results of the study.

Experiments on human behavior raise special concerns. The famous Hawthorne study examined changes to the working environment at the Hawthorne plant of the Western Electric Company. The researchers were interested in determining whether increased illumination would increase the productivity of the assembly-line workers.

The researchers first measured the productivity in the plant, then modified the illumination in an area of the plant and checked whether the changes in illumination affected productivity. It turned out that productivity indeed improved (under the experimental conditions). However, the study is heavily criticized today for errors in experimental procedures, specifically the lack of a control group and blinding. The Hawthorne effect refers to the finding that an outcome (in this case, worker productivity) changed due to observation itself: those in the Hawthorne study became more productive not because the lighting was changed but because they were being observed.

Observational study. An example of an observational study is one that explores the correlation between smoking and lung cancer. This type of study typically uses a survey to collect observations about the area of interest and then performs statistical analysis. In this case, the researchers would collect observations of both smokers and non-smokers, perhaps through a case-control study, and then look for the number of cases of lung cancer in each group.

Levels of measurement. There are four main levels of measurement used in statistics: nominal, ordinal, interval, and ratio. Each has a different degree of usefulness in statistical research. Ratio measurements have both a meaningful zero value and meaningfully defined distances between different measurements; they provide the greatest flexibility in the statistical methods that can be used for analyzing the data. Interval measurements have meaningful distances between measurements, but the zero value is arbitrary (as with longitude, or temperature measured in Celsius or Fahrenheit). Ordinal measurements have imprecise differences between consecutive values, but a meaningful order to those values. Nominal measurements have no meaningful rank order among values. Because variables conforming only to nominal or ordinal measurements cannot be reasonably measured numerically, they are sometimes grouped together as categorical variables, whereas ratio and interval measurements are grouped together as quantitative variables, which can be either discrete or continuous, due to their numerical nature.

What rule or principle is frequently expressed in statistical symbols?
Here are some commonly used symbols in statistics: μ is used to denote the population mean and σ the population standard deviation. Other very common and important ones are x̄ (x with a bar on top), used for the sample mean, and s, used for the sample standard deviation.

What is the importance of statistics in the field of engineering?

1. Design of Experiments (DOE) uses statistical techniques to test and construct models of engineering components and systems.
2. Quality control and process control use statistics as a tool to manage conformance to specifications of manufacturing processes and their products.
3. Time and methods engineering uses statistics to study repetitive operations in manufacturing in order to set standards and find optimum (in some sense) manufacturing procedures.
4. Reliability engineering uses statistics to measure the ability of a system to perform its intended function (and over the intended time) and has tools for improving performance.
5. Probabilistic design uses statistics and probability in product and system design.

To calculate confidence intervals we need to make use of:
1. Sampling distributions.
2. Histograms.
3. z-scores.
4. None of the above.
Answer: 1. Sampling distributions.

In another study you have a standard deviation of 12, a mean of 20 and a sample size of 50. What is the standard error?
1. 0.089
2. 0.069
3. 1.7
4. 0.589
Answer: 3. 1.7, since S.E.(x̄) = s/√n = 12/√50 ≈ 1.7 (the mean of 20 is not needed).
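As a one-line MATLAB check of the arithmetic:
se = 12 / sqrt(50) % returns 1.6971, i.e. about 1.7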

There is a low (but real) negative correlation between the amount of rain in a given summer and the amount the summer before. In the absence of any information except that this summer is wetter than usual, you are asked to guess next summer's rain. Your best guess is:
a. somewhat more than the average summer rainfall
b. the average summer rainfall
c. somewhat less than the average summer rainfall
d. the standard deviation of the rainfall
Answer: c. somewhat less than the average summer rainfall. Since the correlation is negative, more rain this year means less next year. The correlation is weak (small), so the regression line has a low slope; that is, the prediction won't be very different from the mean.
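The same reasoning in standardized (z-score) form, as a MATLAB sketch with an assumed weak correlation of -0.2 (the question says only that it is low):
r = -0.2; % assumed weak negative correlation
z_this = 1.0; % this summer: one standard deviation wetter than usual
z_next = r * z_this % regression prediction: -0.2, only slightly drier than average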

What is an example of negative correlation?
In a negative correlation, as the values of one variable increase, the values of the second variable decrease; likewise, as the value of one variable decreases, the value of the other increases. This is still a correlation: it is like an "inverse" correlation, and the word "negative" is a label that shows the direction of the correlation. There is a negative correlation between TV viewing and class grades: students who spend more time watching TV tend to have lower grades (or, phrased the other way, students with higher grades tend to spend less time watching TV). Here are some other examples of negative correlations:
1. Education and years in jail: people who have more years of education tend to have fewer years in jail (or, people with more years in jail tend to have fewer years of education).
2. Crying and being held: among babies, those who are held more tend to cry less (or, babies who are held less tend to cry more).
Plotted as a scatterplot, such data fall along a downward-sloping line; any negative correlation will have a line with that direction.

What would the null hypothesis in an experiment be?
Changing the independent variable has no significant effect on the dependent variable. That is, every group (experimental, control, or whatever) is a sample of the same population, even though the groups are differentiated by the independent variable.

Suppose the mean on the final exam is 24 (of 40), with a standard deviation of 1.5. If you get a 21, how well did you do (relative to the rest of the class)?
Very poorly, perhaps the lowest score. A 21 is 2 standard deviations below the mean (z = -2.0), and the fraction in the lower tail is 0.0228: only about 2 1/4% did worse (assuming a roughly normal distribution).

In a class of 100, the mean on a certain exam was 50 and the standard deviation was 0. What does this mean?
Everyone had a score of exactly 50. A zero standard deviation means all scores are the same, and equal to the mean (what else could the mean be?).

The standard error has been calculated as 2.6 and the sample mean is 10.00. Thus the 95% confidence interval lies between:
1. 4.904 to 15.096
2. 7.40 to 12.60
3. 3.85 to 26
4. There is not enough information available to work out the confidence interval.
Answer: 1. 4.904 to 15.096, i.e. 10.00 ± 1.96 × 2.6.
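The same interval computed directly in MATLAB:
ci = 10.00 + [-1 1] * 1.96 * 2.6 % returns 4.9040 15.0960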

Inferential statistics deal with:
1. Making conclusions and generalizations about population(s) from our sample data.
2. The tabulation and organization of data in order to demonstrate their main characteristics.
3. Giving the best estimate of the population mean.
4. Both the second and third statements.
Answer: 1. Making conclusions and generalizations about population(s) from our sample data.
