Before starting any type of analysis, classify the data set as continuous, attribute, or a combination of both. Continuous data is characterized by variables that can be measured on a continuous scale, such as time, temperature, strength, or monetary value. A simple test is to divide the value in half and see whether the result still makes sense.
Attribute, or discrete, data can be associated with a defined grouping and then counted. Examples are classifications of good and bad, location, vendors’ materials, product or process types, and scales of satisfaction such as poor, fair, good, and excellent. Once an item is classified it can be counted, and the frequency of occurrence can be determined.
Another determination to make is whether the data is an input variable or an output variable. Output variables are often called the CTQs (critical to quality characteristics) or performance measures. Input variables are what drive the resulting outcomes. We generally characterize a product, process, or service delivery outcome (the Y) as some function of the input variables X1, X2, X3, … Xn. The Y’s are driven by the X’s.
The Y outcomes can be either continuous or discrete data. Examples of continuous Y’s are cycle time, cost, and productivity. Examples of discrete Y’s are delivery performance (late or on time), invoice accuracy (accurate or not accurate), and application errors (wrong address, misspelled name, missing age, etc.).
The X inputs can also be either continuous or discrete. Examples of continuous X’s are temperature, pressure, speed, and volume. Examples of discrete X’s are process (intake, examination, treatment, and discharge), product type (A, B, C, and D), and vendor material (A, B, C, and D).
Another set of X inputs to always consider is the stratification factors. These are variables that may influence the product, process, or service delivery performance and should not be overlooked. If we capture this information during data collection, we can study it to determine whether it makes a difference. Examples are time of day, day of the week, month of the year, season, location, region, or shift.
Now that the inputs can be separated from the outputs and the data can be classified as either continuous or discrete, the selection of the statistical tool comes down to answering the question, “What is it that we want to know?” The following is a list of common questions, and we’ll address each one separately.
What is the baseline performance? Did the changes made to the process, product, or service delivery make a difference? Are there relationships between the multiple input X’s and the output Y’s? If there are relationships, do they make a significant difference? That’s enough questions to be statistically dangerous, so let’s tackle them one at a time.
What is the baseline performance?
Continuous Data: Plot the data in a time-based sequence using an X-MR chart (individuals and moving range control charts), or subgroup the data using an Xbar-R chart (averages and range control charts). The centerline of the chart provides an estimate of the average of the data over time, thus establishing the baseline. The MR or R charts provide estimates of the variation over time and are used to establish the upper and lower 3 standard deviation control limits for the X or Xbar charts. Create a Histogram of the data to view a graphical representation of its distribution, test it for normality (the p-value should be greater than .05), and compare it to the specifications to assess capability.
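As a rough illustration of the X-MR baseline calculation, the sketch below computes the centerline and 3 standard deviation limits for an individuals chart from the average moving range. The function name `xmr_limits` and the cycle-time values are hypothetical; the constants 2.66 and 3.267 are the standard control chart factors for moving ranges of two consecutive points. It is a minimal sketch, not a replacement for a control chart produced by statistical software.

```python
from statistics import mean


def xmr_limits(values):
    """Centerline and control limits for an individuals and moving range chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    mr_bar = mean(moving_ranges)
    # 2.66 = 3 / d2, where d2 = 1.128 for ranges of two observations.
    half_width = 2.66 * mr_bar
    return {
        "center": center,             # baseline average
        "lcl": center - half_width,   # lower control limit (X chart)
        "ucl": center + half_width,   # upper control limit (X chart)
        "mr_bar": mr_bar,             # centerline of the MR chart
        "mr_ucl": 3.267 * mr_bar,     # upper limit of the MR chart (D4 for n = 2)
    }


# Hypothetical cycle times in time order.
cycle_times = [10, 12, 11, 13, 12, 11, 10, 12]
limits = xmr_limits(cycle_times)
print(limits)
```

Points outside `lcl` and `ucl` would be candidates for special cause investigation before the centerline is accepted as the baseline.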
Minitab Statistical Software Tools are Variables Control Charts, Histograms, Graphical Summary, Normality Test, and Capability Study (between and within).
Discrete Data: Plot the data in a time-based sequence using a P Chart (percent defective chart), C Chart (count of defects chart), nP Chart (number defective chart, sample size n times percent defective), or U Chart (defects per unit chart). The centerline provides the baseline average performance. The upper and lower control limits estimate 3 standard deviations of performance above and below the average, which accounts for about 99.73% of expected activity over time. This gives you an estimate of the worst and best case scenarios before any improvements are made. Create a Pareto Chart to view the distribution of the categories and their frequencies of occurrence. If the control charts exhibit only normal, natural patterns of variation over time (only common cause variation, no special causes), the centerline, or average value, establishes the capability.
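For the discrete case, a P Chart calculation can be sketched in the same way. The example below assumes the usual binomial-based limits, p-bar plus or minus 3 times the square root of p-bar(1 - p-bar)/n, with limits clipped to the 0 to 1 range. The function name `p_chart_limits` and the defect counts are hypothetical.

```python
from math import sqrt


def p_chart_limits(defectives, sample_sizes):
    """Return the centerline (p-bar) and per-sample 3-sigma limits for a P chart."""
    p_bar = sum(defectives) / sum(sample_sizes)
    limits = []
    for n in sample_sizes:
        sigma = sqrt(p_bar * (1 - p_bar) / n)
        lcl = max(0.0, p_bar - 3 * sigma)  # a proportion cannot fall below 0
        ucl = min(1.0, p_bar + 3 * sigma)  # or rise above 1
        limits.append((lcl, ucl))
    return p_bar, limits


# Hypothetical daily counts of defective invoices out of 100 sampled per day.
defectives = [4, 6, 5, 3, 7]
sample_sizes = [100, 100, 100, 100, 100]
p_bar, p_limits = p_chart_limits(defectives, sample_sizes)
print(p_bar, p_limits[0])
```

The limits are computed per sample so that the same sketch works when the sample size changes from one period to the next.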
Minitab Statistical Software Tools are Attributes Control Charts and Pareto Analysis.
Did the changes made to the process, product, or service delivery make a difference?
Discrete X – Continuous Y: To test whether two group averages (5W-30 vs. Synthetic Oil) impact gas mileage, use a T-Test. If there are potential environmental factors that may influence the test results, use a Paired T-Test. Plot the results on a Boxplot and evaluate the T statistic with its p-value to make a decision (p-values less than or equal to .05 signify that a difference exists, with at least 95% confidence that it is true). If there is a difference, pick the group with the best overall average to meet the goal.
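The two-group comparison can be sketched as follows. This example assumes the unequal-variance (Welch) form of the two-sample T-Test and computes only the T statistic and its degrees of freedom; the p-value would then come from a t distribution in statistical software. The function name `welch_t` and the mileage figures are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """T statistic and degrees of freedom for a two-sample t-test (unequal variances)."""
    # Squared standard error contributed by each group.
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t_stat, df


# Hypothetical gas mileage (mpg) for two oil types.
synthetic = [30, 32, 31, 33, 34]
oil_5w30 = [27, 28, 30, 29, 26]
t_stat, df = welch_t(synthetic, oil_5w30)
print(t_stat, df)
```

A large absolute T statistic relative to the t distribution with `df` degrees of freedom corresponds to a small p-value.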
To test whether two or more group averages (5W-30, 5W-40, 10W-30, 10W-40, or Synthetic) impact gas mileage, use ANOVA (analysis of variance). Randomize the order of the testing to reduce any time-dependent environmental influences on the test results. Plot the results on a Boxplot or Histogram and evaluate the F statistic with its p-value to make a decision (p-values less than or equal to .05 signify that a difference exists, with at least 95% confidence that it is true). If there is a difference, pick the group with the best overall average to meet the goal.
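To show where the F statistic comes from, the sketch below computes a one-way ANOVA by hand: the variation between group averages divided by the variation within groups. The function name `one_way_anova_f` and the mileage data are hypothetical, and the p-value lookup is left to statistical software.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: how far each group average is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of the values around their own group average.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical gas mileage (mpg) for three oil types.
mileage = [
    [28, 30, 29],  # 5W-30
    [31, 33, 32],  # 10W-30
    [34, 36, 35],  # Synthetic
]
f_stat, df_between, df_within = one_way_anova_f(mileage)
print(f_stat, df_between, df_within)
```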
In either of the above cases, to test whether there is a difference in the variation of the output caused by the inputs, use a Test for Equal Variances (homogeneity of variance). Use the p-values to make a decision (p-values less than or equal to .05 signify that a difference exists, with at least 95% confidence that it is true). If there is a difference, pick the group with the lowest standard deviation.
Minitab Statistical Software Tools are 2 Sample T-Test, Paired T-Test, ANOVA, Test for Equal Variances, Boxplot, Histogram, and Graphical Summary.
Continuous X – Continuous Y: Plot the input X versus the output Y using a Scatter Plot, or if there are multiple input X variables use a Matrix Plot. The plot provides a graphical representation of the relationship between the variables. If it appears that a relationship may exist between one or more of the X input variables and the output Y variable, conduct a Linear Regression of one input X versus one output Y. Repeat as necessary for each X – Y relationship.
The Linear Regression Model provides an R² statistic, an F statistic, and a p-value. For a single X – Y relationship to be significant, the R² should be greater than .36 (36% of the variation in the output Y is explained by the observed changes in the input X), the F should be much greater than 1, and the p-value should be .05 or less.
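The R² and F statistics for a single X – Y relationship can be sketched with ordinary least squares, as below. The function name `simple_regression` and the data points are hypothetical, and the p-value for F would come from statistical software.

```python
from statistics import mean


def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, with R-squared and F."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # R-squared: share of the variation in Y explained by X.
    r_squared = 1 - ss_resid / ss_total
    # F: explained variation (1 degree of freedom) over residual variation (n - 2).
    f_stat = (ss_total - ss_resid) / (ss_resid / (len(x) - 2))
    return slope, intercept, r_squared, f_stat


# Hypothetical input X and output Y measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r_squared, f_stat = simple_regression(x, y)
print(slope, intercept, r_squared, f_stat)
```

With this data the R² clears the .36 rule of thumb given above, but the F statistic and its p-value still need to be checked before calling the relationship significant.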
Minitab Statistical Software Tools are Scatter Plot, Matrix Plot, and Fitted Line Plot.
Discrete X – Discrete Y: In this type of analysis, categories, or groups, are compared to other categories, or groups. For example, “Which cruise line had the highest customer satisfaction?” The discrete X variables are the cruise lines (RCI, Carnival, and Princess Cruise Lines). The discrete Y variables are the frequencies of responses from passengers on their satisfaction surveys by category (poor, fair, good, very good, and excellent) that relate to their vacation experience.
Conduct a cross tab table analysis, or Chi Square analysis, to evaluate whether there were differences in the levels of satisfaction by passengers based on the cruise line they vacationed on. Percentages are used for the evaluation, and the Chi Square analysis provides a p-value to further quantify whether the differences are significant. The overall p-value associated with the Chi Square analysis should be .05 or less. The variables that have the largest contribution to the Chi Square statistic drive the observed differences.
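The Chi Square statistic and the per-cell contributions can be sketched as follows. The function name `chi_square` and the survey counts are hypothetical (three cruise lines by three satisfaction levels, to keep the table small); the p-value lookup is left to statistical software.

```python
def chi_square(table):
    """Chi Square statistic, degrees of freedom, and per-cell contributions."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    contributions = []
    for i, row in enumerate(table):
        contribution_row = []
        for j, observed in enumerate(row):
            # Expected count if satisfaction were independent of cruise line.
            expected = row_totals[i] * col_totals[j] / grand_total
            contribution_row.append((observed - expected) ** 2 / expected)
        contributions.append(contribution_row)
    statistic = sum(sum(row) for row in contributions)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, contributions


# Hypothetical response counts: rows are cruise lines, columns are poor / good / excellent.
survey = [
    [10, 20, 30],
    [20, 20, 20],
    [30, 20, 10],
]
statistic, df, contributions = chi_square(survey)
print(statistic, df)
print(contributions)
```

The cells with the largest contributions are the ones driving the observed differences, which is the same reading described above.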
Minitab Statistical Software Tools are Table Analysis, Matrix Analysis, and Chi Square Analysis.
Continuous X – Discrete Y: Does the cost per gallon of fuel influence consumer satisfaction? The continuous X is the cost per gallon of fuel. The discrete Y is the consumer satisfaction rating (unhappy, indifferent, or happy). Plot the data using Dot Plots stratified on Y. The statistical technique is Logistic Regression. Once again, the p-values are used to validate whether a significant difference exists. P-values that are .05 or less mean that we have at least 95% confidence that a significant difference exists. Use the most frequently occurring ratings to make your determination.
Minitab Statistical Software Tools are Dot Plots stratified on Y and Logistic Regression Analysis.
Are there relationships between the multiple input X’s and the output Y’s? If there are relationships, do they make a difference?
Continuous X – Continuous Y: The graphical analysis is a Matrix Scatter Plot, where multiple input X’s can be evaluated against the output Y characteristic. The statistical analysis technique is multiple regression. Evaluate the scatter plots to look for relationships between the X input variables and the output Y. Also, look for multicollinearity, where one input X variable is correlated with another input X variable. This is analogous to double dipping, so we identify those conflicting inputs and systematically remove them from the model.
Multiple regression is a powerful tool, but it requires proceeding with caution. Run the model with all variables included, then evaluate the T statistics and F statistics to identify the first set of insignificant variables to remove from the model. During the second iteration of the regression model, turn on the variance inflation factors, or VIFs, which are used to quantify potential multicollinearity issues (VIFs of 5 to 10 indicate issues). Review the Matrix Plot to identify X’s related to other X’s. Remove the variables with high VIFs and the largest p-values, but only remove one of the related X variables in a questionable pair. Review the remaining p-values and remove variables with large p-values from the model. Don’t be surprised if this process requires a few more iterations.
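To make the VIF idea concrete, the sketch below handles the simplest case only: a model with exactly two input X’s, where the VIF of each is 1 / (1 - r²) and r is the correlation between the two inputs. With more than two inputs, each X would instead be regressed on all of the others, which is what statistical software does. The function names `pearson_r` and `vif_for_pair` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def vif_for_pair(x1, x2):
    """VIF shared by both inputs in a two-input model (assumption: only two X's)."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


# Hypothetical settings for two input X's that move together.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
r = pearson_r(x1, x2)
vif = vif_for_pair(x1, x2)
print(r, vif)
```

A correlation of .8 between the inputs gives a VIF below 5, so by the guideline above this pair would not yet be flagged; the VIF climbs quickly as the correlation approaches 1.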
When the multiple regression model is finalized, all VIFs will be less than 5 and all p-values will be less than .05. The R² value should be 90% or greater. This is a significant model, and the regression equation can now be used for making predictions as long as we keep the input variables within the minimum and maximum range of values that were used to create the model.
Minitab Statistical Software Tools are Regression Analysis, Stepwise Regression Analysis, Scatter Plots, Matrix Plots, Fitted Line Plots, Graphical Summary, and Histograms.
Discrete X and Continuous X – Continuous Y
This situation requires the use of designed experiments. Discrete and continuous X’s can be used as the input variables, but the settings for them are predetermined in the design of the experiment. The analysis technique is ANOVA, which was discussed previously.
Here is an example. The goal is to reduce the number of unpopped kernels of popping corn in a bag of popped popcorn (the output Y). Discrete X’s could be the brand of popping corn, type of oil, and shape of the popping vessel. Continuous X’s could be the amount of oil, amount of popping corn, cooking time, and cooking temperature. Specific settings for each of the input X’s are selected and incorporated into the statistical experiment.
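One way to lay out such an experiment is sketched below. It assumes a two-level full factorial design, which is only one of several possible designs, and it randomizes the run order to reduce time-dependent influences. The factor names, the levels, and the function name `full_factorial` are all hypothetical.

```python
import itertools
import random


def full_factorial(factors, seed=None):
    """Every combination of factor settings, returned in randomized run order."""
    names = list(factors)
    runs = [
        dict(zip(names, combination))
        for combination in itertools.product(*(factors[name] for name in names))
    ]
    # Randomize the run order; a seed makes the order reproducible.
    random.Random(seed).shuffle(runs)
    return runs


# Hypothetical low and high settings for two discrete and two continuous X's.
factors = {
    "corn_brand": ["A", "B"],
    "oil_type": ["canola", "coconut"],
    "cook_time_seconds": [120, 180],
    "oil_tablespoons": [1, 2],
}
runs = full_factorial(factors, seed=1)
for run_number, settings in enumerate(runs, start=1):
    print(run_number, settings)
```

Each run would be popped at the listed settings and the count of unpopped kernels recorded as the Y, ready for the ANOVA described above.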