THE CREDIBILITY OF EVIDENCE
Paul C. Boyd, PhD
2020

Keywords: Pyramid of Evidence, Causality, Concomitant Variation

Management is a practice defined by decision making; the better the decision making, the better the institutional outcome. Yet research (Nutt, 1999) suggests that more than fifty percent of business decisions fail to meet their objectives. Not a great record. Why should we believe someone when they make a claim? How do they know it? What is their source of information? The term 'evidence' is used to denote actionable knowledge. And some evidence is more credible than other evidence, meaning that it is stronger, that we have more reason to trust that it is 'true.' The process of establishing the credibility of evidence is a challenging, yet often-overlooked, managerial responsibility. Decision effectiveness can be linked to the quality of the evidence used to inform the practice. And, while some evidence is trustworthy, much should be regarded with varying levels of suspicion. The term used here to specify the differences among these levels of 'quality' is credibility. Identifying which sources of information carry a 'high' level of credibility has not been an easy process, and many managers have struggled to determine which information should be trusted. Over the past few decades, a tool has evolved to assist practitioners in this process of evidence assessment. The general name for this instrument is the Pyramid of Evidence, and it has been widely employed in the 'Evidence-Based Medicine' movement. Broadening the pyramid beyond clinical trials is important to the practice of Evidence-Based Management. Before we get to establishing the credibility of evidence, we should take a look at what evidence is, what it is used for, and how we acquire it.
Evidence, Facts, Truths and Proof

Reasoning is the process of reaching a conclusion, and the basic vehicle for reasoning is an argument: a set of reasons posited as evidence for a conclusion. The basic type of argument in business decision-making is 'inductive reasoning.' This process involves the attempt to draw a conclusion about a whole group (e.g., a company's customers) by measuring some subset of the group (e.g., a sample of the customers) and then making the leap that if something is true for the subset, it is likely true for the entire group. Inductive reasoning is imprecise. While it is considered valid, at least in some political circles, to claim that there are different realities,¹ it is more helpful in business decision-making to understand that there may be differing perceptions, but a single reality. There's a difference between "That's not the way it is" and "That's not the way I see it."

¹ "You're saying it's a falsehood, and they're giving — our press secretary, Sean Spicer, gave alternative facts to that." (Kellyanne Conway, Meet the Press, 1/22/17). As reported by Rebecca Sinderbrand in the Washington Post in "How Kellyanne Conway ushered in the era of 'alternative facts.'"

So, let's establish that there is a single reality and that we have imperfect methods of perceiving it. And, just because we have imperfect methods for figuring it out does not mean we shouldn't try. Facts are accurate representations of the single reality. They can be very hard to establish. Generally, it is so hard to 'prove' a fact that we shouldn't even try to make the claim. Rather, evidence supports a conclusion; it does not prove it. We seek evidence, and the stronger, the better.

Causality & Influence

Managerial decision making is most effective when it correctly ascribes 'cause-and-effect' to phenomena and can, therefore, make the required changes to achieve the desired effect. Would a new benefits package attract higher quality candidates?
Is there a relationship between mobile phone handset brand ownership and clothing brand preference? Do companies whose CEOs have higher salaries overpay for business acquisitions? Actually establishing causality is not an easy practice. How can you know if one thing truly 'causes' another? That is, what does it take to make the claim that X influences Y? To be clear, cause-and-effect conversations are more about propensities or likelihoods than they are about absolutes; 'probability of success' rather than 'if, and only if.' So, rather than ascribing 'cause,' discussions tend to be about determining whether one class of events (say, free continental breakfasts) influences another class of events (say, the number of hotel customers). There are ample descriptions of the philosophical attribution of cause-and-effect through the concepts of 'necessary' and 'sufficient' conditions (see, for example, Brennan, 2017). However, another model suffices for executive decision making. Rather than getting overly concerned about the actual requirements for 'cause-and-effect,' decision makers are most concerned about whether or not they should conclude that one thing influences another (and what that influence 'is'). Managers use the terms interchangeably. In order to make an assertion that X influences Y, four elements should be simultaneously present:

1) Good reason: There must be a theoretically sound reason as to 'why' X influences Y.
2) Temporal order: X must occur before Y.
3) Absence of a better 'cause': 'Intervening' or background variables do not provide a more straightforward explanation of Y.
4) Concomitant variation: Different values/measures of X should generally correspond to different values of Y.

1) Good Reason

Theory has been described as the 'plausible' explanation of cause-and-effect (theory, BusinessDictionary.com).
In this sense, plausible suggests two elements: a) that the explanation is logical and rational, and b) that the explanation has not been discredited by credible research.

"When you tell me that, you know, he should testify because he's going to tell the truth and he shouldn't worry, well that's so silly because it's somebody's version of the truth. Not the truth." (Rudy Giuliani, Meet the Press, 8/19/18). As reported by Rebecca Morin & David Cohen for Politico in "Giuliani: 'Truth isn't Truth.'"

Basically, a theory remains a theory up until the time it has been invalidated. Theories have relative strength. Stronger theories have been 'scientifically' tested numerous times and survived; weak theories, not so much. Darwinian 'evolution' is a stronger theory than 'trickle-down economics;' it has been tested and has survived many times. Some claims bear the name of 'theory,' but if they have not been tested, they are more appropriately termed 'speculation.' If the speculation cannot be scientifically tested, it cannot (should not) claim the mantle of a theory. See, for example, the Flying Spaghetti Monster 'theory' presented by the Pastafarians (Henderson, 2006). This claim is clearly based on speculation, as are the other so-called 'theories' (e.g., Intelligent Design) it seeks to parody.

2) Temporal Order

While easily overlooked, this requirement is fairly important. Basically, the 'cause' must precede the 'effect;' that is, something cannot influence something that happened in the past. Sometimes this can be difficult to establish. For example, does an increase in crime influence an increase in gun ownership (for protection)? Or does an increase in gun ownership lead to more crime? It would depend, in part, on which went up first.

3) Absence of a Better Cause

The possibility that there are other, typically more straightforward, explanations of phenomena must not be overlooked. Philosophically, this process is described by Ockham's Razor.
Taking liberties with the translation, this means that simpler explanations (those requiring fewer assumptions) are more likely to lead to accurate descriptions of cause-and-effect. From a theoretical/research perspective, this means that a concerted effort must be made to uncover two types of competing causes: a) 'intervening' variables and b) background or environmental variables that are primary influences on both X, the supposed cause, and Y, the supposed effect.

a) Intervening Variables

Variables that (causally) sit between the theorized causing variable X and the caused variable Y are called intervening variables. A stated theory may be that X causes Y, but if, in fact, X causes Z, and Z causes Y, then the original theory is invalid. In social and psychological fields, intervening variables may be difficult-to-measure concepts such as states-of-mind, intelligence or aptitude. Take, for example, the often-claimed theory that 'education causes income;' basically, that the higher one's education, (generally) the higher the salary. Yet this direct link is not quite accurate; there is an intervening variable ('occupation') which better describes 'income.' 'Education' can qualify someone for an occupation, and it is the occupation that commands higher salaries.

Not: Education → Income
But: Education → Occupation → Income

(This example was taken from Crossman, 2018.)

b) Environmental Variables

In some instances, relationships are presented as theories simply because, well, a relationship appears to exist. Yet these relationships mask the true causal mechanism. So, while the theory may be that X causes Y, in truth both X and Y are caused by Z, a third variable. These third variables are often called lurking or environmental variables. For example, historically there exists a statistical relationship between the number of board feet of lumber produced and the total weight of nails manufactured. This could lead to a theory that lumber production causes nail production.
Yet there is a third variable that explains both lumber production and nail manufacturing: population size. The size of the population influences (to some extent) the demand for both, and for a host of other consumer products. Or consider the classic example of ice cream sales causing pool drownings (highly statistically related), when the outside air temperature (summer/winter, etc.) is a more accurate explanation of both.²

Not: Lumber Production → Nail Production
But: Population Size → Lumber Production, and Population Size → Nail Production

So, rather than X causing Y, Z causes both X and Y. There must be an exhaustive search for intervening and environmental variables to ensure that any attribution of cause-and-effect may be honestly asserted.

² This helps explain one of the common logical fallacies: correlation ≠ causation. Just because two variables are correlated does not mean that one caused the other.

4) Concomitant Variation

If one thing has an influence on another, then different values of the first should be reflected in different values of the second. If advertising impacts sales, then there should be notable differences in sales when advertising is increased or decreased. The term for this phenomenon is concomitant variation, though it is more commonly called correlation (which is one type of concomitant variation). Establishing whether or not concomitant variation exists is a challenging exercise that falls into two closely related fields: research methodology and inferential statistics. The determination that one thing changes in response to another (causality) rests on the methodological ability to establish that it is actually X that is causing Y and not something else. And this depends, to a large extent, on the quality or validity of the methodology used to measure any change.³ Understanding the differences in methodologies will help to explain the various levels of the credibility of evidence.
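Taken together, the four elements amount to a logical conjunction: every one must hold before an influence claim is warranted. A trivial Python sketch (the function and argument names are my own, chosen to mirror the article's four elements):

```python
def may_claim_influence(good_reason: bool, temporal_order: bool,
                        no_better_cause: bool,
                        concomitant_variation: bool) -> bool:
    """Return True only when all four elements of the causality test
    are simultaneously present.  A claim resting on any subset of
    them invites the logical fallacies discussed in the text."""
    return all((good_reason, temporal_order,
                no_better_cause, concomitant_variation))
```

For example, a strong correlation with no sound theory behind it (`good_reason=False`) fails the test, no matter how striking the numbers look.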
As a reminder, all four of these elements of causality must be concurrently present in order for one to reach the conclusion that X causes Y. Logical fallacies abound when claims of causality rely on just one element. See, for example: 'Anecdotal Evidence,' 'Cum Hoc Ergo Propter Hoc,' 'Post Hoc Ergo Propter Hoc,' and 'Ignoring a Common Cause' (McCandless, 2012).

Establishing Concomitant Variation

The concomitant variation aspect is critical in establishing causality. And in order to draw any conclusion regarding concomitant variation, two variables must be measured: the independent (or influencing) variable and the dependent (or influenced) variable. The methods by which we can establish with confidence that X actually influences Y rely to a large extent on probability theory. The gist of statistical analysis is to establish (test) that any relationship uncovered in data is not an artifact of random probabilities. The various statistical methods for determining concomitant variation depend on the 'levels of measurement' of the causing (independent, or X) and the caused (dependent, or Y) variables. Basically, variables must be separated into two broad types ('levels') for the purpose of analysis:

Categorical: Where the values that the variable can take are categories (Urban: Suburban: Rural, or United: Delta: American). Categorical variables are statistically summarized through proportions (%s).

Numeric: Where the values that the variable can take are quantitative (0, 1, 2, etc. children, or the number of purchases/year). Numeric variables are statistically summarized through averages (means) and spread (standard deviations).

The primary statistical methods for determining concomitant variation are:

➢ Chi-Square (χ²) when both variables are categorical.
➢ ANOVA (Analysis of Variance) when the independent variable is categorical and the dependent variable is numeric.
➢ Logistic Regression when the independent variable is numeric and the dependent variable is categorical.
³ See, for example, Shadish, Cook & Campbell (2001).

➢ Correlation when both variables are numeric.

Causality, as defined here, requires that different levels of the independent variable be established and measured. Methodologically, there are two ways this can happen:

➢ Correlational Studies: The values of the independent and dependent variables are measured at the same time, and then one of the analyses above is conducted to determine if there is statistical evidence that different levels of the dependent variable tend to coincide with different levels of the independent variable. For example, a survey may collect data on both educational attainment (as an independent variable) and retirement investment dollar size (the theoretical dependent variable). An ANOVA test would then be run to determine if higher levels of retirement investments were associated with higher levels of educational attainment.

➢ Longitudinal Studies (Experiments): The value of the independent variable is manipulated by the researcher from one value to another, and the value of the dependent variable is measured for each value (of the independent variable). Then an analysis is conducted to determine whether or not differences exist in the dependent variable as the independent variable changes. For example, an organizational researcher could change the bonus structure (the independent variable) in one division of a company and then measure whether or not employee turnover (the dependent variable) changed.

Generally, longitudinal studies are preferred over correlational ones. The advantages of longitudinal studies include:

1. Researchers have greater control over the values of the independent variables.
2. The likelihood of accidentally stumbling upon a 'spurious' relationship is diminished.⁴
Spurious relationships are random artifacts in databases wherein the values of two variables in the data collected may be statistically correlated, but the relationship does not actually exist in the population.⁵

3. The 'validity' of the research is enhanced. Validity refers to the assurance that the dependent variable measures the intended concept.

Longitudinal studies come in a variety of forms. Some of these are more helpful in establishing cause-and-effect than others, as they 'control' more of the threats to validity (Shadish, et al., 2001).

⁴ See, for example, Vigen, T. (2015), Spurious Correlations, and tylervigen.com/spurious-correlations.

⁵ One method to check for such a spurious relationship: if the database is large enough (i.e., has a very large sample size), conscientious researchers 'hold back' some of the data from the initial analysis and, should a relationship be uncovered, follow up by testing whether the relationship continues to exist in the held-back cases.

The Pyramid of Evidence

The hierarchy of evidence credibility (the ability to establish cause-and-effect) has been presented as the Pyramid of Evidence, characterized by lower quality evidence at the bottom and the most actionable, highest quality evidence at the top. Much of the credit for these types of hierarchies can be traced to the Evidence-Based Medicine movement. The typologies from those fields tend to focus, logically, on epidemiological studies and clinical trials (see, for example, Petticrew & Roberts, 2003). With minor modifications, the Pyramid can be adapted for management studies. In increasing order of credibility, the pyramid levels are: 5) Expert Opinion; 4) Descriptive Studies & Case Studies; 3) Correlational Studies; 2) Longitudinal Studies (in three strengths: 2c) One Sample Longitudinal Experiments, 2b) Controlled Longitudinal Studies, 2a) Randomized, Controlled Longitudinal Studies); and 1) Meta-Analyses & Systematic Reviews.
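The ordering just listed can be captured in a small lookup table; here is a minimal Python sketch (the dictionary keys are my own shorthand for the article's level names, not its exact wording):

```python
# Pyramid of Evidence levels, "1" = most credible.  Keys are shorthand
# labels for the article's levels; "2a" < "2b" < "2c" within level 2.
PYRAMID = {
    "meta-analysis / systematic review": "1",
    "randomized controlled longitudinal experiment": "2a",
    "controlled longitudinal experiment": "2b",
    "one-sample longitudinal experiment": "2c",
    "correlational study": "3",
    "descriptive / case study": "4",
    "expert opinion": "5",
}

def more_credible(source_a: str, source_b: str) -> str:
    """Return whichever source type sits higher on the pyramid.
    Plain string comparison of the labels works here because they
    happen to sort in credibility order: 1 < 2a < 2b < 2c < 3 < 4 < 5."""
    return min(source_a, source_b, key=lambda s: PYRAMID[s])
```

So, for instance, weighing a consultant's opinion against a correlational study, the lookup favors the study.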
Understanding the differences in these levels is important when a decision-maker is attempting to figure out which evidence to trust when making causal claims. It should be noted here that there are claims posited as evidence sufficient for reaching a conclusion that do not rise to any acceptable standard of credibility for effective decision making:

• 'Anecdotal' information: This information is generally developed from a non-random, very small and unrepresentative sample. This type of information may be useful for developing hypotheses, but should not be used to reach conclusions.

• 'Instinct': Generally, a manifestation of a heuristic (Tversky & Kahneman, 1973), likely the 'availability' heuristic. While heuristics may be useful in repeated, routine decisions, they notoriously falter when applied to (even remotely) similar decisions.⁶

• 'Gut feelings': Similar to 'instinct,' but generally presented without foundation (any explanation of what was considered to reach a conclusion). Often baseless and self-serving.

None of these constitute a credible reason to reach a conclusion about what constitutes a good course of action. Rather, 'credibility' relies on the ability to show, a priori, that a course of action has a good chance of being successful.

⁶ See, for example, Wise, J. (2020), Trump says he'll use 'facts and instincts' when deciding to push for US to reopen.

The Pyramid of Evidence⁷

1) Meta-Analyses / Systematic Reviews
2a) Randomized, Controlled Longitudinal Experiments
2b) Controlled Longitudinal Experiments
2c) One Sample (Uncontrolled) Longitudinal Experiments
3) Correlational Studies
4) Descriptive Studies & Case Studies
5) Expert Opinion

5) Expert Opinion

The advice of 'experts' is often used as a source of information in organizations. Consultants for virtually any organizational decision are available, often for a healthy fee.
Experts sell their advice based on their greater expertise: a track record of success, specific subject-matter education, an 'objective' perspective, and so on. In many instances this expertise is valid and based upon empirical research; in others, not so much. Since the motivations of 'experts' may be opaque, it is best to determine the source of their knowledge: How do they know what they claim? Because it is so difficult to establish what 'expert knowledge' actually is, and to what extent any consultant actually possesses it, decision makers should seek information from higher levels of the pyramid.

⁷ Adapted from Barends, E. (2012), 6. Evidence based management: What is the best available evidence? (PowerPoint), Slide 31 (see Module 7: Best Available Evidence @ https://www.cebma.org/teaching-materials/), The Center for Evidence Based Management.

4) Descriptive Studies & Case Analyses

Descriptive studies present a picture of a situation at one particular time. Case studies present a significant amount of information about a single situation.⁸

a) ΟD
b) ΔI ΟD

In either case, the measurement may be taken with (b) or without (a) a preceding change in another variable (i.e., an independent variable). For example, an organization could conduct a study of its clients to try to get an idea of what percentage of them were repeat customers (a). Likewise, a company could change its healthcare insurance provider and then, 6 months later, survey employees to see how satisfied they were with the insurance (b). Both of these examples are problematic in establishing cause-and-effect: the first (a) because it is impossible to determine why the proportion is what it is (and, therefore, there is no way to know how to change it), and the second (b) because, without a prior study, you can't really tell if satisfaction went up, down or stayed the same as a result of the change in providers.
Descriptive studies have been termed statistical snapshots since they typically collect data (sometimes on many variables) and then use simple inferential statistics to make a generalization about the nature of the entire population. Market segmentation studies, for example, tend to be descriptive studies of which subsets of the population tend to purchase a particular product. Case studies tend to deeply analyze, often with data, aspects of a single phenomenon, such as a successful product launch or a corporate failure. Causal links may be posited, but are often hard to truly establish. And, as with anything based upon a sample size of exactly 1, it is impossible to make generalizations. Because so much information is presented in a detailed case study, it may be tempting to draw parallels between the situation described and another. Don't base organizational decisions on evidence extracted or deduced from case studies.

As mentioned above, establishing concomitant variation requires that different levels of a dependent variable coincide (generally) with different levels of the independent variable. As a reminder, there are two ways that the different levels can be established: 1) by observing different levels of the independent variable and dependent variable at the same time (correlational studies) or 2) by altering the independent variable to see if there is a coincident change in the dependent variable (experiments).

⁸ For these diagrams: Δ = a change in (the value of); Ο = observe (measure); I = the independent variable; D = the dependent variable. The first diagram simply states that, at some point, the dependent variable is measured. The second diagram shows a change in the independent variable followed by a measurement of the dependent variable.

3) Correlational Studies

In correlational studies the data for both the independent and dependent variables are collected once, at the same time.
Then a statistical analysis is performed to see if different levels of the independent variable correspond, generally, with different levels of the dependent variable. A simple example would be a survey which measured the number of years an employee had held their current job (an independent variable) and how 'satisfied' they reported they were with their job (the dependent variable). A statistical analysis would then be performed to determine whether job satisfaction went up (or down) as employees spent more time in a particular job.

2) Longitudinal Studies (Experiments)

The general class of longitudinal studies refers to a type of experiment wherein a variable (the dependent variable) is measured twice (or more), to see if there is a difference between the measures. These two measurements are called the pretest (before the change in the independent variable) and the post-test (after the change). When the independent variable is changed between the pretest and post-test, then it might be that the change in the independent variable influenced the change in the dependent variable. At the very least, it could be evidence of the concomitant variation requirement for the establishment of cause-and-effect.⁹

2c) Single Sample (Uncontrolled) Longitudinal Studies

The most basic longitudinal study involves measuring the dependent variable from a sample (to establish a baseline), then changing the independent variable, and then measuring the dependent variable a second time (to see if it has changed). It may be represented as:

Pretest: ΟD → Manipulation: ΔI → Post-test: ΟD

It might be tempting to infer that any difference between the post-test score and the pretest score of the dependent variable could be attributed to the manipulation of the independent variable. Such a claim should be treated with some skepticism; we can't tell for sure that it was actually the change in the independent variable that 'caused' the change in the dependent variable.
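Arithmetically, the one-sample design reduces to a single subtraction, which is exactly why it is weak; a minimal sketch (the iced-tea figures below are hypothetical, not from the article):

```python
def naive_effect(pretest: float, posttest: float) -> float:
    """Apparent effect in a one-sample (uncontrolled) longitudinal
    study: post-test score minus pretest score.  With no control
    group, this single difference also absorbs every unmeasured
    factor (seasonality, economic shifts, and so on)."""
    return posttest - pretest

# Hypothetical iced-tea sales (units/week) before and after an ad campaign.
apparent = naive_effect(1000.0, 1300.0)  # 300.0 -- but how much was summer heat?
```

The number itself cannot say how much of the 300-unit rise came from the ad and how much from the weather; that is the design's limitation.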
It could easily be something else that changed (but wasn't measured) that resulted in the change. For example, it might be that a rise in air temperature in the summer caused an increase in iced tea sales, rather than an ad campaign. There are a number of factors that could impact the dependent variable but were not measured. These factors are called Threats to Internal Validity (Shadish, Cook, & Campbell, 2001, p. 55).

⁹ Beyond the scope of this paper are studies that use different samples (subjects, individuals) in the pre- and post-tests. In general, using the same subjects is preferable, yielding more directly comparable results.

2b) Controlled Longitudinal Experiments

One way to control at least some of the threats to internal validity is to include a 'control' group in the study. A control group is a part of the sample (those measured) who do NOT experience a change in the independent variable. Those who DO experience a change in the independent variable are called the experimental group.

                      Pretest    Manipulation    Post-test
Experimental Group:   EΟD1       ΔI              EΟD2
Control Group:        CΟD1                       CΟD2

In this type of study, we can see if the differences between the pretest and post-test scores for the two groups are the same (even if the post-test measure is greater for both groups). If they are the same, it would be hard to make the claim that the change in the independent variable did anything to the dependent variable. For example, in order to see if a flex-time policy would improve employee morale, two different units in a company could be measured; say, Accounts Payable as the experimental group and Accounts Receivable as the control group. The morale of each unit is initially measured (pretest), then the policy is changed for the Accounts Payable unit (but not for Accounts Receivable). A year later both units are measured again (post-test). If morale went up more (or down less) for the Accounts Payable group, then the change in policy was likely the cause.
Still, some caution should be used before making a blanket claim here. There might be something fundamentally different about the two groups to begin with that could explain any difference. For example, the Accounts Payable unit could be largely populated with younger people (with families, who might appreciate flex time) while the Accounts Receivable unit was largely made up of empty-nesters (who would be less impacted by flex time). Shadish, et al. (2001) would call this threat to internal validity Selection.

2a) Randomized, Controlled Longitudinal Studies

In order to avoid selection bias as a threat to internal validity, the experimental and control groups should be identical in make-up (or as similar as is possible). Without diving into probability theory, the best way to achieve this similarity is to randomly assign subjects (the sample) to the experimental and control groups and then conduct the experiment. This ensures that the ONLY difference between the two groups is the presence (or lack thereof) of the experimental treatment.

                               Pretest    Manipulation    Post-test
Experimental Group (Random):   EΟD1       ΔI              EΟD2
Control Group (Random):        CΟD1                       CΟD2

Dissecting the table above: The difference between the two control group measures [CΟD2 – CΟD1] represents and quantifies any change caused by unmeasured factors. The difference between the two experimental group measures [EΟD2 – EΟD1] quantifies the differences caused by both the unmeasured factors AND the change in the (measured) independent variable. Subtracting the change in the control group from the change in the experimental group [(EΟD2 – EΟD1) – (CΟD2 – CΟD1)] quantifies the unique contribution of the independent variable to the dependent variable. So, since the only difference between the two groups is the change in the independent variable, that change is the best (experimental) explanation for any post-test differences in the dependent variable for the two groups.
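The subtraction just described can be written out directly; a minimal Python sketch (the morale scores below are hypothetical, invented for illustration):

```python
def independent_variable_effect(e_pre: float, e_post: float,
                                c_pre: float, c_post: float) -> float:
    """(EOD2 - EOD1) - (COD2 - COD1): the control group's change
    estimates the effect of unmeasured factors, so subtracting it
    from the experimental group's change isolates the unique
    contribution of the manipulated independent variable."""
    return (e_post - e_pre) - (c_post - c_pre)

# Hypothetical morale scores: experimental unit 62 -> 74, control 61 -> 66.
effect = independent_variable_effect(62, 74, 61, 66)  # (12) - (5) = 7
```

Here both groups improved, but the experimental group improved 7 points more than the control; that surplus is the estimate of the treatment's effect.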
Not only that, but the extent of the impact of the independent variable can be measured by noting any difference between the changes in the experimental and control groups. For a single study/experiment, randomized, controlled longitudinal studies are as good as it gets in establishing causality.

1) Meta-Analyses and Systematic Reviews

Scientists have come to realize over the years that, while somewhat compelling, any single study is open to a variety of possible constraints and biases (say, economic conditions or industry) that might make broader application of the results problematic. They prefer to have more than one study to confirm the findings of the first. These types of 'replication' studies are not too common, but they are very important if we are to truly understand the extent to which we should trust a theory (see the earlier section on 'Good Reason'). At the top of the Pyramid of Evidence are comprehensive reviews that attempt to combine the information from a number of studies into an across-the-board summary of what is currently known about a theory. These practices attempt to synthesize the findings from a number of studies in order to see where the preponderance of evidence lies. There are a number of categories of these summary articles (Grant & Booth, 2009, pp. 94-95) that are here distilled down to two: systematic reviews and meta-analyses.

Systematic Reviews: Since studies related to a particular theory do not all examine the same phenomenon, or come at the phenomenon from different angles, the full body of evidence surrounding it is often hard to find and pin down. A systematic review is an attempt to identify, appraise and interpret the entire available knowledge base surrounding a topic. The studies evaluated come from a variety of methodologies (qualitative, quantitative and mixed-methods). Systematic reviews require a commitment to the avoidance of any possible bias, and they include confirming and disconfirming evidence for any theory assessed.
Since there is not a statistical basis for their conclusions, which must therefore be derived from some qualitative interpretation, it is critical that these reviews are thoroughly vetted for objectivity (through the peer-review process).

Meta-Analyses: When there are a number of quantitative studies, all with available data for pre- and post-tests and for the independent and dependent variables, it is possible to pull all the data together to see if a consistent picture emerges. The primary benefit of using multiple datasets is that the sample size used in the analysis increases. And, as those familiar with statistics will recall, the larger the sample size, the smaller the margin of error.¹⁰

There are some cautions related to the usefulness of meta-analyses. The studies combined may not all have the same 'operational definition'¹¹ of either the independent or dependent variables, resulting in the possible combination of disparate variables in the same analysis. Additionally, should the studies have been conducted at dramatically different times, an emergent (or retreating) phenomenon may be masked by the earlier data. It is not always possible to employ meta-analyses or, indeed, any methodology from the top of the Pyramid of Evidence. That's OK. It is not always possible to design controlled, randomized studies or, for that matter, any longitudinal study. The value of the Pyramid is not that it teaches us which studies to trust and which to dismiss. Rather, it tells us how much value to place on the results of the study/source, so that when it comes to weighing the evidence, there is a basis for doing so. Decision makers should always use the best available information.

Original vs Interpreted Evidence

Even when evidence comes from an ostensibly credible source, it is important to distinguish between the mechanisms by which we receive it.
If we do not create the knowledge ourselves (via experiments within our own organization), then the most credible evidence in this regard is information passed on to us directly from the individual(s) who created it. This type of evidence, called a Primary Source, should also be 'peer-reviewed' to ensure that a) the theory presented does not dismiss (without evidence) current thinking on the subject, b) the study maintains proper scientific rigor, and c) the statistical evidence of any concomitance supports the conclusions. These Primary Sources are preferred because they should be undistorted by interpretation. Primary sources such as peer-reviewed academic articles are often thick with research and statistical jargon. That language can be confusing and tedious and, consequently, there are those who translate the studies into more accessible language, filtering out the often-confusing details and, for the most part, cutting to the chase: what did the research actually conclude? These translations are called 'secondary sources.' Quality examples of secondary sources include textbooks and The Harvard Business Review and, perhaps, this article. While potentially very helpful (and time saving), secondary sources come with some pretty significant cautions: To what extent should you trust the translator? Do they have the requisite knowledge for the translation? Are they biased in any way, selectively passing on only information that supports a particular point of view? There is no easy method to sort the good from the bad. Peer review of the secondary source helps, as does knowledge of the author and their prior work. One important reason that decision makers should become versed in research methodology is that they are then qualified to read and interpret Primary Research for themselves, avoiding the risk of poor or misleading interpretations. [Footnote 10: As the sample size increases, the margin of error/standard error shrinks at a rate equivalent to the inverse of the square root of the sample size. Smaller margins of error give a greater statistical ability to draw conclusions regarding hypotheses.] [Footnote 11: An operational definition is the definition that a researcher uses for the purpose of defining a concept (such as 'CEO hubris' or 'acquisition success') during the course of a study.] More problematic than secondary sources are tertiary sources. A tertiary source results when a third person translates a secondary source, compounding the opportunity for misinterpretation. Take the example John Oliver presented during his June 8th, 2016 Last Week Tonight program on Scientific Studies. A television producer interpreted a press release from the PR department of a medical society to state that published research concluded that eating chocolate while pregnant can reduce the risk of preeclampsia. The study, in fact, concluded nothing of the sort. Yet, that news was reported on a nationally broadcast morning news show. Tertiary sources are unreliable evidence. Applied Research Published research is an excellent source of information regarding theories and phenomena in a general sense. However, the conditions surrounding the studies conducted may or may not be similar to the ones facing your organization. There may be too many extraneous variables to match to a specific situation. In that light, it is always useful to treat the results found in published research as tentative, or as a hypothesis to be tested in your specific situation. In order to gain some confidence that a theory actually does apply for a particular decision, it is highly recommended that, where possible, organizations conduct their own experiments to see if the relationships discovered in the published research are maintained when it comes to the instant case.
Pilot studies, test marketing, and beta tests are all types of what is known as 'applied research': the study of specific, rather than generic, understandings of the extent to which one variable influences another. As with published research, the higher on the Pyramid of Evidence the applied research is located, the better the information derived and the more useful it is in organizational decision making. It is strongly advised that organizations seek to find out whether generic research findings actually apply in their unique situation. Conclusions Meta-analyses have not been conducted in all areas; multiple studies on a single topic are not common. It is not always possible to conduct a controlled, randomized, longitudinal study (for example, it would not be feasible to put some members of a single unit on flex time while excluding others). Control groups are not always possible (the ability to change the independent variable may have to be systemwide). We work with the information we have (or can create through applied research), but we try not to settle for weak evidence when stronger is available. The Pyramid of Evidence is useful in that it gives us the ability to rate the credibility of evidence, but it does not pass judgment on the conclusions gleaned from any particular study. In an evidence-based management approach, we are expected to use the 'best available evidence.' The Pyramid is a useful tool in establishing what the 'best available' is. Appendix: Pyramid of Evidence Example As an example of the differences among the levels of the Pyramid of Evidence, the effectiveness of a commercial (in 'causing' sales) will be used. If a cola company would like to establish whether or not a specific commercial (say, one featuring a Tyrannosaurus Rex) would 'cause' (i.e., influence) an increase in sales, it might seek evidence from any level of the Pyramid of Evidence.
5) Expert Advice: The company could hire an independent advertising effectiveness consultant to provide them with an expert opinion as to whether or not advertising employing dinosaurs, and, in particular, T-Rex, would be effective. 4) Descriptive Study: The company could a) ask a sample of customers (and non-customers) what they thought of the idea of commercials featuring Tyrannosaurus Rex, or b) show the commercial to a sample of customers and then follow up a couple of weeks later to see how much of the cola those in the sample had purchased. 3) Correlational Study: Company researchers track down previous instances of T-Rex advertising (in other products) and conduct a broad survey to see if there was a relationship between the ability to recall such advertising and any purchase of the product advertised. 2c) One Sample Longitudinal Experiment: The company could take a convenience sample and ask the sample how much of the cola they had purchased in the previous two weeks, then show the sample the T-Rex commercial, then, two weeks later, ask the sample how much of the cola they had purchased since seeing the commercial. [Any difference between the two measures might be considered evidence of the commercial's effectiveness.] 2b) Controlled Longitudinal Experiment: As above, a convenience sample could be taken and asked how much of the cola they had purchased in the prior two weeks. Half the sample, say those whose last names start with A through M (the 'experimental' group), would be shown the T-Rex commercial, but not the other half, those whose last names start with N through Z (the 'control' group). Two weeks later, both groups are asked how much of the cola they had purchased in the prior two weeks. [The difference between the two second measures, less the difference between the first and second measures of the control group, could be used to measure the effect of the commercial in influencing sales.]
2a) Randomized, Controlled Longitudinal Experiment: The same as 2b) above, with the exception that the sample is selected randomly from the population of cola drinkers and then randomly assigned to either the experimental group (those who see the commercial) or the control group (those who do not). 1) Conduct a Systematic Review, CAT, or REA of all studies of the effectiveness of commercials using dinosaurs. References
Barends, E. (2012), Evidence based management: What is the best available evidence? (PowerPoint), (Module 7: Best Available Evidence @ https://www.cebma.org/teaching-materials/), The Center for Evidence Based Management.
Brennan, A. (2017), Necessary and Sufficient Conditions, The Stanford Encyclopedia of Philosophy (Summer 2017), Edward N. Zalta (ed.).
Crossman, A. (2018), How Intervening Variables Work in Sociology, ThoughtCo, Feb 1.
Grant, M. & Booth, A. (2009), A typology of reviews: an analysis of 14 review types and associated methodologies, Health Information and Libraries Journal, 26, pp. 91-108.
Henderson, B. (2006), The Gospel of the Flying Spaghetti Monster, Villard.
McCandless, D. (2012), Rhetological Fallacies: Errors and manipulation of rhetoric and logical thinking, Information is Beautiful.
Morin, R. & Cohen, D. (2018), Giuliani: 'Truth isn't Truth', Politico, August 19th.
Nutt, P. (1999), Surprising but true: Half the decisions in organizations fail, The Academy of Management Executive, 13(4), pp. 75-89.
Oliver, J. (2016), Scientific Studies (video), Last Week Tonight (6/8/2016), HBO.
Petticrew, M. & Roberts, H. (2003), Evidence, hierarchies, and typologies: horses for courses, Journal of Epidemiology and Community Health, 57, pp. 527-529.
Shadish, W., Cook, T. & Campbell, D. (2001), Experimental and Quasi-Experimental Designs for Generalized Causal Inference, 2nd ed., Cengage.
Sinderbrand, R. (2017), How Kellyanne Conway ushered in the era of 'alternative facts', The Washington Post, January 22nd.
theory.BusinessDictionary.com.
Retrieved March 03, 2020, from BusinessDictionary.com website: http://www.businessdictionary.com/definition/theory.html
Tversky, A. & Kahneman, D. (1973), Availability: A heuristic for judging frequency and probability, Cognitive Psychology, 5(2), pp. 207-232.
Vigen, T. (2015), Spurious Correlations, Hachette Books. (tylervigen.com/spurious-correlations)
Wise, J. (2020), Trump says he'll use 'facts and instincts' when deciding to push for US to reopen, The Hill, 4/12/20.
Causality & Influence Managerial decision making is most effective when it correctly ascribes 'cause-and-effect' to phenomena and can, therefore, make the required changes to achieve the desired effect. Would a new benefits package attract higher quality candidates? Is there a relationship between mobile phone handset brand ownership and clothing brand preference? Do companies whose CEOs have higher salaries overpay for business acquisitions? Actually establishing causality is not an easy practice. How can you know if one thing truly 'causes' another? That is, what does it take to make the claim that X influences Y? To be clear, cause-and-effect conversations are more about propensities or likelihoods than they are about absolutes; 'probability of success' rather than 'if, and only if.' So, rather than ascribing 'cause,' discussions tend to be about determining if one class of events (say, free continental breakfasts) influences another class of events (say, the number of hotel customers). There are ample descriptions of the philosophical attribution of cause-and-effect through the concepts of the 'necessary' and 'sufficient' conditions (see, for example, Brennan, 2017). However, another model suffices for executive decision making. Rather than get overly concerned about the actual requirements for 'cause-and-effect,' decision makers are most concerned about whether or not they should conclude that one thing influences another (and what that influence 'is'). Managers use the terms interchangeably. In order to make an assertion that X influences Y, four elements should be simultaneously present:
1) Good reason: There must be a theoretically sound reason as to 'why' X influences Y.
2) Temporal order: X must occur before Y.
3) Absence of a Better 'Cause': 'Intervening' or background variables do not provide a more straightforward explanation of Y.
4) Concomitant variation: Different values/measures of X should generally correspond to different values of Y.
1) Good Reason "When you tell me that, you know, he should testify because he's going to tell the truth and he shouldn't worry, well that's so silly because it's somebody's version of the truth. Not the truth." (Rudy Giuliani, Meet the Press, 8/19/18). As reported by Rebecca Morin & David Cohen for Politico in Giuliani: 'Truth isn't Truth'. Theory has been described as the 'plausible' explanation of cause-and-effect (theory, BusinessDictionary.com). In this sense, plausible suggests two elements: a) that the explanation is logical and rational and b) that the explanation has not been discredited by credible research. Basically, a theory remains a theory up until the time it has been invalidated. Theories have relative strength. Stronger theories have been 'scientifically' tested numerous times and survived; weak theories, not so much. Darwinian 'evolution' is a stronger theory than 'trickle-down economics'; it has been tested and has survived many times. Some claims bear the name of 'theory,' but if they have not been tested, they are more appropriately termed 'speculation.' If the speculation cannot be scientifically tested, it cannot (should not) claim the mantle of a theory. See, for example, the Flying Spaghetti Monster 'theory' presented by the Pastafarians (Henderson, 2006). This claim is clearly based on speculation, as are the other so-called 'theories' (e.g., Intelligent Design) it seeks to parody. 2) Temporal Order While easily overlooked, this requirement is fairly important. Basically, the 'cause' must precede the 'effect'; that is, something cannot influence something in the past. Sometimes this can be difficult to establish. For example, does an increase in crime influence an increase in gun ownership (for protection)? Or does an increase in gun ownership lead to more crime? It would depend, in part, on which went up first.
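The crime-and-guns question can sometimes be probed with lagged correlations: if X truly precedes Y, earlier values of X should track later values of Y more strongly than the reverse. A minimal sketch with simulated data (the series, coefficients and seed are invented purely for illustration; this is not a substitute for real temporal evidence):

```python
import random

def pearson(xs, ys):
    # Plain Pearson correlation coefficient between two equal-length lists
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(1)
# Simulate a series x that influences y one period later
x = [random.gauss(0, 1) for _ in range(500)]
y = [0.0] + [xi + random.gauss(0, 1) for xi in x[:-1]]  # y[t] depends on x[t-1]

r_x_leads = pearson(x[:-1], y[1:])   # earlier x paired with later y
r_y_leads = pearson(y[:-1], x[1:])   # earlier y paired with later x
```

Because y was built to depend on the previous period's x, the x-leads correlation comes out strong while the y-leads correlation hovers near zero, which is the asymmetry a temporal-order check looks for.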
3) Absence of a Better Cause The possibility that there are other, typically more straightforward, explanations of phenomena must not be overlooked. Philosophically, this process is described by Ockham's Razor. Taking liberties with the translation, this means that simpler explanations (those requiring fewer assumptions) are more likely to lead to more accurate descriptions of cause-and-effect. From a theoretical/research perspective, this means that a concerted effort must be made to uncover two types of competing causes: 1) 'intervening' variables and 2) background or environmental variables that are primary influences on both X, the supposed cause, and Y, the supposed effect. a) Intervening Variables Variables that (causally) exist between the theorized causing variable X and the caused variable Y are called Intervening Variables. A stated theory may be that X causes Y, but if, in fact, X causes Z, and Z causes Y, then the original theory is invalid. In social and psychological fields, intervening variables may be difficult-to-measure concepts such as states of mind, intelligence or aptitude. Take, for example, the often-claimed theory that 'education causes income'; basically, that the higher one's education, (generally) the higher the salary. Yet this direct link is not quite accurate; there is an intervening variable ('occupation') which better describes 'income.' 'Education' can qualify someone for an occupation, and it is the occupation that commands higher salaries. Not: Education → Income; but: Education → Occupation → Income. (This example was taken from Crossman, 2018.) b) Environmental Variables In some instances, relationships are presented as theories simply because, well, a relationship appears to exist. Yet these relationships mask the true causal mechanism. So, while the theory may be that X causes Y, in truth both X and Y are caused by Z, a third variable. These third variables are often called Lurking or Environmental Variables.
For example, historically there exists a statistical relationship between the number of board feet of lumber produced and the total weight of nails manufactured. This could lead to a theory that lumber production causes nail production. Yet, there is a third variable that explains both lumber production and nail manufacturing: population size. The size of the population influences (to some extent) the demand for both, and for a host of other consumer products. Or consider the classic example of ice cream sales 'causing' pool drownings (the two are highly statistically related), when the outside air temperature (summer/winter, etc.) is a more accurate explanation of both.2 [Footnote 2: This helps explain one of the common logical fallacies: correlation ≠ causation. Just because two variables are correlated does not mean that one caused the other.] So, rather than X causing Y, Z causes both X and Y. Not: Lumber Production → Nail Production; but: Population Size → Lumber Production and Population Size → Nail Production. There must be an exhaustive search for Intervening and Environmental variables to ensure that any attribution of cause-and-effect may be honestly asserted. 4) Concomitant Variation If one thing has an influence on another, then different values of the first should be reflected in different values of the second. If advertising impacts sales, then there should be notable differences in sales when advertising is increased or decreased. The term for this phenomenon is Concomitant Variation, though it is more commonly called Correlation (correlation being one type of concomitant variation). Establishing whether or not concomitant variation exists is a challenging exercise that falls into two closely related fields: Research Methodology and Inferential Statistics. The determination that one thing changes in response to another (causality) is assessed on the methodological ability to establish that it is actually X that is causing Y and not something else.
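The lumber-and-nails pattern above can be made concrete with a small simulation: a lurking variable Z drives both X and Y, producing a strong X-Y correlation even though neither influences the other. (The coefficients, noise levels and seed below are invented for illustration.)

```python
import random

def pearson(xs, ys):
    # Plain Pearson correlation coefficient between two equal-length lists
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(2)
z = [random.uniform(1, 100) for _ in range(500)]   # e.g., population size
x = [2 * zi + random.gauss(0, 5) for zi in z]      # e.g., lumber production
y = [3 * zi + random.gauss(0, 5) for zi in z]      # e.g., nail production

r_xy = pearson(x, y)                               # strong, but not causal

# Remove Z's contribution from each series; only noise remains
r_resid = pearson([xi - 2 * zi for xi, zi in zip(x, z)],
                  [yi - 3 * zi for yi, zi in zip(y, z)])
```

Once Z's contribution is stripped out, the apparent X-Y relationship collapses to noise, which is exactly the environmental-variable trap the text warns about.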
And this depends, to a large extent, on the quality or validity of the methodology used to measure any change.3 Understanding the differences in methodologies will help to explain the various levels of the credibility of evidence. As a reminder, all four of these elements of causality must be concurrently present in order for one to reach the conclusion that X causes Y. Logical fallacies abound when claims of causality rely on just one element. See, for example: 'Anecdotal Evidence,' 'Cum Hoc Ergo Propter Hoc,' 'Post Hoc Ergo Propter Hoc,' and 'Ignoring a Common Cause' (McCandless, 2012). Establishing Concomitant Variation The concomitant variation aspect is critical in establishing causality. And in order to draw any conclusion regarding concomitant variation, two variables must be measured: the independent (or influencing) variable and the dependent (or influenced) variable. The methods by which we can establish with confidence that X actually influences Y rely to a large extent on probability theory. The gist of statistical analysis is to establish (test) that any relationship uncovered in data is not an artifact of random probabilities. The various statistical methods for determining concomitant variation depend on the 'levels of measurement' of the causing (independent or X) and the caused (dependent or Y) variables. Basically, variables must be separated into two broad types ('levels') for the purpose of analysis:
Categorical: Where the values that the variable can take are categories (Urban / Suburban / Rural, or United / Delta / American). Categorical variables are statistically summarized through proportions (%s).
Numeric: Where the values that the variable can take are quantitative (0, 1, 2, etc. children, or the number of purchases per year). Numeric variables are statistically summarized through averages (means) and spread (standard deviations).
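The two levels of measurement can be illustrated with the text's own examples; a short sketch (the data values are made up):

```python
from collections import Counter
from statistics import mean, stdev

# Categorical variable (airline flown): summarize with proportions
airlines = ['United', 'Delta', 'American', 'Delta', 'Delta', 'United']
proportions = {k: v / len(airlines) for k, v in Counter(airlines).items()}

# Numeric variable (purchases per year): summarize with average and spread
purchases = [0, 1, 2, 2, 3, 4]
avg, spread = mean(purchases), stdev(purchases)
```

The categorical summary reports, for instance, what share of respondents flew Delta, while the numeric summary reports a mean and standard deviation, matching the two statistical treatments named above.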
The primary statistical methods for determining concomitant variation are:
➢ Chi-Square (χ²) when both variables are categorical.
➢ ANOVA (Analysis of Variance) when the independent variable is categorical and the dependent variable is numeric.
➢ Logistic Regression when the independent variable is numeric and the dependent variable is categorical.
➢ Correlation when both variables are numeric.
[Footnote 3: See, for example, Shadish, Cook & Campbell (2001).] Causality, as defined here, requires that different levels of the independent variable be established and measured. Methodologically, there are two ways this can happen:
➢ Correlational Studies: The values of the independent and dependent variables are measured at the same time, and then one of the analyses above is conducted to determine if there is statistical evidence that different levels of the dependent variable tend to coincide with different levels of the independent variable. For example, a survey may collect data on both educational attainment (as an independent variable) and retirement investment dollar size (the theoretical dependent variable). An ANOVA test would then be run to determine if higher levels of retirement investments were associated with higher levels of educational attainment.
➢ Longitudinal Studies (Experiments): The value of the independent variable is manipulated by the researcher from one value to another, and the value of the dependent variable is measured for each value (of the independent variable). Then an analysis is conducted to determine whether or not differences exist in the dependent variable as the independent variable changes. For example, an organizational researcher could change the bonus structure (the independent variable) in one division of a company and then measure whether or not employee turnover (the dependent variable) changed.
Generally, longitudinal studies are preferred over correlational ones. The advantages of longitudinal studies include: 1.
Researchers have greater control over the values of the independent variables. 2. The likelihood of accidentally stumbling upon a 'spurious' relationship is diminished.4 Spurious relationships are random artifacts in databases wherein the values of two variables in the data collected may be statistically correlated, but the relationship does not actually exist in the population.5 And, 3. The 'validity' of the research is enhanced. Validity refers to the assurance that the dependent variable measures the intended concept. Longitudinal studies come in a variety of forms. Some of these are more helpful in establishing cause-and-effect than others, as they 'control' more of the threats to validity (Shadish et al., 2001). [Footnote 4: See, for example, Vigen, T. (2015), Spurious Correlations, and tylervigen.com/spurious-correlations.] [Footnote 5: One method to check for such a spurious relationship, if the database is large enough (i.e., has a very large sample size), is for conscientious researchers to 'hold back' some of the data from the initial analysis and, should a relationship be uncovered, follow up by testing whether the relationship continues to exist in the held-back cases.] The Pyramid of Evidence The hierarchy of evidence credibility – the ability to establish cause-and-effect – has been presented as the Pyramid of Evidence, characterized by lower quality evidence at the bottom and the most actionable, highest quality at the top. Much of the credit for these types of hierarchies can be traced to the Evidence-based Medicine movement. The typologies from those fields tend to focus, logically, on epidemiological studies and clinical trials (see, for example, Petticrew & Roberts, 2003). With minor modifications, the Pyramid can be adapted for management studies.
In increasing order of credibility, the pyramid levels are: 5) Expert Opinion, 4) Descriptive Studies & Case Studies, 3) Correlational Studies, 2) Longitudinal Studies (in three strengths: 2c) One Sample Longitudinal Experiments, 2b) Controlled Longitudinal Studies, 2a) Randomized, Controlled Longitudinal Studies) and 1) Meta-Analyses & Systematic Reviews. Understanding the differences in these levels is important when a decision-maker is attempting to figure out which evidence to trust when making causal claims. It should be noted here that there are claims that are posited as evidence sufficient for reaching a conclusion that do not rise to any acceptable standard of credibility for effective decision making:
• 'Anecdotal' information: This information is generally developed from a non-random, very small and unrepresentative sample. This type of information may be useful for developing hypotheses, but should not be used to reach conclusions.
• 'Instinct': Generally, a manifestation of a heuristic (Tversky & Kahneman, 1973), likely the 'Availability' heuristic. While heuristics may be useful in repeated, routine decisions, they notoriously falter when applied to (even remotely) similar decisions.6
• 'Gut feelings': Similar to 'Instinct,' but generally presented without foundation (any explanation of what was considered to reach a conclusion). Often baseless and self-serving.
None of these constitute a credible reason to reach a conclusion about what constitutes a good course of action. Rather, 'credibility' relies on the ability to show, a priori, that a course of action has a good chance of being successful. [Footnote 6: See, for example, Wise, J. (2020), Trump says he'll use 'facts and instincts' when deciding to push for US to reopen.]
[Figure: The Pyramid of Evidence7 – from the top (most credible) down: 1) Meta-Analyses/Systematic Reviews; 2a) Randomized, Controlled Longitudinal Experiments; 2b) Controlled Longitudinal Experiments; 2c) One Sample (Uncontrolled) Longitudinal Experiments; 3) Correlational Studies; 4) Descriptive Studies & Case Studies; 5) Expert Opinion.] [Footnote 7: Adapted from Barends, E. (2012), Evidence based management: What is the best available evidence? (PowerPoint), Slide 31 (see Module 7: Best Available Evidence @ https://www.cebma.org/teaching-materials/), The Center for Evidence Based Management.] Expert Opinion The advice of 'experts' is often used as a source of information in organizations. Consultants for virtually any organizational decision are available, often for a healthy fee. Experts sell their advice based on their greater expertise: a track record of success, specific subject-matter education, an 'objective' perspective, and so on. In many instances this expertise is valid and based upon empirical research; in others, not so much. Since the motivations of 'experts' may be opaque, it is best to determine the source of their knowledge: How do they know what they claim? Because it is so difficult to establish what 'expert knowledge' actually is, and to what extent any consultant actually possesses it, decision makers should seek information from higher levels of the pyramid. 4) Descriptive Studies & Case Analyses Descriptive studies present a picture of a situation at one particular time. Case Studies present a significant amount of information about a single situation. The two designs may be diagrammed as: a) ΟD and b) ΔI ΟD.8 In either case, the measurement may be taken with (b) or without (a) a preceding change in another variable (i.e., an independent variable). For example, an organization could conduct a study of its clients to try to get an idea of what percentage of them were repeat customers (a).
Likewise, a company could change its healthcare insurance provider, then, 6 months later, survey employees to see how satisfied they were with the insurance (b). Both of these examples are problematic in establishing cause-and-effect; the first (a) because it is impossible to determine why the proportion is what it is (and, therefore, there is no way to know how to change it), and the second (b) because, without a prior study, you can't really tell if satisfaction went up, went down or stayed the same as a result of the change in providers. Descriptive studies have been termed statistical snapshots since they typically collect data (sometimes from many variables) and then use simple inferential statistics to make a generalization about the nature of the entire population. Market segmentation studies, for example, tend to be descriptive studies of which subsets of the population tend to purchase a particular product. Case studies tend to deeply analyze, often with data, aspects of a single phenomenon, such as a successful product launch or a corporate failure. Causal links may be posited, but are often hard to truly establish. And, as with anything based upon a sample size of exactly 1, it is impossible to make generalizations. Because so much information is presented in a detailed case study, it may be tempting to draw parallels between the situation described and another. Don't base organizational decisions on evidence extracted or deduced from case studies. As mentioned above, establishing concomitant variation requires that different levels of a dependent variable coincide (generally) with different levels of the independent variable. As a reminder, there are two ways that the different levels can be established: 1) by observing different levels of the independent variable and dependent variable at the same time (Correlational Studies) or 2) by altering the independent variable to see if there is a coincident change in the dependent variable (Experiments).
[8] For these diagrams: Δ = Change (the value of); Ο = Observe (measure); I = the Independent variable; D = the Dependent variable. The first diagram (a) simply states that, at some point, the dependent variable is measured. The second diagram (b) shows a change in the independent variable followed by a measurement of the dependent variable.

3) Correlational Studies

In correlational studies, the data for both the independent and dependent variables are collected once, at the same time. Then, a statistical analysis is performed to see if different levels of the independent variable correspond generally with different levels of the dependent variable. A simple example would be a survey which measured the number of years an employee had held their current job (an independent variable) and how 'satisfied' they reported they were with their job (the dependent variable). A statistical analysis would then be performed to determine whether job satisfaction went up (or down) as employees spent more time in a particular job.

2) Longitudinal Studies (Experiments)

The general class of longitudinal studies refers to a type of experiment wherein a variable (the dependent variable) is measured twice (or more), to see if there is a difference between the measures. These two measurements are called the Pretest (before the change in the independent variable) and the Post-test (after the change). When the independent variable is changed between the pretest and post-test, then it might be that the change in the independent variable influenced the change in the dependent variable. At the very least, it could be evidence of the Concomitant Variation requirement for the establishment of cause-and-effect. [9]

2c) Single-Sample (Uncontrolled) Longitudinal Studies

The most basic longitudinal study involves measuring the dependent variable from a sample (to establish a baseline), then changing the independent variable, and then measuring the dependent variable a second time (to see if it had changed).
It may be represented as:

Pretest: ΟD    Manipulation: ΔI    Post-test: ΟD

It might be tempting to then infer that any difference between the Post-test score and the Pretest score of the dependent variable could be attributed to the manipulation of the independent variable. Such a claim should be treated with some skepticism; we can't tell for sure that it was actually the change in the independent variable that 'caused' the change in the dependent variable. It could easily be something else that changed (but wasn't measured) that produced the difference. For example, it might be that a rise in air temperature in the summer caused an increase in iced tea sales, rather than an ad campaign. There are a number of factors that could impact the dependent variable but were not measured. These factors are called Threats to Internal Validity (Shadish, Cook, & Campbell, 2001, p. 55).

[9] Beyond the scope of this paper are studies that use different samples (subjects, individuals) in the pre- and post-tests. In general, using the same subjects is preferable, yielding more directly comparable results.

2b) Controlled Longitudinal Experiments

One way to control at least some of the Threats to Internal Validity is to include a 'control' group in the study. A control group is a part of the sample (those measured) who do NOT experience a change in the independent variable. Those who DO experience a change in the independent variable are called the experimental group.

                      Pretest    Manipulation    Post-test
Experimental Group    EΟD1       ΔI              EΟD2
Control Group         CΟD1                       CΟD2

In this type of study, we can see whether the differences between the pretest and post-test scores for the two groups are the same (even if the Post-test measure is greater for both groups). If they are the same, it would be hard to make the claim that the change in the independent variable did anything to the dependent variable.
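The comparison just described can be put into numbers. A minimal sketch in Python, using invented pretest and post-test means for the two groups (the values are purely for illustration):

```python
# Controlled longitudinal study sketch: compare each group's change.
e_pre, e_post = 6.0, 7.5   # experimental group means: EOD1, EOD2
c_pre, c_post = 6.0, 6.5   # control group means:      COD1, COD2

e_change = e_post - e_pre   # unmeasured factors PLUS the IV manipulation
c_change = c_post - c_pre   # unmeasured factors only

# If the two changes were equal, the manipulation likely did nothing;
# the gap between them is the evidence that the IV mattered.
effect = e_change - c_change
print(f"experimental change {e_change:+.1f}, control change {c_change:+.1f}, "
      f"estimated effect {effect:+.1f}")
```

Note that both groups improved; only the extra improvement in the experimental group is attributable to the manipulation.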
For example, in order to see if a flex-time policy would improve employee morale, two different units in a company could be measured; say, Accounts Payable as the experimental group and Accounts Receivable as the control group. The morale of each unit is initially measured (Pretest), then the policy is changed for the Accounts Payable unit (but not for Accounts Receivable). A year later, both units are measured again (Post-test). If morale went up more (or down less) for the Accounts Payable group, then the change in policy was likely the cause. Still, some caution should be used before making a blanket claim here. There might be something fundamentally different about the two groups to begin with that could explain any difference. For example, the Accounts Payable unit could be largely populated with younger people (with families, who might appreciate flex time) while the Accounts Receivable unit was largely made up of empty-nesters (who would be less impacted by flex time). Shadish, et al. (2001) would call this threat to internal validity Selection.

2a) Randomized, Controlled Longitudinal Studies

In order to avoid Selection Bias as a Threat to Internal Validity, the experimental and control groups should be identical in make-up (or as similar as is possible). Without diving into probability theory, the best way to achieve this similarity is to randomly assign subjects (the sample) to the experimental and control groups and then conduct the experiment. This ensures that the ONLY difference between the two groups is the presence (or absence) of the experimental treatment.

                             Pretest    Manipulation    Post-test
Experimental Group (Random)  EΟD1       ΔI              EΟD2
Control Group (Random)       CΟD1                       CΟD2

Dissecting the table above: The difference between the two control group measures [CΟD2 − CΟD1] represents and quantifies any change caused by unmeasured factors.
The difference between the two experimental group measures [EΟD2 − EΟD1] quantifies the change caused by both the unmeasured factors AND the change in the (measured) independent variable. Subtracting the change in the control group from the change in the experimental group [(EΟD2 − EΟD1) − (CΟD2 − CΟD1)] quantifies the unique contribution of the independent variable to the dependent variable. So, since the only difference between the two groups is the change in the independent variable, that change is the best (experimental) explanation for any post-test differences in the dependent variable between the two groups. Not only that, but the extent of the impact of the independent variable can be measured by noting any difference between the changes in the experimental and control groups. For a single study/experiment, Randomized, Controlled Longitudinal Studies are as good as it gets in establishing causality.

1) Meta-Analyses and Systematic Reviews

Scientists have come to realize over the years that, while somewhat compelling, any single study is open to a variety of possible constraints and biases (say, economic conditions or industry) that might make broader application of the results problematic. They prefer to have more than one study to confirm the findings of the first. These 'replication' studies are not too common, but they are very important if we are to truly understand the extent to which we should trust a theory (see the earlier section on 'Good Reason'). At the top of the Pyramid of Evidence are comprehensive reviews that attempt to combine the information from a number of studies into an across-the-board summary of what is currently known about a theory. These practices attempt to synthesize the findings from a number of studies in order to see where the preponderance of evidence lies.
There are a number of categories of these summary articles (Grant & Booth, 2009, pp. 94-95) that are here distilled down to two: Systematic Reviews and Meta-Analyses.

Systematic Reviews: Since studies related to a particular theory do not all examine the same phenomenon, or come at the phenomenon from different angles, the full body of evidence surrounding it is often hard to find and pin down. A systematic review is an attempt to identify, appraise and interpret the entire available knowledge base surrounding a topic. The studies evaluated come from a variety of methodologies (Qualitative, Quantitative and Mixed-Methods). Systematic reviews require a commitment to the avoidance of any possible bias and include confirming and disconfirming evidence for any theory assessed. Since there is not a statistical basis for conclusions, which must therefore be derived from some qualitative interpretation, it is critical that these reviews be thoroughly vetted for objectivity (through the peer-review process).

Meta-Analyses: When there are a number of quantitative studies, all with available data for pre- and post-tests and for the independent and dependent variables, it is possible to pull all the data together to see if a consistent picture emerges. The primary benefit of using multiple datasets is that the sample size used in the analysis increases. And, as those familiar with statistics will recall, the larger the sample size, the smaller the margin of error. [10]

There are some cautions related to the usefulness of meta-analyses. The studies combined may not all have the same 'operational definition' [11] of either the independent or dependent variables, resulting in the possible combination of disparate variables in the same analysis. Additionally, should the studies have been conducted at dramatically different times, an emergent (or retreating) phenomenon may be masked by the earlier data.
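Mechanically, one common way a meta-analysis pools quantitative results is inverse-variance weighting: each study's effect estimate is weighted by 1/SE², so more precise studies count for more. A minimal sketch with invented effect estimates and standard errors (not drawn from any real studies):

```python
import math

# (effect estimate, standard error) for three hypothetical studies
studies = [(0.30, 0.10), (0.45, 0.20), (0.25, 0.15)]

weights = [1 / se ** 2 for _, se in studies]               # precision weights
pooled = sum(w * e for (e, _), w in zip(studies, weights)) / sum(weights)
pooled_se = 1 / math.sqrt(sum(weights))                    # pooled precision

print(f"pooled effect = {pooled:.3f}, pooled SE = {pooled_se:.3f}")
```

The pooled standard error comes out smaller than any single study's standard error, which is precisely the larger-sample, smaller-margin-of-error benefit described above.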
It is not always possible to employ meta-analyses, or, indeed, any methodology from the top of the Pyramid of Evidence. That's OK. It is not always possible to design controlled, randomized studies, or, for that matter, any longitudinal study. The value of the Pyramid is not that it teaches us which studies to trust and which to dismiss. Rather, it tells us how much value to place on the results of the study or source, so that when it comes to weighing the evidence, there is a basis for doing so. Decision makers should always use the best available information.

Original vs Interpreted Evidence

Even when evidence comes from an ostensibly credible source, it is important to distinguish between the mechanisms by which we receive it. If we do not create the knowledge ourselves (via experiments within our own organization), then the most credible in this regard is information passed on to us directly from the individual(s) who created it. This type of evidence, called a Primary Source, should also be peer-reviewed to ensure that a) the theory presented does not dismiss (without evidence) current thinking on the subject, b) the study maintains proper scientific rigor, and c) the statistical evidence of any concomitance supports the conclusions. These Primary Sources are preferred because they should be undistorted by interpretation. Primary sources such as peer-reviewed academic articles are often thick with research and statistical jargon. That language can be confusing and tedious and, consequently, there are those who translate the studies into more accessible language, filtering out the often-confusing details and, for the most part, cutting to the chase: what did the research actually conclude? These translations are called 'secondary sources.' Quality examples of secondary sources include textbooks and The Harvard Business Review and, perhaps, this article.
While potentially very helpful (and time saving), secondary sources come with some pretty significant cautions: To what extent should you trust the translator? Do they have the requisite knowledge for the translation? Are they biased in any way, selectively passing on only information that supports a particular point of view? There is no easy method to sort the good from the bad. Peer review of the secondary source helps, as does knowledge of the author and their prior work. One important reason that decision makers should become versed in research methodology is that they are then qualified to read and interpret Primary Research for themselves, avoiding the risk of poor or misleading interpretations.

[10] As the sample size increases, the margin of error/standard error shrinks at a rate equivalent to the inverse of the square root of the sample size. Smaller margins of error give a greater statistical ability to draw conclusions regarding hypotheses.

[11] An operational definition is the definition that a researcher uses for the purpose of defining a concept (such as 'CEO hubris' or 'acquisition success') during the course of a study.

More problematic than secondary sources are tertiary sources. A tertiary source is the result of what happens when a third person translates a secondary source, and the opportunity for misinterpretation is compounded. Take the example John Oliver presented during his June 8th, 2016 Last Week Tonight program on Scientific Studies. A television producer interpreted a press release from the PR department of a medical society to state that published research concluded that eating chocolate while pregnant can reduce the risk of preeclampsia. The study, in fact, concluded nothing of the sort. Yet that news was reported on a nationally broadcast morning news show. Tertiary sources are unreliable evidence.

Applied Research

Published research is an excellent source of information regarding theories and phenomena in a general sense.
However, the general sense, or the conditions surrounding the studies conducted, may or may not be similar to the ones facing your organization. There may be too many extraneous variables to match to a specific situation. In that light, it is always useful to treat the results found in published research as tentative, or as a hypothesis to be tested in your specific situation. In order to gain some confidence that a theory actually does apply for a particular decision, it is highly recommended that, where possible, organizations conduct their own experiments to see if the relationships discovered in the published research are maintained when it comes to the instant case. Pilot studies, test marketing, and beta tests are all types of what is known as 'applied research': the study of specific, rather than generic, understandings of the extent to which one variable influences another. As with published research, the higher on the Pyramid of Evidence the applied research is located, the better the information derived and the more useful it is in organizational decision making. It is strongly advised that organizations seek to find out whether generic research findings actually apply in their unique situation.

Conclusions

Meta-analyses have not been conducted in all areas; multiple studies on a single topic are not common. It is not always possible to conduct a controlled, randomized, longitudinal study (for example, it would not be feasible to put some members of a single unit on flex time while excluding others). Control groups are not always possible (the ability to change the independent variable may have to be system-wide). We work with the information we have (or can create through applied research), but we try not to settle for weak evidence when stronger is available. The Pyramid of Evidence is useful in that it gives us the ability to rate the credibility of evidence, but it does not pass judgement on the conclusions gleaned from any particular study.
In an evidence-based management approach, we are expected to use the 'best available evidence.' The Pyramid is a useful tool in establishing what the 'best available' is.

Appendix: Pyramid of Evidence Example

As an example of the differences among the levels of the Pyramid of Evidence, the effectiveness of a commercial (in 'causing' sales) will be used. If a cola company would like to establish whether or not a specific commercial (say, one featuring a Tyrannosaurus Rex) would 'cause' (e.g., influence) an increase in sales, it might seek evidence from any level of the Pyramid of Evidence.

5) Expert Advice: The company could hire an independent advertising-effectiveness consultant to provide an expert opinion as to whether or not advertising employing dinosaurs, and T-Rex in particular, would be effective.

4) Descriptive Study: The company could a) ask a sample of customers (and non-customers) what they thought of the idea of commercials featuring Tyrannosaurus Rex, or b) show the commercial to a sample of customers and then follow up a couple of weeks later to see how much of the cola those in the sample had purchased.

3) Correlational Study: Company researchers track down previous instances of T-Rex advertising (for other products) and conduct a broad survey to see if there was a relationship between the ability to recall such advertising and any purchase of the product advertised.

2c) One-Sample Longitudinal Experiment: The company could take a convenience sample and ask the sample how much of the cola they had purchased in the previous two weeks, then show the sample the T-Rex commercial, then, two weeks later, ask the sample how much of the cola they had purchased since seeing the commercial. [Any difference between the two measures might be considered evidence of the commercial's effectiveness.]

2b) Controlled Longitudinal Experiment: As above, a convenience sample could be taken and asked how much of the cola they had purchased in the prior two weeks.
Half the sample, say those whose last names start with A through M (the 'experimental' group), would be shown the T-Rex commercial, but not the other half, those whose last names start with N through Z (the 'control' group). Two weeks later, both groups are asked how much of the cola they had purchased in the prior two weeks. [The difference between the two second measures, less the difference between the first and second measures of the control group, could be used to measure the effect of the commercial in influencing sales.]

2a) Randomized, Controlled Longitudinal Experiment: The same as 2b) above, with the exception that the sample is selected randomly from the population of cola drinkers and then randomly assigned to either the experimental group (those who see the commercial) or the control group (those who do not).

1) Meta-Analysis/Systematic Review: Conduct a Systematic Review, CAT (Critically Appraised Topic), or REA (Rapid Evidence Assessment) of all studies of the effectiveness of commercials using dinosaurs.

References

Barends, E. (2012), 6. Evidence based management: What is the best available evidence? (PowerPoint), (Module 7: Best Available Evidence @ https://www.cebma.org/teaching-materials/), The Center for Evidence Based Management.

Brennan, A. (2017), Necessary and Sufficient Conditions, The Stanford Encyclopedia of Philosophy (Summer 2017), Edward N. Zalta (ed.).

Crossman, A. (2018), How Intervening Variables Work in Sociology, ThoughtCo, Feb 1.

Grant, M. & Booth, A. (2009), A typology of reviews: an analysis of 14 review types and associated methodologies, Health Information and Libraries Journal, 26, pp. 91-108.

Henderson, B. (2006), The Gospel of the Flying Spaghetti Monster; Villard.

McCandless, D. (2012), Rhetological Fallacies: Errors and manipulation of rhetoric and logical thinking, Information is Beautiful.

Morin, R. & Cohen, D. (2018), Giuliani: 'Truth isn't Truth', Politico, August 19th.

Nutt, P. (1999), Surprising but true: Half the decisions in organizations fail, The Academy of Management Executive, 13(4), pp. 75-89.
Oliver, J. (2016), Scientific Studies (video), Last Week Tonight (6/8/2016), HBO.

Petticrew, M. & Roberts, H. (2003), Evidence, hierarchies, and typologies: horses for courses, Journal of Epidemiology and Community Health, 57, pp. 527-529.

Shadish, W., Cook, T. & Campbell, D. (2001), Experimental and Quasi-Experimental Designs for Generalized Causal Inference; 2nd ed., Cengage.

Sinderbrand, R. (2017), How Kellyanne Conway ushered in the era of 'alternative facts', The Washington Post, January 22nd.

Theory. BusinessDictionary.com. Retrieved March 03, 2020, from BusinessDictionary.com website: http://www.businessdictionary.com/definition/theory.html

Tversky, A. & Kahneman, D. (1973), Availability: A heuristic for judging frequency and probability, Cognitive Psychology, 5, 2, pp. 207-232.

Vigen, T. (2015), Spurious Correlations, Hachette Books. (tylervigen.com/spurious-correlations)

Wise, J. (2020), Trump says he'll use 'facts and instincts' when deciding to push for US to reopen, The Hill, 4/12/20 7:33 AM EDT.
