Methodological Challenges to Outcomes Assessment

One overarching issue plagues clinical outcomes assessment: the information used to identify outcomes and their correlates is typically not recorded for those purposes. As a result, attempts to assess outcomes and the processes that contribute to them must "make do" with what's available. Typically, this means that populations are identified using administrative or billing codes such as the Current Procedural Terminology (CPT) codes and the International Classification of Diseases (ICD) codes (currently ICD-9 is the most commonly implemented version). Once a population is identified, certain values are manually abstracted from the patients' clinical records. Again, these values are typically recorded for communicative or legal purposes, not necessarily for research at a future date. There is an increasing amount of research on automated methods for extracting data for clinical outcomes assessment. While these techniques are important for the efficiencies of scale they can potentially offer, they are only as accurate as their input.

The challenges presented by these approaches are outlined below, along with alternative approaches researchers have experimented with to assess clinical outcomes. Citations are intended to provide enough information for readers to locate articles using PubMed or Google Scholar. As with all entries, notes following citations are my own personal notes on the matter, recorded for my own selfish purposes :)

Gerbert & Hargreaves (1986) "Measuring physician behavior" Med Care. 24: 838-847.

Burns et al. (1992) "Self-report versus medical record functional status" Med Care. 30(suppl):MS85-95.

Found that missing data is a problem in chart abstraction. Charts are more specific than they are sensitive. Luck et al. (2000) found that problems exist on both measures.

Beard, Yunginger, Reed, et al. (1992) "Interobserver variability in medical record review: an epidemiological study of asthma" J Clin Epidemiol. 45: 1013-1020

Carter, Rogowski (1992) "How pricing policies, coding, and recalibration method affect DRG weights" Health Care Financ Rev 14: 83-96

Health Services Research Group (1992) "Quality of Care: 1. What is quality and how can it be measured?" Can Med Assoc J. 146: 2153-2158.

Iezzoni, Foley, Daley, et al. (1992) "Comorbidities, complications, and coding bias. Does the number of diagnosis codes matter in predicting in-hospital mortality?" JAMA 267:2197-2203

Rubin, Rogers, Kahn, et al. (1992) "Watching the doctor watchers. How well do peer review organization methods detect hospital care quality problems?" JAMA 267:2349-2354

Cited as claiming that manual abstraction is still the most commonly used method for measuring quality by (Luck 2000).

Rethans, Martin, Metsemakers (1994) "To what extent do clinical notes by general practitioners reflect actual medical performance? A study using simulated patients" Br J Gen Pract 44: 153-156

Garnick, Fowles, Lawthers, et al. (1994) "Focus on quality: profiling physicians' practice patterns" J Ambulatory Care Manage. 17: 44-75

Localio, Landis (1995) "Quality of chart review for quality of care" JAMA 274: 1585-1586

Lawthers, Palmer, Banks, et al. (1995) "Designing and using measures of quality based on physician office records" J Ambulatory Care Manage 18: 56-72

Fowles, Lawthers, Weiner, et al. (1995) "Agreement between physicians' office records and Medicare part B claims data" 16:189-199

Rethans et al. (1996) "Methods for quality assessment in general practice" Fam Pract. 13: 468-476

Gilbert, Lowenstein, Koziol-McLain, et al. (1996) "Chart reviews in emergency medical research: where are the methods?" Ann Emerg Med 27:305-308

Cited as claiming that manual abstraction is still the most commonly used method for measuring quality by (Luck 2000).

Lawthers (1996) "Methodology matters: III. Validity review of performance measures" Int J Qual Health Care. 8: 299-306

Wu & Ashton (1997) "Chart review: a need for reappraisal" Eval Health Prof. 20: 143-163

Iezzoni (1997) "Assessing quality using administrative data" Ann Intern Med. 127: 666-674.

McGlynn (1997) "Six challenges in measuring the quality of health care" Health Aff 16: 7-21

Hershey & Karusa (1997) "Assessment of preventative health care: design considerations" Prev Med. 26: 59-67

Highlights problems of chart abstraction: illegibility, missing reports, variance in human abstractors' skills

McKee & Sheldon (1998) "Measuring performance in the NHS. Good that we moved beyond money and activity but problems remain" Br Med J. 316: 322

Davis & Lampel (1998) "Trust in performance indicators?" Qual Health Care. 7: 159-162.

Kleinke (1998) "Release 0.0: clinical information technology in the real world" Health Aff 17:23-28

Berg & Goorman (1999) "The contextual nature of medical information" Int J Med Informatics 56: 51-60

Great overview of some of the challenges that arise when medical data captured for one reason (billing, communication, etc) is used for another.

Notes: Attention has been paid to the secondary utilization of data mainly because of privacy issues and the accountability of healthcare professionals. Little attention has been paid to another crucial issue: is the secondary utilization of healthcare data possible, and what does it take to make it possible?

Current viewpoint: it becomes feasible as soon as the IT connections are in place. In such a view, medical information is conceptualized as givens about a patient that are collected and then stored in a record. Info is a commodity: a substance that is transferable and independent of its vehicle. Autonomous, atom-like building blocks which can be stored in a neutral medium. [Agre]

His primary point is that this viewpoint is wrong. “Information should be conceptualized to be always entangled with the context of its production.” (p. 52)

Disentangling is possible, but entails work. Not sure that it is always possible. The information useful to the researchers needs to match up with the priorities of the clinician. He then takes it onto the practical level of who does this disentangling and who benefits?

3 ways in which information is entangled with the context of its production (and why the atomic view doesn’t fly):

1) Data are always produced with a given purpose, and their hardness and specificity are directly tailored to that purpose [Berg 1997]

Gives the example of apparently incomplete data from a case study / anecdote. In this case, if medical information is seen as a series of context-free givens, then the clinical conclusion of Agnes’ record is that there was an incomplete examination conducted.

In the context of caring for patients one can expect such omissions, as brevity is an essential part of managing clinical workload [Garfinkel 1967, Harper 1997]

2) The atomic view of information overlooks how medical data mutually elaborate each other [Whalen 1993]

Medical data shouldn’t be viewed as a heap of facts as much as bits and pieces of an emerging story [Hunter 1991].

The addition or exclusion of some value often gains its meaning from previous (or next) values. The data, like the patient, is a system. Ex. The meaning of the data item “murmurs: none” changes significantly if the entry after it reads “now 3 days after valvular surgery.” In this case we would expect no murmurs to be the result of a meticulous investigation rather than a cursory glance in the case of a patient in for a broken leg.

Temporal dimension is as crucial to medical information as it is to a story [Hunter 1991, Kay 1996]. In the course of a patient’s trajectory data items are constantly reinterpreted and reconstructed [Strauss 1985]. Medical work is characterized in many wards as ongoing (re)interpretation of the tendencies in graphs and tables. Great example of how this affects form and structure – galya’s work in building a tumor bayes net.

3) ‘Physicians [and other health workers] typically assess the adequacy of medical information on the basis of the perceived credibility of the source’ [Cicourel 1990]

Exclusions from a more senior person may be looked upon differently than exclusions by a rookie [forgive and remember]

Physicians judge the quality of the output by machines as well. They develop a sense of trustworthiness of the apparatuses they work with [Barley 1988] and they learn to trust the labs and x-rays produced by their departments.

Separating context – law of med info

Work is required to make data suitable for accumulation.

“Law of medical information: the more active the accumulation the more work needs to be done.”

I like the definition from the abstract better: “the further information has to be able to circulate (i.e., the more diverse contexts it has to be usable in), the more work is required to disentangle the information from the context of its production.”

His next point is: who is to do this work? Often it is not the doctors and nurses who benefit.

Many authors have expressed the hope that information will be “freed” from its current inaccessible paper format. The idea that information is something that can travel freely, independent of its medium, is problematic. Even the highly standardized laboratory data that figure in every hospital record cannot be read without knowledge of that particular hospital’s normal values.

Disentanglement from primary context is possible. The translation to other contexts requires work.

Hofer et al. (1999) "The unreliability of individual physician 'report cards' for assessing the costs and quality of care of a chronic disease" JAMA. 281: 2098-2105.

Giuffrida et al. (1999) "Measuring quality of care with routine data: avoiding confusion between performance indicators and health outcomes" Br Med J. 319: 94-98.

McDonald (1999) "Quality measure and electronic medical systems" JAMA. 282: 1181-1182.

Luck et al. (2000) "How well does chart abstraction measure quality? A prospective comparison of standardized patients with the medical record" Am J Med. 108: 642-649.

Design: Measured chart abstraction against standardized patients as a gold standard. 160 physician-patient encounters at two VAs. 20 docs (residents and attendings). Standardized patients filled out a checklist after the visit. Controlled for case mix (4 conditions). Used sensitivity / specificity.

Results: Chart abstraction was 54% ± 9%. Patient checklist was 68% ± 9%. These are means for all 4 conditions used by standardized patients. Many false positives (19%): steps recorded in the chart that the patients didn't record. Differences in scores depending on the domains of the encounter: 4% diff in physical exam vs. 18% for history taking. Chart abstraction is better at recording some measures than others. ''This is a huge problem - difference in performance for different domains and parts of record implies the need to assess quality at multiple levels. This would be an incredibly costly endeavor. The challenge is identifying which types of measures it is good at.''

The medical record is neither a sensitive nor specific report of the clinical encounter. The diagnosis was right only 40% of the time - 13% less than the diagnosis reported by the standardized patient. This despite using standardized patients with common presentations. Leads to questions of the use of the record to estimate disease prevalence.
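To make the sensitivity / specificity framing concrete, here is a minimal sketch of how a chart could be scored against a standardized patient checklist treated as the gold standard. The function name and the ten checklist items are made up for illustration and are not data from Luck et al.

```python
def sensitivity_specificity(gold, chart):
    """Score chart entries against a gold-standard checklist.

    gold, chart: equal-length sequences of booleans, one per checklist
    item (True = the step was performed / recorded).
    """
    if len(gold) != len(chart):
        raise ValueError("gold and chart must have the same length")

    pairs = list(zip(gold, chart))
    tp = sum(1 for g, c in pairs if g and c)          # done and charted
    fn = sum(1 for g, c in pairs if g and not c)      # done, not charted
    tn = sum(1 for g, c in pairs if not g and not c)  # not done, not charted
    fp = sum(1 for g, c in pairs if not g and c)      # charted, but not done

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical encounter: 10 checklist items (illustrative values only).
gold = [True, True, True, True, False, False, False, False, False, False]
chart = [True, True, False, False, True, False, False, False, False, False]

sens, spec = sensitivity_specificity(gold, chart)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy case the chart misses half of the steps actually performed (false negatives) and also records one step the standardized patient did not report (a false positive), which is the pattern the study flags.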

If busy physicians were not recording all their activities it would explain the false negatives, but not the false positives. The authors posit that rather than the record as a transcript of what happened, it is a justification of the proposed plan of action. This is in line with the views on documentation held by Trace and others in the IS / Archival domains.

Notes: The interesting thing here is that the patients reported higher quality care in their questionnaires than the doctors gave themselves credit for. This is yet another study that calls into question the use of chart abstraction. Listed several others in discussion section that are posted here.

Smith & Berlin (2001) "Signing a colleague's radiology reports" Am Roentgen Ray Soc. 176: 27-30

Spies et al. (2004) "Which data source in clinical performance assessment? A pilot study comparing self-recording with patient records and observation" International Journal for Quality in Health Care. 16(1):65-72.

Design: Compared 168 consults against guidelines for 15 clinical conditions. 206 criteria reviewed using 1) the record, 2) observation in surgery, 3) physician structured self-reporting. Looking for total number recorded.

Results: Medical record examination provided 40%, observation 72%, and physician self-recording 95% of the data required for the review against guidelines. Nine per cent of the clinical decisions could be reviewed when using medical records, 46% when using observation data, and 69% when using data from prospective self-recording. In particular, decisions in the area of patient education and diagnostic examinations could not be reviewed validly using medical records only. Kappa agreements between the data available from the three sources as well as between the review results appeared to be 0.79.
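For reference, a kappa statistic corrects raw agreement for the agreement expected by chance. The notes above don't say which kappa variant the authors used, so the sketch below assumes plain Cohen's kappa between two sources. The function name and the ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(source_a, source_b):
    """Cohen's kappa for two equal-length sequences of categorical ratings."""
    if len(source_a) != len(source_b) or not source_a:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(source_a)
    # Observed agreement: share of items on which both sources agree.
    observed = sum(a == b for a, b in zip(source_a, source_b)) / n

    # Chance agreement: product of each source's marginal proportions,
    # summed over all categories.
    counts_a = Counter(source_a)
    counts_b = Counter(source_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[k] * counts_b[k] for k in categories) / (n * n)

    if expected == 1.0:
        # Both sources used a single identical category; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Hypothetical: was each of 10 criteria judged "met" (1) or "not met" (0)
# according to the record vs. according to direct observation?
record = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
observation = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]

kappa = cohen_kappa(record, observation)
print(f"kappa={kappa:.2f}")
```

Here the two sources agree on 8 of 10 items, but half of that agreement would be expected by chance, so kappa comes out at about 0.6 rather than 0.8.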

Conclusions: Medical records alone only supply sufficient information for the review of a very limited set of clinical decisions. Physician self-recording has significantly more potential for valid review of a broad range of clinical decisions. Furthermore, self-recording seems a reliable data collection method that deserves further research.

If medical records are to be used in quality assessment they need to be adapted. Challenges will be 1) physician and quality controller have different perspectives on what is relevant and 2) clinician is unlikely to take extra time to include information not relevant to their goals. ''This gets to different contexts. Information gathered for one purpose is rarely useful for another unintended purpose.''

Notes: Didn't record accuracy of information entered - just completeness. A good lit review on the problems with performance measures.

Alternatives to Chart Abstraction
Alternatives to chart abstraction:
 * Standardized patients: Expensive and logistically intensive. May be useful for selective cases when precision is needed. (Rethans et al. 1994; Norman et al. 1985; McLeod, Tamblyn, Gayton, et al. "Use of standardized patients..." JAMA 1997)
 * Written case scenarios: Inexpensive. Control for case-mix. Requires validation - what docs know doesn't necessarily translate to clinical practice. (Peabody et al. 2000).
 * Direct observation: Expensive. Prone to bias (physicians are aware of being observed). Tough Kappa.
 * The EMR: Can eliminate handwriting problems but introduces other types of errors. Only as good as the information recorded.
 * Questionnaire: (Katz et al. 1996 "Can comorbidity be measured by questionnaire rather than medical record review?" Med Care)
 * Peer assessment (see Goldman 1992)
 * Self report (Burns et al. 1992)

Goldman (1992) "The reliability of peer assessment of quality of care" JAMA. 267: 958-960

Burns, Moskowitz, Ash, et al. (1992) "Self-report versus medical record functional status" Med Care. 30(suppl): MS85-95

Vu, Marcy, Colliver, et al. (1992) "Standardized (simulated) patients' accuracy in recording clinical performance check-list items" Med Educ. 26: 99-104

Ferrell (1995) "Clinical performance assessment using standardized patients: a primer. Special series: core concepts in family medicine education" Fam Med. 27: 14-19

Schwartz, Colliver (1997) "Using standardized patients for assessing clinical performance: An overview" Mt Sinai J Med. 63: 241-249

Beullens, Rethans, Goedhuys, Buntinx (1997) "The use of standardized patients in research in general practice" Fam Pract. 14: 58-62

Peabody et al. (2000) "Measure for measure: a prospective study comparing quality evaluation using vignettes, standardized patients, and chart abstraction" JAMA. 283: 1715-1722.