Saturday, October 25, 2008

Applying IDQ Principles of Research To The Bible

By applying principles of Information and Data Quality (IDQ) in research to the Bible, it can be shown that a high level of confidence in the accuracy and reliability of the information in the Bible is irrational; therefore, arguments or claims that use the Bible as a premise are inherently weak.

Cross-check, Cross-check, Cross-check!
Accuracy and verifiability are part of the foundation of IDQ.

Researchers of Information and Data Quality (IDQ) have created classifications for Data Collectors, Data Custodians and Data Consumers. Those that collect the data provide it to those that store and maintain it, and to those that use it. The values associated with IDQ dimensions differ depending on which of these categories applies(16). For example, the data custodian considers accuracy the number one value, while the consumer (depending on the context) may not consider accuracy the most important dimension. In all cases the most important criterion for the user is whether or not the information is useful.

The fact that the consumer does not necessarily regard accuracy as the highest value creates a market for less accurate information which enterprising data producers are willing to satisfy. One example is the "tabloid" and "gossip magazine" industry. However, the desire for useful though inaccurate information extends across categories into business, marketing, politics and religion. Unfortunately, to ensure accurate data when needed, some extra work is necessary in the form of cross-checking.

Who is the author?
Cross-checking verifies a piece of information by testing whether it makes sense from another perspective. One way to do that is to identify the author. When the author can be identified, their credentials can be reviewed. It can be assessed whether or not the author was an expert, what their peers thought of them and what environment they lived in. These properties can be used to cross-check whether the information has external consistency and makes sense from other perspectives. They allow the use of inference to assess the credibility, plausibility, believability and, most importantly, the accuracy of the information. There is no precise definition of accuracy, and in fact many of the dimensions of IDQ are self-referential, but what accuracy is NOT is apparent, and using that as a criterion, a working definition can be derived.

Accuracy implies that the datum represents a real world state.
It implies that when the data are reviewed and compared to the real world event or object, they describe it sufficiently for more than one person to have as close to the same understanding of it as possible. An accurate representation of a real world event will not be ambiguous, will not lack precision and will not be incomplete, because these defects lead to inferences about real world states that do not exist or never existed, or that represent an incorrect element in the real world(3).

Accurate and verifiable data are crucial to having enough understanding about the subject to be able to make reliable decisions, inferences and predictions in order to increase the likelihood of successful outcomes. Verifiability increases the credibility of information.

Your spouse, parents and reputable organizations endorse accurate reporting.
Almost everyone who has an interest in making some kind of investment, whether it's monetary from a giant corporation or emotional from a trusting spouse, desires, requires and demands IDQ. Human understanding and knowledge depend on it. Technology is successful because it builds on the accurate reporting and successful reproduction of work that came before it. Relationships are successful because Information Quality (also known as truth) fosters trust. Since Information Quality is so fundamental, it is easy to find reputable organizations that endorse it, and not just your mother, father, spouse or friend.

Reputable organizations such as Cornell University(17), East Tennessee State University(19), George Mason University(20), McGraw-Hill(21) and the U.S. Government(18) have websites devoted to promoting criteria for assessing the quality of information from sources. They place a high value on it and stress its importance. Two other websites related to education are "The Virtual Chase"(22), which is devoted to "teaching legal professionals how to do research", and Robert Harris's VirtualSalt(15), which is heavily referenced throughout the Internet. VirtualSalt has a checklist called "CARS", named for the first letters of its major criteria: Credibility, Accuracy, Reasonableness and Support. The CARS Checklist encapsulates the research criteria that are endorsed by reputable organizations in an easy to remember mnemonic and can be found on VirtualSalt(15).

Criteria for Data and Information Quality in research
Listed below are the components of the CARS checklist. The initials of some of the other organizations listed above are used to show where their criteria fit into it. Their initials are beside the data quality dimension they endorse: vs is VirtualSalt, c is Cornell and vc is The Virtual Chase.

* Credibility (Credentials)
vs Author, c Author, vc Authority, c Publisher, c Title of Journal
Two relevant indicators of a lack of credibility are anonymity and lack of quality control.

Critical Questions to ask are:
- Why should I trust this source?
- What is it that makes this source believable?
- How does this author know this information?
- Why is this source believable over any other?
- What are the author's credentials?
- What type of quality control did it undergo?
- Was it peer reviewed?

* Accuracy
vc Accuracy, vs Timeliness, vc Timeliness, vs Comprehensiveness, c Coverage, vc Scope of Coverage, vs Audience and Purpose, c Intended Audience, c Edition or Revision, c Date of Publication
Three relevant indicators of a lack of accuracy are no date for the document, vague or sweeping generalizations and bias toward one point of view.

Critical Questions to ask are:
- Is it accurate? Is it correct?
- Is it up to date? Is it relevant?
- Is it Comprehensive? Does it leave anything out?
- What was the intended audience and purpose?

* Reasonableness
vs Fairness, vs Objectivity, vc Objectivity, c Objective Reasoning, vs Moderateness, vs Consistency, vs World View, c Writing Style
Some relevant indicators of a lack of reasonableness are intemperate tone or language, incredible claims, sweeping statements of excessive significance and inconsistency (listed on VirtualSalt as "conflict of interest").

Critical Questions to ask are:
- Does it offer a balanced, reasoned argument that is not selective or slanted?
- Is it biased?
- Is a reality check in order? Are the claims hard to believe? Are they likely, possible or probable?
- Does this conflict with what I know from my experience?
- Does it contradict itself?

* Support
vs [source documentation or bibliography], vs corroboration, vs External Consistency, c Evaluative Reviews
Some relevant indicators of a lack of support are numbers and statistics without a source, absence of source documentation and a lack of other corroborative sources.

Critical Questions to ask are:
- Where did this information come from? What sources did the author use?
- What support is given?
- Can this be cross-checked with at least two other independent sources?
- Is the information in the other independent sources consistent with this information?
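
The checklist above can be treated as a simple record that is filled in for each source. What follows is only an illustrative sketch, not part of the CARS checklist itself: the `SourceReview` class, its field names and the choice of one warning indicator per criterion are assumptions made for this example. The threshold of two comes from the Support question above.

```python
from dataclasses import dataclass

# Taken from the Support question: "at least two other independent sources".
MIN_INDEPENDENT_SOURCES = 2


@dataclass
class SourceReview:
    """Hypothetical record of a reviewer's findings for one source."""
    author_identified: bool          # Credibility: anonymity is a warning sign
    dated: bool                      # Accuracy: no date is a warning sign
    self_consistent: bool            # Reasonableness: inconsistency is a warning sign
    independent_corroborations: int  # Support: count of independent sources that agree

    def failed_criteria(self):
        """Return the CARS criteria for which a warning indicator is present."""
        failed = []
        if not self.author_identified:
            failed.append("Credibility")
        if not self.dated:
            failed.append("Accuracy")
        if not self.self_consistent:
            failed.append("Reasonableness")
        if self.independent_corroborations < MIN_INDEPENDENT_SOURCES:
            failed.append("Support")
        return failed


# Example: an anonymous, dated, self-consistent source with one corroboration.
review = SourceReview(author_identified=False, dated=True,
                      self_consistent=True, independent_corroborations=1)
print(review.failed_criteria())  # ['Credibility', 'Support']
```

A real review would weigh many more indicators per criterion. The point of the sketch is only that each criterion can be checked separately and that a source can fail some and pass others.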


What are some real world examples of poor Data and Information Quality research?
Conclusions about history are necessarily defeasible. One of the problems is that methodology and techniques improve a little every century. Conclusions made about a certain topic are revised as new information turns up. New information is compared to the old information for coherency and consistency. Some of these problems stem from poor data creation by the originator: the data are not accurate or complete. Users still struggle with these problems today. "A Website Dedicated to Information/Data Quality Disasters from Around the World" has been set up by the International Association for Information and Data Quality (IAIDQ), and it's called IQ Trainwrecks(14). "Poor data quality can have a severe impact on the overall effectiveness of an organization"(3) and "Poor data quality can have substantial social and economic impacts"(11) that span the spectrum from news to marketing to textbooks to health care. Fortunately we can examine the methods of the ancient historians and scientists to see what led to poor results, so that we can avoid those methods, improve what can be improved and derive new ones to replace the old.

Applying Data and Information Quality for research to the Bible.
As accurate as they tried to be, the authors of scripture still suffered from the same sorts of problems common to ancient historians and scientists. They were biased and inaccurate, had no way to verify information, depended on second or third hand information from relatively uneducated people, were influenced by political affiliations and commissions from aristocrats and state leaders, and had poor tools to work with.

The authors of the Bible did no better a job of documenting the world than their peers among the historians and scientists. In fact, of the three categories, scientists fared somewhat better because of the quality of their documentation. The Library in Alexandria was destroyed by fire over time, so much of ancient scholarship and science was lost, but some of the works that do remain leave little doubt about their authorship or about how to reproduce their experiments.

It used to be believed that every author of every book in the Bible could be identified, but over time it has come to be recognized that tradition is a poor way to record who authored what. External verification of the data revealed how unlikely it was that the person traditionally believed to be the author actually was the author, or even existed.

According to several sources "The Bible comprises 24 books for Jews, 66 for Protestants, 73 for Catholics, and 78 for most Orthodox Christians."(24) From others: "The Protestant Bible contains 66 books (39 OT, 27 NT); the Catholic Bible contains 73 books (46 OT, 27 NT); the Eastern Orthodox Bible contains 78 books (51 OT, 27 NT). The Hebrew Bible (the name of the OT by Jews) contains only 24 books."(23)

Most of the authors of the original information about the Abrahamic God are unknown
There are different books in the Bible depending on whether you use the Hebrew, the Protestant, the Catholic or the Orthodox canon (for example). If we use the greatest number of books in any Bible as our total, then the author can be identified for only about 21% of them. The authors of the other 79% are unknown(24). That means 79% of the original information that exists about the Abrahamic God comes from unknown sources. One of the indicators for lack of credibility in a work is anonymity(15). A small percentage of scriptures are considered worthy of inclusion by some denominations but not by others. What makes one worthy to one group and not worthy to another? Lack of credibility is one criterion that comes to mind.
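
As a rough check on what those percentages mean in book counts, the arithmetic can be sketched as below. This assumes the 21% figure applies to the 78-book total quoted above, and the rounding to whole books is my own assumption; the sources give only the percentages.

```python
TOTAL_BOOKS = 78         # largest count quoted above (Eastern Orthodox)
IDENTIFIED_SHARE = 0.21  # share of books with an identifiable author, as stated above

# Rounding to whole books is an assumption made for illustration.
identified = round(TOTAL_BOOKS * IDENTIFIED_SHARE)
unknown = TOTAL_BOOKS - identified
unknown_percent = round(100 * unknown / TOTAL_BOOKS)

print(identified, unknown, unknown_percent)  # 16 62 79
```

Under these assumptions, roughly 16 of the 78 books have an identifiable author and roughly 62 do not, which is consistent with the 79% figure.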

The Bible is an amalgam of scriptures that span centuries. Some of the scriptures seem to be derived from other scriptures, most of which were also included in the Bible. Trying to apply the criterion of cross-checking against varied sources to the Bible is difficult because the scriptures were derived from each other, a large portion of the authors are unknown and the quality of production was poor. The criteria used to put them together are not clear, but a presumption of at least a need for coherency and consistency is warranted.

The word "trust" is used liberally to describe IDQ criteria. While the Bible is generally considered to be trustworthy, is it really? What is it about something that makes it "trustworthy"? Accuracy? Coherency and consistency with what we know from our experience?

What follows is a summary of research criteria that the Bible does not meet, with some generic examples.
For the sake of brevity I did not include many solid examples, but I welcome audience participation in documenting them in the comments.

* Authorship - Traditional attributions of authorship have been overturned by later scholarship
* Not up to date - Leviticus and Deuteronomy in the OT, Paul's bias against women in the NT
* Inaccurate, incorrect - The rivers of Eden in the OT, inconsistencies between the gospels
* Irrelevant - Leviticus and Deuteronomy in the OT; in the NT, ambiguous and apparently contradictory sayings such as "Whoever is not against us is for us" (Mark 9:40) vs "He who is not with me is against me" (Matthew 12:30a)
* Bias - Old Testament treatment of worshipers of other gods, NT treatment of Jewish leadership and scholars
* Unlikely - Most of the OT; in the NT, Jesus sternly rebuked his disciples for sleeping in the garden of Gethsemane, so who witnessed what happened while they slept?
* Conflicts with knowledge obtained from our experiences - Magicians do water-to-wine tricks
* Contradicts itself - Who discovered the empty tomb?
* Cross-checking with external sources is extremely difficult and, to a large degree, does not support the text. There is no verifiable eyewitness account of the existence of Jesus; however, that does not mean he did not exist.

Robert Harris's VirtualSalt has a checklist with a mnemonic for how to deal with information.

Living with Information: The CAFÉ Advice from VirtualSalt(15)
Challenge
Challenge the information with critical questions and expect accountability.

Adapt
Adapt your requirements for information quality to match the importance of the information and what is being claimed. Extraordinary claims warrant extraordinary evidence.

File
File new information in your mind rather than immediately reaching a conclusion. Turn your conclusion into a question. Gather more information until there is little room for doubt.

Evaluate
Evaluate and re-evaluate regularly. New information or changing circumstances will affect the accuracy and the evaluation of previous information.

I will sum it up in a word.
Cross-check, Cross-check, Cross-check.

REFERENCES AND FURTHER READING
1. Wikipedia, "Data Management"
2. Information Quality at MIT
3. Anchoring Data Quality Dimensions in Ontological Foundations
4. DMReview, Data Management Review
5. IQ-1 Certificate Program
6. Wikipedia, 2003 Invasion of Iraq
7. How Accurate Is The Bible?
8. Datalever.com
9. Wikipedia, Tanakh
10. Null Hypothesis
11. Beyond Accuracy: What Data Quality Means To Consumers
12. IQ Benchmarks
13. Reasonable Doubt About Adaption Theory
14. IQ Trainwrecks
15. Robert Harris' VirtualSalt
16. Data Quality Assessment
17. Cornell University Library
18. Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility and Integrity of Information Disseminated by Federal Agencies
19. East Tennessee State University Researchers Toolbox
20. George Mason University
21. McGraw-Hill Higher Education, Evaluating Internet Resources
22. The Virtual Chase, Criteria for Quality in Information--Checklist
23. Know Your Bible
24. Wikipedia, Authors of The Bible
25. Ancient Historians, Part 1, Part 2

Monday, October 13, 2008

Applying Data and Information Quality Concepts to the Bible

In the coming weeks, I intend to show that if we posit a null hypothesis about the Bible and evaluate the quality of the data and information in it, the hypothesis that humans alone were sufficient to create the Bible is supported very well by the data, which effectively refutes the hypothesis posited in 2 Timothy 3:16.

Brief Introduction to Data and Information Quality
I recommend reading the following link on Wikipedia, Data Quality. It's a good overview of how Data and Information Quality got its start as an aspect of computer science.

Data and Information have an intrinsic value
Historically, accurate information has always been important, especially to Kings and Generals, but the perceived need for principles to manage data quality arose when businesses realized that databases which accurately reflected the state of the world, namely customer information and inventories, saved money. Over the years, as computing became less expensive, the technology was adopted by individual consumers, and the amount of information available online grew from diverse sources such as companies, governments and individuals. It became apparent that some way to evaluate the quality of information was needed(1). It should be obvious that some data is accurate and reliable and some data is not. To ensure data is accurate and reliable, it needs to be profiled, cleaned, parsed, matched, moved, analyzed, reconciled and reported on(8). In the past two decades, metrics for determining the relative quality of information from a given source have been derived. Measuring the quality of an information source is an inexact science, but using principles of probability, its relative quality can be measured(12).

Data Quality Dimensions
Data Quality is a term used to describe characteristics or dimensions attributed to data or information. Much of the research on Data Quality is carried out at the MIT Total Data Quality Management Program, where Richard Y. Wang has led the effort since the 1990s. There are several approaches to data quality research that depend on how the data will be used, and each has its own values for criteria or "dimensions". The approaches can be categorized as "Intuitive" (based on what the researcher believes is important), "Theoretical" (how data becomes deficient during production) and "Empirical" (data gathered from consumers to see what is important to them). Most data studies fall into the "Intuitive" category. However, all of the approaches contain a core set of "dimensions", and the one dimension that has a consistently high value in all lists is "Accuracy". Another highly valued core dimension from the Intuitive approach is "Reliability". Some highly valued core dimensions from the Theoretical approach are "Accuracy, Relevance, Correctness, Currency, Completeness", and from the Empirical approach "Accuracy, Relevancy, Believability, Value-added, Interpretability" and "Ease-of-understanding"(11). The different dimensions will have higher or lower values to different organizations depending on the context in which they are used. I will elaborate more on the data production and data consumer dimensions as I explore how they apply to the Bible in later articles.

Do you think data and information quality is important?
Would you be satisfied with a metaphorical record in the following situations or would you prefer a record that accurately represents real world events?
- Reading or watching the news
- Textbooks that you are required to purchase for your University courses
- Studying the only record of the Abrahamic God that exists.
- Producing or reading a business report
- Grocery shopping
- Reviewing your bank statement
- Reviewing the charges for your utilities, such as electric, phone, water, trash, television, etc.
- Paying your taxes
- Purchasing a car
- Taking inventory
- Purchasing insurance
- Reviewing your shipping invoice: what you received versus what you ordered and how much you paid
- Your check at the restaurant
and so on.

Why is data and information quality important?
So what happens when data and information quality is poor? "Poor data quality can have a severe impact on the overall effectiveness of an organization"(3) and "Poor data quality can have substantial social and economic impacts"(11). Consequently, a high value is placed on information quality, as evidenced by how much people are willing to spend to obtain it. There is an industry built on data quality concepts(4), and professional certifications are available(5). The reliability of such things as inventory, medical records, medical research, military and civilian logistics, market research, consumer safety, education, consultant reports, work requests, billing reports, status reports, technical manuals and intelligence reports depends on data from verifiable sources that is produced with the goal of accurately representing elements of the real world. One recent example of what happens when information and data are of poor quality is the decision by the United States to invade Iraq in 2003 on the grounds that Iraq possessed "Weapons of Mass Destruction"(6), a claim which turned out to be false. Because of the demonstrable importance of assessing data quality, the industry of Data Quality Management has developed(4).

Who uses Data Quality and Information Quality Dimensions?
Short list of organizations promoting Information Quality Principles
* US Government,
- Data Quality Act,
- Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of Information Disseminated by Federal Agencies
* Data Quality Management Industry, DMReview, an industry magazine.
* Education Professionals,
- The Quality Information Checklist,
- Robert Harris's "VirtualSalt",
- East Tennessee State University
* Legal Industry, Evaluating the Quality of Information on the Internet
* Medical Industry,
- Journal of Medical Internet Research,
- Medical Billing
* US Army Logistics, "Data Quality Problems in Army Logistics", By Lionel A. Galway, Christopher H. Hanks, United States Army, Rand Corporation, Arroyo Center
and many more.

The Book As A Database: The justification to apply data and information quality metrics to evaluate the Bible
A book can be a data source. It can be treated like a database. It can be profiled, cleaned, parsed, matched, moved, analyzed, reconciled and reported on. Examples are an atlas, a history book, textbooks generally, and the Bible. In fact, over the years, to facilitate ease of study, the Bible has been formatted and cross-referenced in a way very similar to a database.

If we have many individual information sources, we can collect them, profile them, sort them, categorize them, spell check them, look for exceptions, reconcile them, clean them, parse them, match them, move them, and create a report about them. Then they can be put together into an anthology. Once they are in an anthology, they can be further organized into volumes, chapters, pages, paragraphs, sentences, and if necessary even further into parts of sentences (to separate two distinct ideas in one sentence, for example) and verses. This is what happened to the Bible.
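To make the database analogy concrete, here is a minimal sketch of how a chapter-and-verse citation can act as a lookup key, much like a primary key in a table. The names VerseRef and parse_ref and the citation pattern are illustrative assumptions, not part of any existing Bible software, and the pattern only handles simple "Book chapter:verse" citations.

```python
import re
from typing import NamedTuple


class VerseRef(NamedTuple):
    """A structured key for one verse (hypothetical record layout)."""
    book: str
    chapter: int
    verse: int


# Matches simple citations such as "2 Timothy 3:16" or "John 3:16".
# Assumption: an optional leading book number 1-3, then a name, then chapter:verse.
_REF = re.compile(
    r"^\s*(?P<book>(?:[1-3]\s+)?[A-Za-z][A-Za-z ]*?)\s+"
    r"(?P<chapter>\d+):(?P<verse>\d+)\s*$"
)


def parse_ref(text: str) -> VerseRef:
    """Parse a 'Book chapter:verse' citation into a structured key."""
    m = _REF.match(text)
    if m is None:
        raise ValueError(f"not a chapter:verse citation: {text!r}")
    book = " ".join(m.group("book").split())  # normalize internal spacing
    return VerseRef(book, int(m.group("chapter")), int(m.group("verse")))


# A tiny "table" keyed by citation, using the verse quoted in this article.
index = {
    parse_ref("2 Timothy 3:16"): (
        "All Scripture is inspired by God and profitable for teaching, "
        "for reproof, for correction, for training in righteousness"
    ),
}

if __name__ == "__main__":
    print(index[parse_ref("2 Timothy 3:16")])
```

Once every passage has a key like this, the profiling, matching and reconciling steps described above become ordinary operations on records, for example sorting by key or comparing two passages that describe the same event.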

Over centuries, early Jewish religious leaders initiated the transcription of oral tradition, then later accumulated individual pieces of scripture, evaluated them and combined them into the Tanakh. Generation after generation went to great effort to maintain the integrity and quality of the Bible by attempting to ensure, at least in theory, that it remained unchanged during copying. When Christians had generated their own scriptures and translated the Tanakh from Hebrew, a similar process happened. In the 13th century, Stephen Langton of Magna Carta fame created the chapter divisions that were later adopted by Jews during the harsh persecution of the Spanish Inquisition(9) and that, together with verse numbering, are widely in use today in modern Bibles. Obviously the Bible was considered and treated as a source of information about real events in the world, and its integrity and quality were given a very high priority.

So how accurate should we expect the Word of God to be?
In the Bible, 2 Timothy 3:16 says that "All Scripture is inspired by God and profitable for teaching, for reproof, for correction, for training in righteousness". Jesus describes himself as "the way" and goes on to describe himself as a kind of "Model" to show what God is like. Later, in 325 CE, Church Fathers formally adopted a creed which described him as being of "one substance" with God. Jesus confirmed the Old Testament was the word of God by referring to it as such, and he referred back to it frequently. If Jesus was God incarnate, he verified that Scripture was his word. He mapped Scripture to God and to Himself and verified that Scripture mapped to real world events. Therefore we should expect some measurable difference between Scripture and a book not inspired or endorsed by God.

If we use a weighted ranking, we can get a rough idea of how accurate we can expect the Word of God to be. God is perfect, and man is not. So we can expect that man will be less accurate than God, but if God is helping man, then man should be more accurate than if he were working alone.

1. Man alone is less accurate
2. Man is more accurate with God's help than without it
3. God is more accurate than man

That should serve as a rough guideline and the first metric in an attempt to quantify the accuracy of the Bible(7).
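The ranking above is ordinal: it says which source should be more accurate, not by how much. A minimal sketch of that ordering, and of a check that a set of accuracy scores respects it, might look like the following. The names Source and consistent_with_ranking are hypothetical, and the scores are made-up placeholders chosen only to exercise the check, not measurements of anything.

```python
from enum import IntEnum
from typing import Mapping


class Source(IntEnum):
    """The three tiers of the ranking, lowest expected accuracy first."""
    MAN_ALONE = 1
    MAN_WITH_GODS_HELP = 2
    GOD = 3


def consistent_with_ranking(scores: Mapping[Source, float]) -> bool:
    """Return True if accuracy strictly increases from one tier to the next."""
    ordered = [scores[source] for source in sorted(scores)]
    return all(lower < higher for lower, higher in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    # Placeholder scores (assumptions for illustration only).
    placeholder = {
        Source.MAN_ALONE: 0.6,
        Source.MAN_WITH_GODS_HELP: 0.8,
        Source.GOD: 1.0,
    }
    print(consistent_with_ranking(placeholder))
```

The useful consequence is the middle tier: if a text said to be written with God's help scores no better than comparable human-only works, the scores are not consistent with the ranking.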

The following human endeavors apparently were not divinely inspired, so when using the weighted ranking to evaluate how the Bible compares to them, it should be reasonable to expect the following.
- It should be at least as brilliant as the ancient theories of knowledge, reason, truth, nature, mathematics and logic, and the use of mathematics to describe nature, which continue to inform the practice of science to the present day, resulting in theories such as germ theory, relativity, genetics, atomic theory and quantum theory, all of which have been applied to reduce the amount of suffering in the world.
- It should at least be as accurate as a history book where it talks about history
- It should at least be as accurate as a science book where it talks about the world
- It should at least be as accurate as a manual where it gives instructions
- It should at least be as accurate as a scientific theory where it gives predictions
If not, then there is no reason to think that its inspiration is anything different than any other type of inspiration.

A null hypothesis is a default hypothesis that is evaluated for its ability to explain a given set of data. If the hypothesis is not sufficient to explain the data, then there is reason to pursue an alternative hypothesis. While it is not without its criticisms, particularly compared to Bayesian inference(10), it is a useful heuristic for forming an initial opinion about the probability or plausibility of an idea, or for getting a "feeling" about something.
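As a rough illustration of how a null hypothesis is tested in the conventional statistical sense, here is a minimal sketch of an exact one-sided binomial test. It assumes a simplified setup that is not from this article: a text makes a number of independently checkable claims, and under the null hypothesis (humans alone) each claim is correct with the same probability as in comparable human-only works. The function name and all the numbers are hypothetical placeholders, not measurements.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more successes."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Placeholder inputs (assumptions for illustration only).
    n_claims = 20         # checkable claims examined
    n_correct = 15        # claims that matched independent records
    human_baseline = 0.7  # accuracy assumed for human-only works

    # How surprising is this many correct claims if humans alone wrote the text?
    p_value = binomial_upper_tail(n_correct, n_claims, human_baseline)

    # A large p-value means the humans-alone hypothesis explains the data
    # adequately; only a small one gives reason to pursue an alternative.
    print(f"p-value under the null hypothesis: {p_value:.3f}")
```

The design choice here is the one-sided upper tail: the alternative hypothesis predicts accuracy above the human baseline, so only unexpectedly high accuracy counts as evidence against the null.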

In the coming weeks, I intend to show that if we posit a null hypothesis about the Bible and evaluate the quality of the data and information in it, the hypothesis that humans alone were sufficient to create the Bible is supported very well by the data, which effectively refutes the hypothesis posited in 2 Timothy 3:16.

REFERENCES
1. Wikipedia, "Data Management"
2. Information Quality at MIT
3. Anchoring Data Quality Dimensions in Ontological Foundations
4. DMReview, Data Management Review
5. IQ-1 Certificate Program
6. Wikipedia, 2003 Invasion of Iraq
7. How Accurate Is The Bible?
8. Datalever.com
9. Wikipedia, Tanakh
10. Wikipedia, Null Hypothesis
11. Beyond Accuracy: What Data Quality Means To Consumers
12. IQ Benchmarks
