Ruddle, R. (2023) Using Well-Known Techniques to Visualize Characteristics of Data Quality. In: Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. 14th International Conference on Information Visualization Theory and Applications, 19-21 Feb 2023, Lisbon, Portugal. SCITEPRESS - Science and Technology Publications, pp. 89-100.
Abstract
Previous work has identified more than 100 distinct characteristics of data quality, most of which are aspects of completeness, accuracy and consistency. Other work has developed new techniques for visualizing data quality, but there is a lack of research into how users visualize data quality issues with existing, well-known techniques. We investigated how 166 participants identified and illustrated data quality issues that occurred in a 54-file, longitudinal collection of open data. The issues that participants identified spanned 27 different characteristics, nine of which do not appear in existing data quality taxonomies. Participants adopted nine visualization and tabular methods to illustrate the issues, using the methods in five ways (quantify; alert; examples; serendipitous discovery; explain). The variety of serendipitous discoveries was noteworthy, as was how rarely participants used visualization to illustrate completeness and consistency, compared with accuracy. We conclude by presenting a 106-item data quality taxonomy that combines seven previous works with our findings.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Ruddle, R. |
Copyright, Publisher and Additional Information: | © 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0) |
Keywords: | Visualization; Data Quality; Data Science; Empirical Study |
Dates: |
|
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Computation Science & Engineering |
Funding Information: | Funder: Alan Turing Institute; Grant number: Not Known |
Depositing User: | Symplectic Publications |
Date Deposited: | 02 May 2024 14:26 |
Last Modified: | 02 May 2024 14:28 |
Status: | Published |
Publisher: | SCITEPRESS - Science and Technology Publications |
Identification Number: | 10.5220/0011664300003417 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:212184 |