vega-lite-linter uses Clingo as its Answer Set Programming (ASP) solver, so you need to install Clingo first.
For Linux users:

```shell
apt-get install -y gringo
```

For macOS users:

```shell
brew install clingo
```

Or using Conda:

```shell
conda install -c potassco clingo
```
More information about downloading Clingo can be found here.
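Before installing vega-lite-linter, you may want to confirm that the solver is reachable. The snippet below is a small convenience sketch and is not part of vega-lite-linter. It assumes the solver is exposed as a command named `clingo` on your `PATH`, and the helper name `clingo_version` is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def clingo_version(executable: str = "clingo") -> Optional[str]:
    """Return the first line of `<executable> --version`, or None if it is not on PATH.

    Hypothetical helper: it assumes the solver is installed as a command named
    `clingo`. It is not part of the vega-lite-linter API.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = clingo_version()
    if version is None:
        print("clingo was not found on PATH; install it before using vega-lite-linter.")
    else:
        print("Found:", version)
```

If the function returns `None`, the shell cannot find the executable, so revisit the installation step for your platform.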
Vega-lite-linter is built on Python 3 and can be installed with pip:

```shell
pip install vega-lite-linter
```
After installing Clingo and vega-lite-linter, you can use the sample code below to get started.
More detailed examples can be found in Examples.
```python
from vega_lite_linter import Lint

vega_json = {
    "data": {
        "url": "data/cars.json"
    },
    "mark": "bar",
    "encoding": {
        "x": {
            "field": "Horsepower",
            "type": "quantitative"
        },
        "y": {
            "field": "Miles_per_Gallon",
            "type": "quantitative"
        },
        "size": {
            "field": "Cylinders",
            "type": "ordinal"
        }
    }
}

# initialize the linter with the Vega-Lite specification
lint = Lint(vega_json)

# detect the rules that the input Vega-Lite JSON violates
violate_rules = lint.lint()
print(violate_rules)

# get the fixing recommendations from vega-lite-linter
fix = lint.fix()
print(fix)
```
Vega-lite-linter provides simple APIs that help visualization developers detect and fix issues in their visualizations.
First, initialize a `Lint` instance with the target visualization specification:

```python
lint = Lint(vegalite_json)
```

After initialization, the two methods listed below can be called on the instance.
`lint()`: Detecting Issues

`lint()` detects any issues in the given visualization specification. Each detected issue is presented as a Rule object containing:

`fix()`: Fixing Issues

`fix()` runs the algorithm to help revise the visualization specification into a correct one.

The result of `fix()` contains:

An Action object contains:
The related Vega-Lite properties are listed as follows.
Vega-lite-linter helps detect some data-related errors by deriving data properties from the raw data, such as the type of a data field and the min/max values of a numerical field. Currently, vega-lite-linter supports this calculation for inline data specified with the `values` property, or for the built-in datasets of Vega and Vega-Lite.
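To illustrate what deriving data properties from inline `values` can look like, here is a minimal Python sketch that summarizes one field: its cardinality, a coarse type, and the min/max of numeric fields. The helper `field_properties` and the type labels "number", "datetime" and "string" are hypothetical; vega-lite-linter's own implementation may compute these properties differently.

```python
from datetime import datetime
from typing import Any, Dict, List


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _is_iso_date(value: Any) -> bool:
    # Treat ISO-8601 strings such as "2020-01-31" as dates
    if not isinstance(value, str):
        return False
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


def field_properties(values: List[Dict[str, Any]], field: str) -> Dict[str, Any]:
    """Summarize one field of inline `values` data (hypothetical helper)."""
    column = [row[field] for row in values if row.get(field) is not None]
    props: Dict[str, Any] = {"cardinality": len(set(column))}
    if column and all(_is_number(v) for v in column):
        props.update(type="number", min=min(column), max=max(column))
    elif column and all(_is_iso_date(v) for v in column):
        props["type"] = "datetime"
    else:
        props["type"] = "string"
    return props


# Example inline data, as it would appear under "data": {"values": [...]}
values = [
    {"a": "A", "b": 28, "d": "2020-01-01"},
    {"a": "B", "b": -5, "d": "2020-01-02"},
    {"a": "C", "b": 43, "d": "2020-01-03"},
]
print(field_properties(values, "b"))  # {'cardinality': 3, 'type': 'number', 'min': -5, 'max': 43}
```

Properties like these are what data-dependent rules need: for example, a negative minimum on field "b" is only visible once the data itself has been inspected.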
Property | Value |
---|---|
mark | Required. The mark type of the visualization. Can be one of the following values: area, bar, line, point, and tick. |
Property | Value |
---|---|
channel | Required. The encoding channel type, which is specified as the key of each encoding. Can be one of the following values: x, y, color, size. |
field | The data field encoded by the channel. |
type | The type of measurement. Can be one of the following values: quantitative, temporal, ordinal, or nominal. |
bin | Binning discretizes numeric values into a set of bins. Can be one of the following values: true, false, or { "maxbins": maximum number of bins (e.g., 10) }. |
aggregate | The aggregation function that computes summary statistics of the data field. Can be one of the following values: count, mean, median, min, max, stdev, sum, etc. |
stack | The type of stacking offset if the field should be stacked. Can be one of the following values: true, zero, normalize, center, or false. |
scale | Functions that transform a domain of data values into visual values. |
The `scale` property includes:
Property | Value |
---|---|
type | The type of scale transformation. Currently, the algorithm detects only errors related to the log type. |
zero | If true, ensure that a zero baseline value is included in the scale domain. |
More details about Vega-Lite properties can be found here.
The rules in vega-lite-linter are adapted and refined from Draco. They are grouped into four categories.
Rule | Meaning |
---|---|
enc_type_valid_1 | Verify the consistency of data field and type 'quantitative'. |
enc_type_valid_2 | Verify the consistency of data field and type 'temporal'. |
bin_q_o | Only use bin on quantitative or ordinal data. |
zero_q | Only use log scale with quantitative data. |
log_discrete | Only use log scale with non-discrete data. |
log_zero | A log scale cannot have a zero baseline in the scale domain. |
log_non_positive | Use log scale on data that are all positive. |
bin_and_aggregate | Using both bin and aggregate on the same encoding is illegal. |
aggregate_o_valid | Ordinal data only supports min, max, and median aggregation. |
aggregate_t_valid | Temporal data only supports min and max aggregation. |
aggregate_nominal | Nominal data cannot be aggregated. |
count_q_without_field_1 | Either use count aggregation or declare a data field in an encoding, but not both. |
count_q_without_field_2 | The encoding with count aggregation has to be 'quantitative' type. |
size_nominal | Channel size implies order in the data, so it is not suitable for nominal data. |
size_negative | Channel size is not suitable for data with negative values. |
encoding_no_field_and_not_count | Declare the data field or use count aggregation in each encoding. |
color_with_cardinality_gt_twenty | Use at most 20 categorical colors in the visualization. |
stack_without_x_y | Use stack on x or y channels. |
stack_discrete | Use stack on continuous data. |
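vega-lite-linter expresses these rules as Answer Set Programming constraints that Clingo solves. The plain-Python sketch below is not that implementation; it is only a readable approximation of what a few of the encoding-level rules above check. The function `check_encoding` and its `field_min` parameter (the minimum of the encoded field, if the data is available) are hypothetical.

```python
from typing import Any, Dict, List, Optional

ORDINAL_AGGREGATES = {"min", "max", "median"}


def check_encoding(channel: str, enc: Dict[str, Any], field_min: Optional[float] = None) -> List[str]:
    """Return the names of a few encoding-level rules that `enc` violates (illustrative subset)."""
    violations = []
    enc_type = enc.get("type")
    agg = enc.get("aggregate")
    scale = enc.get("scale", {})
    is_binned = enc.get("bin") not in (None, False)
    is_log = scale.get("type") == "log"

    if is_binned and agg is not None:
        violations.append("bin_and_aggregate")   # bin and aggregate together
    if enc_type == "nominal" and agg is not None:
        violations.append("aggregate_nominal")   # nominal data cannot be aggregated
    if enc_type == "ordinal" and agg is not None and agg not in ORDINAL_AGGREGATES:
        violations.append("aggregate_o_valid")   # ordinal: only min, max, median
    if is_log and scale.get("zero") is True:
        violations.append("log_zero")            # log scale with a zero baseline
    if is_log and field_min is not None and field_min <= 0:
        violations.append("log_non_positive")    # log scale needs all-positive data
    if channel == "size" and enc_type == "nominal":
        violations.append("size_nominal")        # size implies order
    if channel == "size" and field_min is not None and field_min < 0:
        violations.append("size_negative")       # size cannot show negative values
    return violations


print(check_encoding("y", {"field": "b", "type": "quantitative", "scale": {"type": "log"}}, field_min=-5))
# ['log_non_positive']
print(check_encoding("x", {"field": "a", "type": "nominal", "bin": True, "aggregate": "mean"}))
# ['bin_and_aggregate', 'aggregate_nominal']
```

Because a solver-based linter searches over all constraints at once, it can also propose fixes; a hand-written checker like this one can only report violations.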
Rule | Meaning |
---|---|
repeat_channel | Use each channel only once. |
no_encodings | Use at least one encoding. Otherwise, the visualization doesn't show anything. |
same_field_x_and_y | Use different fields for x axis and y axis. |
count_twice | Use count aggregation only once in the visualization. |
stack_without_summative_agg | Only use summative aggregation (count, sum, distinct, valid, missing) with stack in the encoding. |
stack_without_discrete_color_1 | Only use stack with a color channel encoding discrete data in the visualization. |
stack_without_discrete_color_2 | Only use stack with a color channel encoding discrete data in the visualization. |
stack_without_discrete_color_3 | Only use stack with a color channel encoding discrete data in the visualization. |
stack_with_non_positional_non_agg | When using stack in the visualization, apply aggregation in non-positional continuous channels (color, size). |
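The rules above look at the specification as a whole rather than at a single encoding. The following plain-Python sketch approximates four of them; it is not vega-lite-linter's ASP encoding, and the function `check_spec` is hypothetical. repeat_channel is left out because a Python dict (and a parsed JSON object) cannot hold the same channel key twice.

```python
from typing import Any, Dict, List

SUMMATIVE_AGGREGATES = {"count", "sum", "distinct", "valid", "missing"}


def check_spec(spec: Dict[str, Any]) -> List[str]:
    """Return the names of a few specification-level rules that `spec` violates (illustrative subset)."""
    violations = []
    encoding = spec.get("encoding", {})

    # no_encodings: a chart without encodings shows nothing
    if not encoding:
        violations.append("no_encodings")

    # same_field_x_and_y: x and y should encode different fields
    x_field = encoding.get("x", {}).get("field")
    y_field = encoding.get("y", {}).get("field")
    if x_field is not None and x_field == y_field:
        violations.append("same_field_x_and_y")

    # count_twice: count aggregation should appear at most once
    if sum(1 for enc in encoding.values() if enc.get("aggregate") == "count") > 1:
        violations.append("count_twice")

    # stack_without_summative_agg: a stacked encoding may only use summative aggregation
    for enc in encoding.values():
        stacked = enc.get("stack") not in (None, False)
        agg = enc.get("aggregate")
        if stacked and agg is not None and agg not in SUMMATIVE_AGGREGATES:
            violations.append("stack_without_summative_agg")
            break
    return violations


spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "Horsepower", "type": "quantitative"},
        "y": {"field": "Horsepower", "type": "quantitative"},
        "color": {"aggregate": "count", "type": "quantitative"},
        "size": {"aggregate": "count", "type": "quantitative"},
    },
}
print(check_spec(spec))  # ['same_field_x_and_y', 'count_twice']
```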
Rule | Meaning |
---|---|
point_tick_bar_without_x_or_y | Use x or y channel for mark 'point', 'tick', and 'bar'. |
line_area_without_x_y | Use x and y channels for mark 'line' and 'area'. |
bar_tick_continuous_x_y | For mark 'bar' and 'tick', encode continuous data in at most one of the x and y channels. |
bar_tick_area_line_without_continuous_x_y | Mark 'bar', 'tick', 'line', and 'area' require a continuous variable on x or y. |
bar_area_without_zero_1 | Mark 'bar' and 'area' require the scale of the x-axis to start at zero when the x-axis encodes quantitative data. |
bar_area_without_zero_2 | Mark 'bar' and 'area' require the scale of the y-axis to start at zero when the y-axis encodes quantitative data. |
size_without_point | Prefer using the size channel with mark 'point'. |
stack_without_bar_area | Only use stacking for the mark 'bar' and 'area'. |
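The mark-related rules can be approximated in plain Python as well. This is a hypothetical sketch (`check_mark` is not a vega-lite-linter function), and it makes one labeled assumption: for bar_area_without_zero it only flags an explicit `"zero": false`, because Vega-Lite already includes zero by default for unbinned quantitative x/y scales without a custom domain. It implements only some of the rules in the table, so it reports a subset of what vega-lite-linter may flag for the same specification.

```python
from typing import Any, Dict, List


def check_mark(spec: Dict[str, Any]) -> List[str]:
    """Return the names of a few mark-related rules that `spec` violates (illustrative subset)."""
    mark = spec.get("mark")
    encoding = spec.get("encoding", {})
    has_x, has_y = "x" in encoding, "y" in encoding
    violations = []

    if mark in ("point", "tick", "bar") and not (has_x or has_y):
        violations.append("point_tick_bar_without_x_or_y")
    if mark in ("line", "area") and not (has_x and has_y):
        violations.append("line_area_without_x_y")
    if mark in ("bar", "area"):
        # Assumption: only an explicit "zero": false counts as a violation
        for channel, rule in (("x", "bar_area_without_zero_1"), ("y", "bar_area_without_zero_2")):
            enc = encoding.get(channel, {})
            if enc.get("type") == "quantitative" and enc.get("scale", {}).get("zero") is False:
                violations.append(rule)
    if "size" in encoding and mark != "point":
        violations.append("size_without_point")
    stacked = any(enc.get("stack") not in (None, False) for enc in encoding.values())
    if stacked and mark not in ("bar", "area"):
        violations.append("stack_without_bar_area")
    return violations


# The sample specification from the getting-started code
quick_start_spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "Horsepower", "type": "quantitative"},
        "y": {"field": "Miles_per_Gallon", "type": "quantitative"},
        "size": {"field": "Cylinders", "type": "ordinal"},
    },
}
print(check_mark(quick_start_spec))  # ['size_without_point']
```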
Rule | Meaning |
---|---|
invalid_mark | Use a valid mark type: 'area', 'bar', 'line', 'point', or 'tick'. |
invalid_channel | Use valid channels: x, y, color, or size. |
invalid_type | Use a valid type: quantitative, nominal, ordinal, or temporal. |
invalid_agg | Use a valid aggregation, such as count, mean, median, min, max, stdev, or sum. |
invalid_bin | Use a non-negative number for the number of bins (maxbins). |
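These value checks are simple set-membership tests, sketched below in plain Python. The function `check_values` is hypothetical, and `KNOWN_AGGREGATES` only contains aggregation names that appear in this document, so the sketch may reject an aggregation that vega-lite-linter actually accepts (the table above ends its list with "etc.").

```python
from numbers import Real
from typing import Any, Dict, List

VALID_MARKS = {"area", "bar", "line", "point", "tick"}
VALID_CHANNELS = {"x", "y", "color", "size"}
VALID_TYPES = {"quantitative", "temporal", "ordinal", "nominal"}
# Aggregations named in this document; the linter's own list is longer ("etc.")
KNOWN_AGGREGATES = {"count", "mean", "median", "min", "max", "stdev", "sum",
                    "distinct", "valid", "missing"}


def check_values(spec: Dict[str, Any]) -> List[str]:
    """Return the names of the invalid_* rules that `spec` violates (illustrative sketch)."""
    violations = []
    if spec.get("mark") not in VALID_MARKS:
        violations.append("invalid_mark")
    for channel, enc in spec.get("encoding", {}).items():
        if channel not in VALID_CHANNELS:
            violations.append("invalid_channel")
        if "type" in enc and enc["type"] not in VALID_TYPES:
            violations.append("invalid_type")
        if "aggregate" in enc and enc["aggregate"] not in KNOWN_AGGREGATES:
            violations.append("invalid_agg")
        bin_ = enc.get("bin")
        if isinstance(bin_, dict):
            maxbins = bin_.get("maxbins")
            # only a numeric, negative maxbins is treated as invalid here
            if isinstance(maxbins, Real) and maxbins < 0:
                violations.append("invalid_bin")
    return violations


print(check_values({"mark": "pie", "encoding": {"shape": {"field": "a", "type": "category"}}}))
# ['invalid_mark', 'invalid_channel', 'invalid_type']
```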
Vega-lite-linter was developed by the iDVx Lab together with AntV. Based on this technology, AntV and the iDVx Lab also developed ChartLinter in JavaScript to support visualization charts beyond Vega-Lite.
If you have any questions, please feel free to open an issue or contact idvx.lab [at] gmail.com.
The software is available under the MIT License.