DataViz Makeover 2

Sentiments of Covid-19 Vaccinations

Tan Huiyong https://sg.linkedin.com/in/tan-huiyong-72a1224b
02-19-2021

Data Visualisation Link (Tableau Public) - https://public.tableau.com/views/dataviz2Covid-19VaccinationSentiments/Dashboard4

a. Critiques and Suggestions for Current Visualisations

Clarity

  1. The Likert scale in the original are presented as stacked horizontal bars, aligning to the left of the chart. It is not visually clear to differentiate the survey results of each country as all of the categories have a common starting and ending points.

  2. The Likert scale on the left is sorted by country alphabetical order, while the right chart is sorted by “% strongly agreed”. This can be quite confusing for viewers as they must manually map match the countries on the chart on the right to the one on the left.The visualizations could have consistent sorting order.

  3. “% of strongly agreed to vaccination” is plotted using horizontal bar graph. This is not appropriate as the uncertainty of the proportion is not accounted for. A dot graph with error bars could be used instead to depict the uncertainties clearly.

Aesthetics

  1. The Likert scale’s horizontal axis tick marks are rounded to 0 decimal place while the chart on the right is 1 decimal point. This is visually inconsistent. This inconsistency should be standardized such that both charts are using the same decimal points for the horizontal axes.

  2. The color scheme of the bars in the charts used does not match the degree of the responses. For example, the charts use red to represent “neutral” response even though it is usually associated as negative / more severe. Therefore, 2 colored scales (shades of red for disagreements and blue for agreements) could be employed.

  3. The title of the legend box is displayed as “Vac 1”, which does not reveal the context of the legends. A more meaningful legend title could be used.

  4. The legends are too far away from its chart, making it more tedious for viewers to understand the representations of each color.

b. Proposed alternative data visualisation

Alternative design

The alternative design is an interactive visualization. It uses Gantt chart to visualize the survey result and utilized dot graph with error bars (confidence interval) to show the optimism of the survey takers. This visualization assumes normal distribution for the population.

The interactivity allows users to toggle different parameters (age, country, gender, household size, etc) and sorting. This would allow insights to be discovered. The dashboard also allows user to select a secondary parameter for 2-dimensional assessment. The dashboard also allow viewer to perform a dynamic age bin for more insights.

Advantages

  1. Gantt chart is clear, effective and can show a large chunk of data neatly and succinctly. As a result, it can show sentiments of groups taking the survey with positive sentiments on Covid-19 vaccine to be positioned more to the right, vice-versa.

  2. The visualization would make use of parameters ensure that the sorting of both charts is consistent such that both charts would be showing the categories in the same order. This aims to ensure clarity as user toggle the dashboard.

  3. The visualizations utilize color scales of red and blue to show the degree of agreement / disagreement. It provides more intuitive colors to the viewer.

  4. Dot graph with error bars is adopted as it shows the underlying statistical confidence interval. This is crucial as the survey is taken by a sample population and statistical uncertainty is expected.

  5. The x-axis of both charts is rounded to the nearest percent for aesthetic consistency.

  6. The legends are visualization titles are chosen meaningfully and they would be able to let viewers understand the context of the dashboard. The order of the chart is as such to allow the Likert chart to be closer to the legend box. This would allow viewers to refer to the legend easily.

c. Proposed alternative data visualisation

Link to Tableau Public

d. Step-by-step description of visualization

Data Cleaning and exploration

The data for the visualization was retrieved from Imperial College London YouGov Covid 19 Behaviour Tracker Data Hub hosted at Github . There is a total of 14 files in CSV format. Each CSV file contains data for a country and a union was performed to append all the data together.


A new column “Table Name” was created upon the union and they reflect the CSV file names (“country_name.csv”). A split was performed on “Table Name” column such that country name remains.


Initial data exploration was conducted, and it revealed that the data contains around 377 columns, most of which would not be used in the visualization. The redundant columns are removed, with 18 remaining columns, as shown below.


As shown in the table above, some of the country’s data split the employment status into 7 columns. Hence a further investigation was conducted on employment status. It revealed that some dataset’s employment status is a single column containing 7 unique values and some dataset split the 7 values into 7 columns with binary values.


Hence a calculated filed “New_employment_status” is required to reconcile “employment_status”.


A filter was made on the “endtime” to get January 2021 data as the survey was conducted during the said period.


A check was performed, and no null value was observed in the survey questions vac_1, vac2_1, vac2_2, vac2_3, vac2_6 and vac3. This was done by constructing a simple text table of each survey questions against its counts.


Similar checks were performed on age, gender, household size, household children and “New_employment_status”. While no null value was observed, inconsistency of the household size and household children were observed; most of the values were captured in the nominal form, while a small number of them were captured in the measurement form as shown below.


These inconsistencies were standardized to nominal format using the following calculated fields for amend for household size and children.


Visualization

Gantt Chart

Gantt chart will be plotted to visualize the survey. A series of calculated fields and parameters would be required.

Parameters

Vaccination Question

This parameter serves to allow viewer to select from the list of 6 survey questions. The values were based on the original survey questions.


Main Parameters

This parameter would allow viewer to toggle among different fields (Country, age, gender, etc.) such that the visualization would display according to viewer’s interest.There is an option called “All” to allow viewers to view at the overall level (all parameters).


Age Bin Size

The age in the data needs to be binned as there is a wide range of age values. To make the age bin dynamic, parameter with the following settings was used. The bin range was set between 5-50 to ensure that the bins are meaningful.


Sort By

This allows interactive sorting. For this visualization, viewer will be allowed to toggle between “Alphabetical Order” and “Strongly Positive to vaccination”


Sort Order

Allow dynamically sort by ascending or descending order, affecting all the visualizations.


Secondary Parameters

This parameter will display an optional secondary field for visualization. Its settings are similar to that of the “Main Parameters”, except that it has a “Null” option to allow viewer to have the option to hide the secondary parameter.


Calculated Fields

Building Steps

Create aliases for “Selected Question” so that the visualization would display the aliases instead.


The parameters “Vaccination Question” and Main Parameters” were turned on, while the Gantt Percent and Selected Parameters are pulled to Columns and Rows respectively.


“Selected Question” was pulled into Color and it would display the starting position of each responses. The responses had to be sorted according to the positivity of the responses.



“Percentage” is dropped into Size so that the chart can display the response in horizontal stacked bar chart. However, the initial result shows that the bars’ positions are truncated and the fact that this is due to “Gantt Percent” and “Percentage” being computed at table level.


“Gantt Percent” and “Percentage” were then computed using “Selected Question” and the basic Gantt chart takes shape.


“Count” is dropped into Label to display the value of the response. For this visualization, only the extreme responses of “Strongly Agreed” and “Strongly Disagreed” would display the values to preserve the aesthetics of the visualization. This was done by toggling the label to display the min/max at cell level using the “Gantt Precent” field as shown below.


Next, the label is configured to show as a “percent of total”, computed using “Selected Question”.


The format of the field was then changed to percentage with 0 decimal place to ensure good aesthetics.


Country, age, gender, household size, household children and “New_employment_status” were dropped into filter so that the visualization could be filtered accordingly based on viewer’s interest.


The sorting parameters are included, and it would only work after the sorting of the fields are toggled to capture the returns from the sorting parameters as shown below.


The “Selected Sec Parameters” are dropped into Columns in case viewers would like to view the visualizations in 2 dimensional fields.


The final product for the Gantt chart is shown below.


Visualization on Uncertainty to Visualize Strong Positive Sentiment to Covid-19 Vaccination

The visualization of the strong positive response to vaccination is presented using dot graph with error bar to show the uncertainties.

Parameters

This visualization will use the same parameters done in Gantt chart.

Calculated Fields


Building Steps

The base of the uncertainty visualization is based on dot graph. Thus, “Prop” and “Selected Parameters” are dropped into Columns and Rows respectively, while the Marks are changed to circle.


Next, Measure values containing ”Prop_Upper_Limit” and ”Prop_Lower_Limit” was placed into Columns and the marks were changed to line for this case.


“Prop” and Measure Values are then merged using dual axis with synchronizing axes. “Measure names” was droppedinto the path to generate the error bars.


For the purpose of aesthetics, this visualization would match color of the dot graph to that of the Gantt chart based on the response of interest (either strongly disagreed or strongly agreed depending on the survey question). “Color Switcher” is dropped into Color and the colors were chose based on the “Color Switcher” indicator. i.e. if color switcher shows, red, change the color to red, vice-versa.


Like Gannt chart, the sort is configured to be based on the “Sort by” and “Sort Order” parameters


The “Secondary Parameter” is dropped into column in case viewers are interested to compare 2 dimensions of fields.


The final product for the visualization of uncertainty using dot graph with error bars is shown below.


Survey Question

This is a simple visualization aims to display the selected Survey Question dynamically.

Building Steps

“Vaccination Question” is dropped into Text while “Color Switcher” is dropped into color as depicted below.


The colors are then selected according to the color switcher indicator.


The final product for the visualization of the survey question is shown below.


Y-Axis Title

This is a simple visualization that would display the y-axis title of the visualization dynamically depending on the selection in “Selected Parameters”.


The text is toggled to display the words horizontally and the chart title is hidden.


The final product for the visualization is shown below.


Gantt Title 2

2 y-axis titles are required, thus the “Y-Axis Title” visualization was duplicated.


Dashboard

Building Steps

The dashboard size is set to “Automatic”.


A horizontal container is dropped into the dashboard and the Uncertainty visualization was dropped into the container.


The Gantt chart was dropped to the right of the Uncertainty chart.


The “Y-axis Title” was dropped onto the dashboard as floating object.


Same process was done for “Y-Axis Title 2” and both are positioned to the respective visualization. 2 horizontal containers are placed above the main visualization as shown. The empty container on the top left will house the main parameters like “Vaccination Question” and “Main Parameters”.


A text box and “Survey Question” visualization is inserted in between the main visualization and the top parameters panel. Ensure that the title of the “Vaccination Question” is hidden.


Remove redundant legends like “Color Switcher” on the right of the dashboard. Rearrange the order of the right panel such that the legend appears on top. “Secondary parameters” are placed in the top panel beside “Main Parameters” The title of the visualizations is also amended. The filters for the Gantt chart is also displayed.


To make the dashboard more aesthetically pleasing, the filters are changed to dropdown with multiple values and they are ordered according to the dropdown of the “Main Parameters”. The sorting parameters are then placed at the bottom of the right panel.


As the dashboard is interactive, it was observed that the tick marks intervals would change for both charts. As such, the tick marks intervals are fixed to 20% for Gantt chart and 10% for the uncertainty visualization respectively.


Inconsistency of the border lines between the 2 charts was observed. The Gantt chart’s borders were adjusted to be consistent to the uncertainty visualization.


The dashboard was then published onto Tableau Public.


However, it was observed that the floating object “Y-axis Title 2” was truncated on Tableau Public, as such it was removed. The final dashboard on Tableau Public is shown below.


e. 3 Observations from Final Visualization

Worrying Situation in Japan


Overall Japanese are most worried to vaccination and have no strong faith in their government health authority to provide effective vaccine. This is a worrying fact as its people might opt out of the vaccination and it dampen the nationwide vaccination efforts. Unfortunately, the Japanese’s worry was not unfounded as the news on 16 Feb 2021 reported that millions of vaccines might be wasted due to shortage of the required special syringe (Reuters, 16 Feb 21).

It will be wise for the Japanese governments improve the morale of its citizens.

Generation Gap


In general, the older generation would want to get vaccinated than the younger generation. This is understandable as older people are more vulnerable to Covid-19 due to their weakened immune system, thus they would have better sentiments on the vaccination. Conversely, the younger generation is less optimistic about the vaccination.

Children Makes a Difference


In general, higher proportion of household with 0 or a few children strongly agreed that they would regret if they did not get a Covid-19 vaccine when it is available. This is an interesting fact that is worth further investigation as to why households with many children are less likely to think that they strongly agree that they are going to regret not taking the vaccine when available.