# SmartPLS 4 Webinar Day 1: Measurement Model Assessment
Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
## Briefing
Measurement model assessment in SmartPLS starts long before factor loadings and validity tables—data cleaning is treated as the gatekeeper for trustworthy results. Skipping it can cascade into misleading measurement outcomes, including failed convergent validity (low AVE) and weak discriminant validity (constructs that don’t look distinct). The workflow begins with checking minimum/maximum values to catch impossible entries (e.g., an age of 79 when the expected range tops out at 65), then correcting them by tracing back to the questionnaire or fixing data-entry errors. Next comes missing data handling: interpolation is only appropriate when missingness is limited; if a large share of responses is empty, filling values can distort the measurement and should be avoided.
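The range-check and missing-data steps above can be sketched in Python with pandas. The column names, expected ranges, and the sample data are illustrative assumptions, not values from the webinar:

```python
import pandas as pd

# Hypothetical responses: items on a 1-5 Likert scale plus an age field (expected 18-65).
df = pd.DataFrame({
    "age": [25, 34, 79, 41],      # 79 falls outside the expected range
    "vision1": [4, 5, 3, None],   # one missing response
})

# Min/max range check: flag values outside the expected scale range,
# then trace flagged rows back to the questionnaire to correct them.
expected = {"age": (18, 65), "vision1": (1, 5)}
for col, (lo, hi) in expected.items():
    bad = df[(df[col] < lo) | (df[col] > hi)]
    if not bad.empty:
        print(f"{col}: rows {list(bad.index)} outside [{lo}, {hi}]")

# Missing-data share: impute only when missingness is limited;
# a large share of empty responses should not be filled in.
missing_share = df["vision1"].isna().mean()
print(f"vision1 missing share: {missing_share:.0%}")
```

With this toy data, the check flags row 2 for `age` and reports 25% missingness on `vision1`.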
After range checks and missing-data decisions, the process turns to outliers and respondent misconduct. Outliers can be identified via box plots or standardized Z-scores, with extreme standardized values (beyond ±3.3) flagged for removal. Respondent misconduct is handled differently: the method computes standard deviation across items within a construct (e.g., Vision 1–4). If responses show near-zero variation—like repeated patterns (all 5s or all 1s)—the data may reflect inattentive responding. But deletion isn’t automatic; the guidance is to use subject-matter judgment about whether identical responses are theoretically plausible for that population and construct.
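A minimal sketch of both screens, using hypothetical Vision-item responses. The ±3.3 Z-score cutoff follows the text; flagging exactly zero within-construct standard deviation is a simplification (a near-zero threshold may be preferred in practice), and flagged rows still require the subject-matter judgment described above:

```python
import pandas as pd

# Hypothetical Vision construct items (Vision 1-4) on a 1-5 scale.
df = pd.DataFrame({
    "vision1": [4, 5, 5, 1],
    "vision2": [3, 5, 4, 1],
    "vision3": [4, 5, 5, 1],
    "vision4": [5, 5, 4, 1],
})

# Outliers: standardized Z-scores beyond +/-3.3 are flagged for removal.
z = (df - df.mean()) / df.std()
outlier_rows = z.abs().gt(3.3).any(axis=1)

# Respondent misconduct: near-zero standard deviation across a construct's
# items (straight-lining, e.g., all 5s or all 1s) suggests inattentive responding.
row_sd = df.std(axis=1)
flagged = df.index[row_sd == 0].tolist()
print(f"possible misconduct in rows: {flagged}")
```

Here respondents 1 (all 5s) and 3 (all 1s) are flagged, while no row exceeds the Z-score cutoff.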
Once the dataset is “filtered,” SmartPLS setup begins with defining a workspace, importing the dataset (including scale ranges and missing-value markers if needed), and sanity-checking distribution diagnostics such as skewness and kurtosis (noting that SmartPLS is less dependent on normality than covariance-based SEM, but extreme violations still matter). The first modeling milestone is the measurement model, which evaluates how well survey items represent latent constructs. Four quality criteria drive the assessment: factor loadings (item-to-construct representation), reliability (Cronbach’s alpha and composite reliability/CR, with thresholds commonly around 0.70), convergent validity (AVE, typically requiring >0.50), and discriminant validity (construct distinctiveness).
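The skewness/kurtosis sanity check might look like the sketch below. The |skewness| > 2 and |kurtosis| > 7 rules of thumb are common conventions for "extreme" violations, assumed here rather than taken from the webinar, and the item data are hypothetical:

```python
import pandas as pd

# Hypothetical item on a 1-5 scale; common rules of thumb treat
# |skewness| > 2 or |kurtosis| > 7 as extreme even for PLS-SEM.
x = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 5, 5])
print(f"skewness={x.skew():.2f}, kurtosis={x.kurt():.2f}")
```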
Factor loadings are checked first, with a common benchmark of ~0.70; items below that level aren’t necessarily deleted immediately, because AVE and CR can still be acceptable and content validity may be harmed by aggressive item removal. Reliability is then verified using alpha (more conservative) and composite reliability (more aligned with PLS’s loading-based logic). Convergent validity is assessed through AVE, which is computed from squared loadings; if AVE is low, the response is not just “delete more items,” but to consider whether the questionnaire design, item wording, or sample characteristics are the real problem.
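AVE and CR can be computed directly from standardized loadings. The loadings below are hypothetical, chosen so that one item sits below 0.70 while AVE and CR still clear their thresholds—illustrating why a sub-0.70 loading alone need not trigger deletion:

```python
# Hypothetical standardized loadings for a 4-item construct.
loadings = [0.82, 0.76, 0.71, 0.65]

# AVE: average of squared loadings; convergent validity needs AVE > 0.50.
ave = sum(l**2 for l in loadings) / len(loadings)

# Composite reliability: (sum of loadings)^2 divided by itself plus the
# summed error variances (1 - loading^2); CR should exceed 0.70.
num = sum(loadings) ** 2
cr = num / (num + sum(1 - l**2 for l in loadings))

print(f"AVE={ave:.3f}, CR={cr:.3f}")
```

Despite the 0.65 loading, AVE ≈ 0.544 and CR ≈ 0.826 both pass, so removing the item would sacrifice content validity for no measurement gain.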
Discriminant validity is assessed using multiple lenses: HTMT (with typical acceptance below 0.85), Fornell–Larcker (the square root of AVE should exceed inter-construct correlations), and cross-loadings (each item should load highest on its own construct, with differences ideally above 0.10). When discriminant validity fails—such as a high HTMT between Vision and Rewards—the fix can involve diagnosing respondent misconduct (items producing identical patterns across constructs) or removing problematic items with cross-loading (e.g., deleting Reward 4 when it loads too similarly on another construct). The result is a measurement model that can then support the later structural model stage, where path coefficients and hypothesis testing come into play.
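The Fornell–Larcker criterion reduces to a simple comparison: the square root of each construct's AVE must exceed its correlations with other constructs. The AVE values and correlation below are hypothetical:

```python
import math

# Hypothetical values for two constructs (Vision, Rewards).
ave = {"Vision": 0.58, "Rewards": 0.52}
corr_vision_rewards = 0.69

# Fornell-Larcker: sqrt(AVE) must exceed the inter-construct correlation.
ok = all(math.sqrt(a) > corr_vision_rewards for a in ave.values())
print("discriminant validity holds" if ok else "discriminant validity fails")
```

Here sqrt(0.58) ≈ 0.76 and sqrt(0.52) ≈ 0.72 both exceed 0.69, so the criterion is met; had the correlation been 0.75, the check would fail and the HTMT/cross-loading diagnostics above would guide the fix.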
## Cornell Notes
The session lays out a practical, step-by-step measurement model assessment workflow for SmartPLS, emphasizing that data cleaning must happen before reliability and validity checks. Cleaning includes range validation (min/max), careful missing-data handling (interpolation only when missingness is limited), outlier detection using Z-scores/box plots, and respondent misconduct screening using within-construct standard deviation (e.g., repeated identical responses). After importing and sanity-checking the dataset, the measurement model is evaluated through factor loadings, reliability (Cronbach’s alpha and composite reliability/CR), convergent validity via AVE (>0.50), and discriminant validity using HTMT (<0.85), Fornell–Larcker (sqrt(AVE) greater than correlations), and cross-loadings (own-construct loading higher by ~0.10). When discriminant validity fails, the remedy is targeted—diagnose misconduct or remove specific cross-loading items—rather than indiscriminately deleting data.
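The cross-loadings rule above (own-construct loading higher by ~0.10) can be sketched as a per-item gap check. The loading matrix is hypothetical, constructed so that a Reward 4-style item fails the gap threshold:

```python
import pandas as pd

# Hypothetical cross-loadings: rows are items, columns are constructs.
xl = pd.DataFrame(
    {"Vision": [0.78, 0.74, 0.66], "Rewards": [0.35, 0.28, 0.72]},
    index=["vision1", "vision2", "reward4"],
)
assigned = {"vision1": "Vision", "vision2": "Vision", "reward4": "Rewards"}

# Each item should load highest on its own construct, by ~0.10 or more;
# smaller gaps mark candidates for removal when discriminant validity fails.
for item, own in assigned.items():
    gap = xl.loc[item, own] - xl.loc[item].drop(own).max()
    status = "ok" if gap >= 0.10 else "problem"
    print(f"{item}: gap={gap:.2f} -> {status}")
```

In this toy matrix, `reward4` loads 0.72 on its own construct but 0.66 on Vision (gap 0.06), so it would be flagged for removal and the model rerun.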
Why does data cleaning come before measurement model assessment in SmartPLS workflows?
How are outliers and respondent misconduct identified differently?
What are the core measurement model quality criteria, and what thresholds are used?
If a factor loading is below 0.70, should the item always be deleted?
What should be done when discriminant validity fails between two constructs?
How does the workflow connect measurement model assessment to later hypothesis testing?
## Review Questions
- What cleaning steps would you perform before running SmartPLS, and what specific decision rules apply to missing data and outliers?
- Explain how AVE is used to establish convergent validity and what actions are recommended if AVE is below the threshold.
- List the three discriminant validity checks (HTMT, Fornell–Larcker, cross-loadings) and describe one concrete way to fix a discriminant validity failure.
## Key Points
1. Clean data using min/max checks, then correct data-entry errors by tracing back to the questionnaire when values fall outside the expected scale range.
2. Handle missing data cautiously: interpolation is reasonable only when missingness is limited; large gaps should not be filled by interpolation.
3. Detect outliers using box plots or Z-scores (flagging standardized values beyond ±3.3) and remove only extreme cases.
4. Screen for respondent misconduct by computing within-construct standard deviation; repeated identical responses across items can justify deletion, but subject-matter judgment is required.
5. Build the SmartPLS measurement model by evaluating factor loadings, reliability (Cronbach’s alpha and composite reliability/CR), convergent validity (AVE > 0.50), and discriminant validity (HTMT, Fornell–Larcker, cross-loadings).
6. Avoid deleting items solely because a loading is below 0.70; confirm whether AVE/CR and content validity remain acceptable before removing items.
7. When discriminant validity fails, diagnose the cause (misconduct vs. cross-loading) and apply targeted fixes such as removing a specific cross-loading item (e.g., Reward 4) and rerunning the model.