Understanding the Questionnaire/Scale Development Process (Edited Webinar)
Based on the Research With Fawad video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Briefing
Scale development is necessary when existing questionnaires fail to measure a concept in the specific way a study needs—especially for constructs that are not well represented in the literature. Researchers often start with the goal of establishing relationships between variables, but many key concepts (like job satisfaction, employee engagement, or community hostility) are not directly measurable. Instead, they are “latent variables” that require multiple questionnaire items to capture different aspects of the underlying construct. When no suitable scale exists—such as for community hostility, or for higher-education social responsibility—researchers must build a new scale tailored to their context rather than cobbling together mismatched items.
The process begins with defining the concept in operational terms, not just conceptually. An operational definition determines what the study will actually measure and, crucially, which items belong. Without it, a questionnaire can drift into measuring the wrong components, leading to reviewer rejection and a study that cannot be defended. The webinar emphasizes purposiveness: every measurement choice should follow from how the construct is operationalized. For example, higher education social responsibility must include elements like ethics, academics, and research; using a corporate-focused CSR instrument that omits those elements creates a mismatch. Similar examples show how organizational commitment can be operationalized as employees’ emotional desire to stay, or how perceptions of CSR image can be operationalized as consumers’ belief that a firm supports socially beneficial activities.
Once the construct is operationalized, scale development proceeds through a structured sequence. Researchers generate an item pool using existing literature when available, then refine it through expert review and/or interviews and focus groups. Items are written in a consistent format (often as statements rated on a Likert-type scale such as 5- or 7-point options). Next comes categorization into dimensions: items that reflect related themes are grouped into subdimensions (for instance, in a servant leadership in higher education project, items were organized into dimensions such as ethical behavior, development orientation, emotional healing, empowerment, humility, pioneering, relationship building, and wisdom). This dimensional structure is not assumed to be correct; it must be tested.
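To make the grouping-and-scoring step concrete, here is a minimal Python sketch of turning Likert responses into per-respondent dimension scores. The responses, item-to-dimension mapping, and dimension names (borrowed loosely from the servant leadership example) are hypothetical illustrations, not data from the webinar.

```python
import numpy as np

# Hypothetical 5-point Likert responses: 4 respondents x 6 items (columns).
responses = np.array([
    [5, 4, 5, 2, 3, 2],
    [4, 4, 3, 1, 2, 2],
    [5, 5, 4, 3, 3, 4],
    [3, 4, 4, 2, 1, 2],
])

# Hypothetical mapping of item columns to proposed subdimensions.
dimensions = {
    "ethical_behavior": [0, 1, 2],
    "emotional_healing": [3, 4, 5],
}

# Score each dimension as the mean of its items for every respondent.
scores = {name: responses[:, cols].mean(axis=1)
          for name, cols in dimensions.items()}
for name, vals in scores.items():
    print(name, vals)
```

Whether items are averaged or summed is a design choice; averaging keeps subscale scores on the original 1–5 response metric, which makes dimensions with different item counts comparable.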
After data collection, exploratory factor analysis (EFA) is used to check whether the expected grouping holds statistically. EFA functions as a data-reduction method, clustering highly correlated items into fewer factors so researchers can work with dimensions rather than dozens of individual items. Items may be removed if they load weakly, cross-load onto multiple dimensions, or appear to reflect something different from the intended construct—often because of wording problems or respondent misunderstanding. In one example, an initial set of 68 items was reduced to 37 after EFA, and one proposed dimension was eliminated entirely.
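The data-reduction logic of EFA can be sketched in a few lines with scikit-learn's `FactorAnalysis`. The synthetic data below (two latent factors each driving three items) and the 0.4 loading cutoff are illustrative assumptions; real studies choose extraction methods, rotations, and thresholds to fit their data.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 300
# Two hypothetical latent factors, each driving three questionnaire items.
f1, f2 = rng.normal(size=(2, n))
noise = rng.normal(scale=0.5, size=(n, 6))
X = np.column_stack([f1, f1, f1, f2, f2, f2]) + noise

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
loadings = fa.components_.T  # rows = items, columns = factors

# Flag items that load weakly everywhere or strongly on more than one factor.
for i, row in enumerate(np.abs(loadings)):
    if row.max() < 0.4:
        print(f"item {i}: weak loadings, candidate for removal")
    elif (row > 0.4).sum() > 1:
        print(f"item {i}: cross-loads, candidate for removal")
```

With this clean simulated structure, each item loads on exactly one factor and nothing is flagged; with real survey data, the flagged items are the ones researchers inspect for wording problems before deciding to drop them.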
Finally, researchers assess reliability and validity. Reliability checks whether the scale produces consistent results, while validity checks whether it measures what it claims to measure. The webinar also notes that some studies skip early pilot testing and rely on expert content/face validity before conducting EFA and confirmatory factor analysis (CFA) in the main study. Regardless of the exact route, the end goal is a defensible, psychometrically supported scale that can be used to test relationships between constructs in subsequent research.
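Internal-consistency reliability is commonly reported as Cronbach's alpha, which compares the sum of item variances to the variance of the total score. A minimal numpy implementation, with hypothetical 5-point responses for a four-item subscale, might look like:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents x items response matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical Likert responses (5 respondents, 4 items).
data = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [4, 4, 5, 4],
])
print(round(cronbach_alpha(data), 3))  # ~0.943 for this consistent pattern
```

Values around 0.7 or higher are conventionally read as acceptable internal consistency, though alpha alone does not establish validity, which requires separate evidence (e.g., content review and CFA).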
Cornell Notes
Scale development is required when existing questionnaires don’t adequately measure a construct in the specific context of a study. Because many constructs are latent variables (not directly observable), they must be measured with multiple items that are summed or averaged into a single score representing the underlying construct. The process starts with an operational definition that guides which items should be included; otherwise, questionnaires can omit core elements and fail reviewer scrutiny. Item generation draws on literature, expert input, and interviews or focus groups, followed by grouping items into dimensions. Exploratory factor analysis then tests whether the proposed dimensional structure holds, removing weak or cross-loading items, after which reliability and validity are assessed (often with EFA and CFA).
Why can’t researchers rely on a single item to measure many psychological or social constructs?
How does an operational definition protect a scale from becoming “off-target”?
What is the role of exploratory factor analysis (EFA) in scale development?
How do researchers generate and refine an item pool before EFA?
What does it mean to “categorize items into dimensions,” and why must that be tested?
What reliability and validity checks are expected after factor analysis?
Review Questions
- What consequences follow from failing to write and use an operational definition when selecting or adapting questionnaire items?
- Describe how EFA changes the number of variables researchers analyze and what kinds of items are typically removed during this step.
- Give an example of how a construct’s context (e.g., higher education vs. corporations) can require a different scale than what exists in the literature.
Key Points
1. Scale development is essential when existing questionnaires don’t measure a construct accurately for a study’s context and operational definition.
2. Most constructs in social science are latent variables, so they require multiple items whose responses are aggregated into a single score.
3. Operational definitions determine what the study measures and which items should be included; mismatches can invalidate the research.
4. Item generation commonly uses literature, expert review, and interviews or focus groups, then formats items with consistent response scales.
5. Dimensional structure must be tested empirically; EFA clusters correlated items into fewer factors and helps remove weak or cross-loading items.
6. Reliability and validity assessments are required to demonstrate that the final scale measures the intended construct consistently and correctly.
7. A defensible scale development write-up needs step-by-step justification, including sources for items and evidence for the resulting dimensions.