10Min Research - 37. What to do after the Data Collection: How to Start the Data Analysis?
Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Once questionnaire data is collected, the next decisive step is setting up a clean analysis workflow—starting with coding, moving through descriptive reporting, and then running structural equation modeling (SEM) in either SmartPLS or IBM SPSS AMOS. The core idea is that analysis isn’t just “run the software”; it’s a sequence of checks and reporting requirements that make the measurement model credible before any hypothesis testing happens.
The process begins with coding every questionnaire item into a consistent variable naming scheme. For example, ethical responsibility items are coded as ER1–ER7, research and development responsibilities as RDR1–RDR6, and the same pattern is applied to each construct. Demographic variables are handled as separate coded variables as well. This coding step matters because it determines how constructs and indicators will be recognized later in SEM.
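If the raw export comes from a survey tool with generic column names, this coding step can be scripted. The sketch below is illustrative only: the `Q1`–`Q13` source names and the demographic columns are hypothetical, and the mapping mirrors the ER1–ER7 / RDR1–RDR6 scheme described above.

```python
import pandas as pd

# Hypothetical raw export: generic item columns Q1-Q13 plus demographics.
raw = pd.DataFrame(columns=[f"Q{i}" for i in range(1, 14)] + ["Age", "Gender"])

# Map raw items to consistent indicator labels:
# Q1-Q7  -> ER1-ER7  (ethical responsibility)
# Q8-Q13 -> RDR1-RDR6 (research and development responsibilities)
rename_map = {f"Q{i}": f"ER{i}" for i in range(1, 8)}
rename_map.update({f"Q{i}": f"RDR{i - 7}" for i in range(8, 14)})

coded = raw.rename(columns=rename_map)
print(list(coded.columns))
```

Keeping the mapping in one dictionary makes the scheme auditable: every indicator label used later in SmartPLS or AMOS can be traced back to its original questionnaire item.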
Next comes data preparation and quality screening. The data can be defined and entered in Excel or IBM SPSS: in SPSS, variables must be created in Variable View, while in Excel, column names are placed in the top row. After the dataset is in place, the workflow shifts to “screening” the data—checking minimum and maximum values, and identifying missing data—so that later reliability and validity tests aren’t distorted by errors or incomplete responses.
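The same screening checks (minimum/maximum values and missing data) that SPSS performs can be sketched in pandas. The data below is a toy example with one deliberately out-of-range value and one missing response; the 1–5 Likert range is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Toy 5-point Likert responses with two planted data-quality problems.
df = pd.DataFrame({
    "ER1": [1, 5, 3, 7, 2],        # 7 is outside the 1-5 scale
    "ER2": [2, 4, np.nan, 3, 5],   # one missing response
})

# Minimum/maximum check: count values outside the expected scale range.
out_of_range = ((df < 1) | (df > 5)).sum()
print(out_of_range)

# Missing-data check: count missing responses per indicator.
print(df.isna().sum())
```

Flagged cases would then be corrected (e.g., data-entry typos) or handled as missing before any reliability or validity tests are run.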
After cleaning, descriptive statistics are reported. This includes the demographic profile of respondents and descriptive summaries for the indicators. The guidance here is practical for thesis writing: if space is tight, indicator descriptives may be omitted from the main narrative and presented in a table instead.
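Both reporting pieces, a demographic profile and an indicator descriptives table, can be generated directly from the coded dataset. The data and column names below are toy values for illustration.

```python
import pandas as pd

# Toy coded responses; indicator columns follow the ER coding scheme.
df = pd.DataFrame({
    "ER1": [4, 5, 3, 4, 5],
    "ER2": [3, 4, 4, 5, 4],
    "Gender": ["F", "M", "F", "F", "M"],
})

# Demographic profile: frequency and percentage per category.
profile = df["Gender"].value_counts().to_frame("n")
profile["%"] = 100 * profile["n"] / len(df)
print(profile)

# Indicator descriptives (mean, std, min, max) for the descriptives table.
indicators = df[["ER1", "ER2"]]
print(indicators.agg(["mean", "std", "min", "max"]).round(2))
```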
With the dataset ready, the SEM section is structured around two layers: the measurement model and the structural model. In the measurement model, the analysis first reports factor loadings, then establishes construct reliability and construct validity. Reliability is assessed using Cronbach’s Alpha and composite reliability. Validity is assessed through convergent validity and discriminant validity. Only after these quality criteria are established does the report move to the structural model.
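SmartPLS and AMOS compute these reliability and validity statistics automatically, but it helps to know what the numbers mean. The sketch below shows the standard formulas on toy data: Cronbach's Alpha from item scores, and composite reliability (CR) plus average variance extracted (AVE, the usual convergent-validity criterion) from standardized loadings. The loadings here are hypothetical placeholders; in practice they come from the software's measurement-model output.

```python
import numpy as np

# Toy item scores for one construct (rows = respondents, cols = indicators).
items = np.array([
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
], dtype=float)

# Cronbach's Alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
k = items.shape[1]
item_var = items.var(axis=0, ddof=1).sum()
total_var = items.sum(axis=1).var(ddof=1)
alpha = k / (k - 1) * (1 - item_var / total_var)

# Composite reliability and AVE from (hypothetical) standardized loadings.
loadings = np.array([0.82, 0.78, 0.85])
cr = loadings.sum() ** 2 / (loadings.sum() ** 2 + (1 - loadings ** 2).sum())
ave = (loadings ** 2).mean()
print(round(alpha, 3), round(cr, 3), round(ave, 3))
```

Common rules of thumb treat Alpha and CR above 0.70 and AVE above 0.50 as acceptable, which is why these numbers gate the move to the structural model.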
For the structural model, the reporting starts with explanatory power using R². Hypothesis testing follows: direct relationships are reported plainly, while mediation and moderation require additional model specifications. If IBM SPSS AMOS is used, the write-up must also include model fit for both the measurement and structural models. If SmartPLS is used, the reporting structure follows the standard SmartPLS approach.
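SEM software estimates the structural paths via PLS or covariance-based fitting; the following is only a least-squares sketch on simulated standardized construct scores, meant to illustrate what a path coefficient and R² for an endogenous construct represent. The construct names and the true coefficients (0.5 and 0.3) are made up for the simulation.

```python
import numpy as np

# Simulated standardized scores: two exogenous constructs predicting one
# endogenous construct, with assumed true path weights 0.5 and 0.3.
rng = np.random.default_rng(0)
ER = rng.normal(size=100)
RDR = rng.normal(size=100)
Y = 0.5 * ER + 0.3 * RDR + rng.normal(scale=0.5, size=100)

# Estimate paths by least squares and compute R² for the endogenous construct.
X = np.column_stack([np.ones_like(ER), ER, RDR])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
pred = X @ beta
r2 = 1 - ((Y - pred) ** 2).sum() / ((Y - Y.mean()) ** 2).sum()
print(beta.round(2), round(r2, 3))
```

In the write-up, R² answers "how much variance in the endogenous construct is explained," while the individual coefficients (with bootstrapped significance in SmartPLS, or standard errors in AMOS) answer the direct-effect hypotheses; mediation and moderation add indirect-effect and interaction terms on top of this basic structure.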
Finally, the results chapter ends with a chapter summary. The emphasis throughout is on organizing the results section so that readers can see the logic: code and clean the data, describe it, validate the measurement model, then test the relationships with SEM using the appropriate software.
Cornell Notes
After collecting questionnaire data, analysis should follow a clear sequence: code items, screen the dataset, report descriptive statistics, then run SEM. Coding means assigning consistent labels to each indicator (e.g., ER1–ER7 for ethical responsibility, RDR1–RDR6 for research and development responsibilities). Data preparation includes defining variables in SPSS or naming columns in Excel, then checking minimum/maximum values and missing data. In SEM, the measurement model comes first: report factor loadings, then assess reliability (Cronbach’s Alpha, composite reliability) and validity (convergent and discriminant). Only after that should the structural model be reported using R² and hypothesis tests, with mediation/moderation as needed.
Why does coding questionnaire items matter before running SmartPLS or IBM SPSS AMOS?
What are the minimum data-screening steps recommended before descriptive statistics and SEM?
What should be included in descriptive statistics, and what’s the thesis-writing nuance?
How is the SEM measurement model structured in the reporting workflow?
What gets reported in the SEM structural model, and how do mediation/moderation change it?
How does reporting differ between IBM SPSS AMOS and SmartPLS in this workflow?
Review Questions
- What coding scheme would you use to label indicators for each construct, and why must it be consistent?
- List the reliability and validity metrics required for the measurement model, and explain the order in which they appear in the results section.
- What additional reporting requirement appears when using IBM SPSS AMOS compared with SmartPLS?
Key Points
1. Code every questionnaire item into consistent indicator labels (e.g., ER1–ER7, RDR1–RDR6) before analysis so constructs are correctly specified later.
2. Define variables properly in IBM SPSS (Variable View) or set column names in Excel, then screen the dataset for minimum/maximum values and missing data.
3. Report descriptive statistics for respondents’ demographics and indicator descriptives, using tables if the main thesis text omits indicator descriptives.
4. Structure SEM results in two stages: measurement model first (factor loadings, reliability, validity) and structural model second (R² and hypothesis tests).
5. Assess construct reliability using Cronbach’s Alpha and composite reliability, and assess construct validity using convergent and discriminant validity.
6. Include mediation and moderation analyses when hypotheses require indirect effects or interaction effects, not just direct paths.
7. When using IBM SPSS AMOS, report model fit for both measurement and structural models; SmartPLS follows its standard reporting structure.