10Min Research - 37. What to do after the Data Collection: How to Start the Data Analysis?
Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Once questionnaire data is collected, the next decisive step is setting up a clean analysis workflow—starting with coding, moving through descriptive reporting, and then running structural equation modeling (SEM) in either SmartPLS or IBM SPSS AMOS. The core idea is that analysis isn’t just “run the software”; it’s a sequence of checks and reporting requirements that make the measurement model credible before any hypothesis testing happens.
The process begins with coding every questionnaire item into a consistent variable naming scheme. For example, ethical responsibility items are coded as ER1–ER7, research and development responsibilities as RDR1–RDR6, and the same pattern is applied to each construct. Demographic variables are handled as separate coded variables as well. This coding step matters because it determines how constructs and indicators will be recognized later in SEM.
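If the raw export comes from a survey tool with generic column names, this coding step can be scripted. The sketch below is illustrative only: the `Q1`–`Q13` source names and the demographic columns are hypothetical, and the mapping mirrors the ER1–ER7 / RDR1–RDR6 scheme described above.

```python
import pandas as pd

# Hypothetical raw export: generic item columns Q1-Q13 plus demographics.
raw = pd.DataFrame(columns=[f"Q{i}" for i in range(1, 14)] + ["Age", "Gender"])

# Map raw items to consistent indicator labels:
# Q1-Q7  -> ER1-ER7  (ethical responsibility)
# Q8-Q13 -> RDR1-RDR6 (research and development responsibilities)
rename_map = {f"Q{i}": f"ER{i}" for i in range(1, 8)}
rename_map.update({f"Q{i}": f"RDR{i - 7}" for i in range(8, 14)})

coded = raw.rename(columns=rename_map)
print(list(coded.columns))
```

Keeping the mapping in one dictionary makes the scheme auditable: every indicator label used later in SmartPLS or AMOS can be traced back to its original questionnaire item.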
Next comes data preparation and quality screening. The data can be defined and entered in Excel or IBM SPSS: in SPSS, variables must be created in Variable View, while in Excel, column names are placed in the top row. After the dataset is in place, the workflow shifts to “screening” the data—checking minimum and maximum values, and identifying missing data—so that later reliability and validity tests aren’t distorted by errors or incomplete responses.
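The same screening checks (minimum/maximum values and missing data) that SPSS performs can be sketched in pandas. The data below is a toy example with one deliberately out-of-range value and one missing response; the 1–5 Likert range is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Toy 5-point Likert responses with two planted data-quality problems.
df = pd.DataFrame({
    "ER1": [1, 5, 3, 7, 2],        # 7 is outside the 1-5 scale
    "ER2": [2, 4, np.nan, 3, 5],   # one missing response
})

# Minimum/maximum check: count values outside the expected scale range.
out_of_range = ((df < 1) | (df > 5)).sum()
print(out_of_range)

# Missing-data check: count missing responses per indicator.
print(df.isna().sum())
```

Flagged cases would then be corrected (e.g., data-entry typos) or handled as missing before any reliability or validity tests are run.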
After cleaning, descriptive statistics are reported. This includes the demographic profile of respondents and descriptive summaries for the indicators. The guidance here is practical for thesis writing: if space is tight, indicator descriptives may be omitted from the main narrative and presented in a table instead.
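Both reporting pieces, a demographic profile and an indicator descriptives table, can be generated directly from the coded dataset. The data and column names below are toy values for illustration.

```python
import pandas as pd

# Toy coded responses; indicator columns follow the ER coding scheme.
df = pd.DataFrame({
    "ER1": [4, 5, 3, 4, 5],
    "ER2": [3, 4, 4, 5, 4],
    "Gender": ["F", "M", "F", "F", "M"],
})

# Demographic profile: frequency and percentage per category.
profile = df["Gender"].value_counts().to_frame("n")
profile["%"] = 100 * profile["n"] / len(df)
print(profile)

# Indicator descriptives (mean, std, min, max) for the descriptives table.
indicators = df[["ER1", "ER2"]]
print(indicators.agg(["mean", "std", "min", "max"]).round(2))
```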
With the dataset ready, the SEM section is structured around two layers: the measurement model and the structural model. In the measurement model, the analysis first reports factor loadings, then establishes construct reliability and construct validity. Reliability is assessed using Cronbach’s Alpha and composite reliability. Validity is assessed through convergent validity and discriminant validity. Only after these quality criteria are established does the report move to the structural model.
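SmartPLS and AMOS compute these reliability and validity statistics automatically, but it helps to know what the numbers mean. The sketch below shows the standard formulas on toy data: Cronbach's Alpha from item scores, and composite reliability (CR) plus average variance extracted (AVE, the usual convergent-validity criterion) from standardized loadings. The loadings here are hypothetical placeholders; in practice they come from the software's measurement-model output.

```python
import numpy as np

# Toy item scores for one construct (rows = respondents, cols = indicators).
items = np.array([
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
], dtype=float)

# Cronbach's Alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
k = items.shape[1]
item_var = items.var(axis=0, ddof=1).sum()
total_var = items.sum(axis=1).var(ddof=1)
alpha = k / (k - 1) * (1 - item_var / total_var)

# Composite reliability and AVE from (hypothetical) standardized loadings.
loadings = np.array([0.82, 0.78, 0.85])
cr = loadings.sum() ** 2 / (loadings.sum() ** 2 + (1 - loadings ** 2).sum())
ave = (loadings ** 2).mean()
print(round(alpha, 3), round(cr, 3), round(ave, 3))
```

Common rules of thumb treat Alpha and CR above 0.70 and AVE above 0.50 as acceptable, which is why these numbers gate the move to the structural model.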
For the structural model, the reporting starts with explanatory power using R². Hypothesis testing follows: direct relationships are reported plainly, while mediation and moderation require additional model specifications. If IBM SPSS AMOS is used, the write-up must also include model fit for both the measurement and structural models. If SmartPLS is used, the reporting structure follows the standard SmartPLS approach.
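SEM software estimates the structural paths via PLS or covariance-based fitting; the following is only a least-squares sketch on simulated standardized construct scores, meant to illustrate what a path coefficient and R² for an endogenous construct represent. The construct names and the true coefficients (0.5 and 0.3) are made up for the simulation.

```python
import numpy as np

# Simulated standardized scores: two exogenous constructs predicting one
# endogenous construct, with assumed true path weights 0.5 and 0.3.
rng = np.random.default_rng(0)
ER = rng.normal(size=100)
RDR = rng.normal(size=100)
Y = 0.5 * ER + 0.3 * RDR + rng.normal(scale=0.5, size=100)

# Estimate paths by least squares and compute R² for the endogenous construct.
X = np.column_stack([np.ones_like(ER), ER, RDR])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
pred = X @ beta
r2 = 1 - ((Y - pred) ** 2).sum() / ((Y - Y.mean()) ** 2).sum()
print(beta.round(2), round(r2, 3))
```

In the write-up, R² answers "how much variance in the endogenous construct is explained," while the individual coefficients (with bootstrapped significance in SmartPLS, or standard errors in AMOS) answer the direct-effect hypotheses; mediation and moderation add indirect-effect and interaction terms on top of this basic structure.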
Finally, the results chapter ends with a chapter summary. The emphasis throughout is on organizing the results section so that readers can see the logic: code and clean the data, describe it, validate the measurement model, then test the relationships with SEM using the appropriate software.
Cornell Notes
After collecting questionnaire data, analysis should follow a clear sequence: code items, screen the dataset, report descriptive statistics, then run SEM. Coding means assigning consistent labels to each indicator (e.g., ER1–ER7 for ethical responsibility, RDR1–RDR6 for research and development responsibilities). Data preparation includes defining variables in SPSS or naming columns in Excel, then checking minimum/maximum values and missing data. In SEM, the measurement model comes first: report factor loadings, then assess reliability (Cronbach’s Alpha, composite reliability) and validity (convergent and discriminant). Only after that should the structural model be reported using R² and hypothesis tests, with mediation/moderation as needed.
Why does coding questionnaire items matter before running SmartPLS or IBM SPSS AMOS?
What are the minimum data-screening steps recommended before descriptive statistics and SEM?
What should be included in descriptive statistics, and what’s the thesis-writing nuance?
How is the SEM measurement model structured in the reporting workflow?
What gets reported in the SEM structural model, and how do mediation/moderation change it?
How does reporting differ between IBM SPSS AMOS and SmartPLS in this workflow?
Review Questions
- What coding scheme would you use to label indicators for each construct, and why must it be consistent?
- List the reliability and validity metrics required for the measurement model, and explain the order in which they appear in the results section.
- What additional reporting requirement appears when using IBM SPSS AMOS compared with SmartPLS?
Key Points
1. Code every questionnaire item into consistent indicator labels (e.g., ER1–ER7, RDR1–RDR6) before analysis so constructs are correctly specified later.
2. Define variables properly in IBM SPSS (Variable View) or set column names in Excel, then screen the dataset for minimum/maximum values and missing data.
3. Report descriptive statistics for respondents’ demographics and indicator descriptives, using tables if the main thesis text omits indicator descriptives.
4. Structure SEM results in two stages: measurement model first (factor loadings, reliability, validity) and structural model second (R² and hypothesis tests).
5. Assess construct reliability using Cronbach’s Alpha and composite reliability, and assess construct validity using convergent and discriminant validity.
6. Include mediation and moderation analyses when hypotheses require indirect effects or interaction effects, not just direct paths.
7. When using IBM SPSS AMOS, report model fit for both measurement and structural models; SmartPLS follows its standard reporting structure.