PhD Student Weekly Project Update 1 - PhD Research Pipeline - getting started with a new project
Based on Ciara Feely's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
A new PhD research push is underway, with the first week focused on turning a broad research direction into a conference-ready user modeling plan, then validating it with supervisor feedback before touching the full dataset. After completing a major milestone (a stage 2 transfer) and moving into more focused work, the plan for the coming year concentrates publishing around the summer while also preparing for earlier conference submission windows that arrive quickly in January.
The immediate publishing goal centers on user modeling, specifically work aimed at the conference “User Modeling, Adaptation and Personalization,” whose submission deadline falls at the end of January. Even though the timeline is tight, the work is treated as valuable regardless of whether a full paper ships, because it can feed into longer-term projects. The decision-making process starts with mapping what could realistically be produced within the six-month window, including checking conference dates and reviewing past submissions to understand what format (short vs. long paper) the venue expects.
The week’s work then shifts into idea generation and feature planning. Brainstorming is paired with a deliberate attempt to align with a period of heightened creativity and readiness to learn. In practical computer science terms, that brainstorming becomes a detailed list of potential data features for runner-related user modeling. The features include baseline attributes, injury-history modeling (despite uncertainty about whether a runner has actually been injured), and inferring runners’ abilities from the available data. The output of this phase is not code yet but a “mock-up” of the problem and an initial solution outline, supported by reading related papers from both the researcher’s group and a wider research network.
Once the concept feels coherent enough to describe, the researcher consults a supervisor to pressure-test feasibility. The supervisor’s response is encouraging: there is value in trying even if the outcome is only an abstract submission rather than a full paper. The supervisor also steers the focus toward a specific angle that has not been done yet and that aligns with a longer-term research priority. With that direction set, the project moves into early implementation.
Instead of running experiments on the full dataset immediately, the researcher begins with a subset to reduce the risk of discovering code problems after expensive computation. The dataset is extremely large: about two million activities across roughly 5,000 runners. The first coding effort targets feature-extraction functions, including logic for modeling training breaks. That means computing the gap between consecutive sessions in days and defining break-length categories (for example 3-, 7-, and 10-day breaks) to identify when a runner’s next break of a given length occurs. The week ends with finishing these “pernickety” data-prep tasks, setting up the next week for actual modeling work.
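The training-break step described above can be sketched in plain Python. This is a minimal illustration, not the researcher’s actual code: the session dates and function names are hypothetical stand-ins, and only the 3-, 7-, and 10-day categories come from the notes.

```python
from datetime import date

# Hypothetical session log for one runner: sorted activity dates.
sessions = [
    date(2023, 1, 1),
    date(2023, 1, 3),
    date(2023, 1, 12),  # a 9-day gap follows Jan 3
    date(2023, 1, 13),
]

# Break-length categories mentioned in the notes (in days).
BREAK_LENGTHS = (3, 7, 10)

def day_gaps(dates):
    """Distance between consecutive sessions, in days."""
    return [(b - a).days for a, b in zip(dates, dates[1:])]

def next_break_index(dates, min_days):
    """Index of the session *before* the runner's next break of at
    least `min_days` days, or None if no such break occurs."""
    for i, gap in enumerate(day_gaps(dates)):
        if gap >= min_days:
            return i
    return None

print(day_gaps(sessions))                                    # [2, 9, 1]
print({n: next_break_index(sessions, n) for n in BREAK_LENGTHS})
```

In this toy log, the 9-day gap counts as the next 3-day and 7-day break (it occurs after the session at index 1), while no 10-day break exists, which is the kind of per-category lookup the feature-extraction code needs to answer.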
Cornell Notes
After finishing the stage 2 transfer, the researcher is building a new PhD project plan aimed at early conference submissions, with a primary target in late January for “User Modeling, Adaptation and Personalization.” The first week focuses on brainstorming and feature design for runner user modeling, covering baseline features, injury-history inference despite missing ground truth, and extracting runners’ abilities from the available data. Conference feasibility is checked by reviewing submission timelines and past work, then the concept is validated with a supervisor who encourages trying even if only an abstract is possible. Implementation starts with a subset of the full dataset (which totals about 5,000 runners and ~2 million activities) so feature-extraction code can be developed safely. Current coding work centers on training-break logic: measuring session gaps in days and categorizing break lengths (e.g., 3/7/10 days) to support later modeling.
Why does the project start with a conference target in late January even though the timeline is tight?
What does “feature extraction” mean in this runner user modeling context?
How does the researcher handle the risk of working on too much data too early?
What specific data-prep task is being implemented before modeling begins?
How is the research idea refined from brainstorming into something submission-ready?
Review Questions
- What criteria does the researcher use to decide whether a conference submission is feasible within a short January turnaround?
- How do training-break features get computed from session data, and why do break-length categories matter for modeling?
- Why does starting with a subset of the full ~5,000-runner dataset help more than jumping straight to all of it?
Key Points
1. The project’s immediate publishing target is a late-January submission to “User Modeling, Adaptation and Personalization,” chosen for both feasibility testing and long-term reuse.
2. A six-month planning window is used to match conference deadlines with what can realistically be produced, with summer identified as the main publishing period.
3. Runner user modeling is translated into a concrete feature plan, including baseline features, injury-history inference without direct labels, and ability proxies derived from the available data.
4. Conference readiness is improved by reviewing past submissions and understanding whether the venue expects short or long papers.
5. Supervisor feedback is used to validate feasibility and to narrow the work to a specific, novel focus area.
6. Implementation begins on a subset of the full dataset (which totals ~5,000 runners and ~2 million activities) to debug feature-extraction code before scaling up.
7. Current coding work centers on training-break feature engineering: measuring day gaps between sessions and categorizing break lengths (e.g., 3-, 7-, and 10-day breaks).
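The subset-first workflow from the key points can be sketched as follows. This is a hedged illustration, assuming a flat log of (runner_id, activity_id) rows; the helper name and sizes are hypothetical. The design choice worth noting is sampling whole runners rather than individual activities, so each sampled runner’s full training history stays intact for the break-detection code.

```python
import random

# Hypothetical flat activity log: (runner_id, activity_id) rows.
activities = [(rid, aid) for rid in range(100) for aid in range(20)]

def runner_subset(rows, n_runners, seed=0):
    """Sample whole runners (not individual activities), so each
    sampled runner's complete training history survives for
    feature-extraction code to be tested against."""
    runners = sorted({rid for rid, _ in rows})
    keep = set(random.Random(seed).sample(runners, n_runners))
    return [row for row in rows if row[0] in keep]

# Develop and debug feature code on a small slice first,
# then rerun on the full log once it behaves correctly.
small = runner_subset(activities, n_runners=5)
```

A fixed seed keeps the subset reproducible across debugging runs, which matters when comparing feature outputs before and after a code fix.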