PhD Student Weekly Project Update 1 - PhD Research Pipeline - getting started with a new project
Based on Ciara Feely's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
A new PhD research push is underway, with the first week focused on turning a broad research direction into a conference-ready user modeling plan, then validating it with supervisor feedback before touching the full dataset. After completing a major milestone (a stage 2 transfer) and moving into more focused work, the plan for the coming year concentrates publishing around the summer while also preparing for earlier conference submission windows that arrive quickly in January.
The immediate publishing goal centers on user modeling, specifically work aimed at the conference “User Modeling, Adaptation and Personalization,” whose submission deadline falls at the end of January. Even though the timeline is tight, the work is treated as valuable regardless of whether a full paper ships, because it can feed into longer-term projects. The decision-making process starts with mapping what could realistically be produced within the six-month window, including checking conference dates and reviewing past submissions to understand what format (short vs. long paper) the venue expects.
The week’s work then shifts into idea generation and feature planning. Brainstorming is paired with a deliberate attempt to align with a period of heightened creativity and readiness to learn. In practical computer science terms, that brainstorming becomes a detailed list of potential data features for runner-related user modeling. The features include baseline attributes, injury-history modeling (despite uncertainty about whether a runner has actually been injured), and inferring runners’ abilities from the available data. The output of this phase is not code yet but a “mock-up” of the problem and an initial solution outline, supported by reading related papers from both the researcher’s group and a wider research network.
Once the concept feels coherent enough to describe, the researcher consults a supervisor to pressure-test feasibility. The supervisor’s response is encouraging: there is value in trying even if the outcome is only an abstract submission rather than a full paper. The supervisor also steers the focus toward a specific angle that has not been done yet and that aligns with a longer-term research priority. With that direction set, the project moves into early implementation.
Instead of running experiments on the full dataset immediately, the researcher begins with a subset to reduce the risk of discovering code problems after expensive computation. The dataset is extremely large: about two million activities across roughly 5,000 runners. The first coding effort targets feature-extraction functions, including logic for modeling training breaks. That means computing the gap between consecutive sessions in days and defining break-length categories (for example 3-, 7-, and 10-day breaks) to identify when a runner’s next break of a given length occurs. The week ends with finishing these “pernickety” data-prep tasks, setting up the next week for actual modeling work.
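The training-break step described above can be sketched in plain Python. This is a minimal illustration, not the researcher’s actual code: the session dates and function names are hypothetical stand-ins, and only the 3-, 7-, and 10-day categories come from the notes.

```python
from datetime import date

# Hypothetical session log for one runner: sorted activity dates.
sessions = [
    date(2023, 1, 1),
    date(2023, 1, 3),
    date(2023, 1, 12),  # a 9-day gap follows Jan 3
    date(2023, 1, 13),
]

# Break-length categories mentioned in the notes (in days).
BREAK_LENGTHS = (3, 7, 10)

def day_gaps(dates):
    """Distance between consecutive sessions, in days."""
    return [(b - a).days for a, b in zip(dates, dates[1:])]

def next_break_index(dates, min_days):
    """Index of the session *before* the runner's next break of at
    least `min_days` days, or None if no such break occurs."""
    for i, gap in enumerate(day_gaps(dates)):
        if gap >= min_days:
            return i
    return None

print(day_gaps(sessions))                                    # [2, 9, 1]
print({n: next_break_index(sessions, n) for n in BREAK_LENGTHS})
```

In this toy log, the 9-day gap counts as the next 3-day and 7-day break (it occurs after the session at index 1), while no 10-day break exists, which is the kind of per-category lookup the feature-extraction code needs to answer.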
Cornell Notes
After finishing the stage 2 transfer, the researcher is building a new PhD project plan aimed at early conference submissions, with a primary target in late January for “User Modeling, Adaptation and Personalization.” The first week focuses on brainstorming and feature design for runner user modeling, covering baseline features, injury-history inference despite missing ground truth, and extracting runners’ abilities from the available data. Conference feasibility is checked by reviewing submission timelines and past work, then the concept is validated with a supervisor who encourages trying even if only an abstract is possible. Implementation starts with a subset of the full dataset (which totals about 5,000 runners and ~2 million activities) so feature-extraction code can be developed safely. Current coding work centers on training-break logic: measuring session gaps in days and categorizing break lengths (e.g., 3/7/10 days) to support later modeling.
Why does the project start with a conference target in late January even though the timeline is tight?
What does “feature extraction” mean in this runner user modeling context?
How does the researcher handle the risk of working on too much data too early?
What specific data-prep task is being implemented before modeling begins?
How is the research idea refined from brainstorming into something submission-ready?
Review Questions
- What criteria does the researcher use to decide whether a conference submission is feasible within a short January turnaround?
- How do training-break features get computed from session data, and why do break-length categories matter for modeling?
- Why does starting with a subset of the full ~5,000-runner dataset help more than jumping straight to all of it?
Key Points
1. The project’s immediate publishing target is a late-January submission to “User Modeling, Adaptation and Personalization,” chosen for both feasibility testing and long-term reuse.
2. A six-month planning window is used to match conference deadlines with what can realistically be produced, with summer identified as the main publishing period.
3. Runner user modeling is translated into a concrete feature plan, including baseline features, injury-history inference without direct labels, and ability proxies derived from the available data.
4. Conference readiness is improved by reviewing past submissions and understanding whether the venue expects short or long papers.
5. Supervisor feedback is used to validate feasibility and to narrow the work to a specific, novel focus area.
6. Implementation begins on a subset of the full dataset (which totals ~5,000 runners and ~2 million activities) to debug feature-extraction code before scaling up.
7. Current coding work centers on training-break feature engineering: measuring day gaps between sessions and categorizing break lengths (e.g., 3-, 7-, and 10-day breaks).
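The subset-first workflow from the key points can be sketched as follows. This is a hedged illustration, assuming a flat log of (runner_id, activity_id) rows; the helper name and sizes are hypothetical. The design choice worth noting is sampling whole runners rather than individual activities, so each sampled runner’s full training history stays intact for the break-detection code.

```python
import random

# Hypothetical flat activity log: (runner_id, activity_id) rows.
activities = [(rid, aid) for rid in range(100) for aid in range(20)]

def runner_subset(rows, n_runners, seed=0):
    """Sample whole runners (not individual activities), so each
    sampled runner's complete training history survives for
    feature-extraction code to be tested against."""
    runners = sorted({rid for rid, _ in rows})
    keep = set(random.Random(seed).sample(runners, n_runners))
    return [row for row in rows if row[0] in keep]

# Develop and debug feature code on a small slice first,
# then rerun on the full log once it behaves correctly.
small = runner_subset(activities, n_runners=5)
```

A fixed seed keeps the subset reproducible across debugging runs, which matters when comparing feature outputs before and after a code fix.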