The Effect of Simulated Contextual Factors on Recipe Rating and Nutritional Intake Behaviour

Despite the importance of context in Recommender Systems (RSs) more generally, and its clear applicability in the food domain, most existing research focuses on single contextual factors, and only considers simple extrinsic factors such as location and time. No RSs research has systematically explored the impact of multiple dynamic factors, or investigated the effect of emotion in determining people’s eating, recipe rating and nutritional intake behaviour. To bridge these gaps, we conducted a comprehensive large-scale (n=397) crowdsourced experimental study to uncover the intricate relationship between various simulated contextual factors and users’ subsequent recipe rating and implied nutritional intake behaviour. We further aimed to explore how these contextual factors can be incorporated to improve recommendation performance. Four distinct types of contextual factors were investigated: seasonal, emotional, busyness and physical activity, encompassing a total of seven elements. Our findings show that people’s eating preferences and the likelihood of them choosing to eat healthy recipes vary depending on the simulated context they find themselves in. Moreover, we demonstrate how these contextual features can be used to significantly improve recipe rating prediction performance. Our research has implications for the future development of food RSs, and shows that emotion-aware systems could lead to better healthy food recommendations.


INTRODUCTION
The context in which we find ourselves can have a strong effect on our individual choices. Therefore, situations where choices need to be made, such as those targeted by Recommender Systems (RSs), could benefit profoundly from the intelligent inclusion of contextual factors in the recommendations. Work in many RS domains, including location-aware restaurant recommendation [14], time-aware tourism recommendation [63,65], context-aware TV content recommendation [64], and mood-aware music recommendation [45], has demonstrated this benefit.
Food, and particularly healthy food, recommender systems are an important and growing area of RSs research [22], and one in which context may also play a vital role. Existing research from psychology and food science has already demonstrated the profound impact of various contextual factors on the food people choose to eat [20,48]. Predicting what food a user may want to eat, and what food they would be better off eating, is a complex and multifaceted process, which could be influenced by various biological, personal, and social factors [13]. Food recommender systems can play a significant role in promoting healthy lifestyles by suggesting suitable food products that match users' dietary preferences and meet their basic biological and physiological requirements. However, making recommendations based on nutritional content may increase the risk of rejection, since the most popular recipes tend to be higher in fat and calories [52,58].
Individuals may not consistently maintain an unhealthy diet under all circumstances, and making blanket health recommendations could potentially overwhelm or frustrate users. Indeed, specific situations may trigger or encourage unhealthy eating habits. Therefore, it is crucial to examine whether users exhibit varying dietary and nutritional intake patterns under different contextual situations. Additionally, it is important to investigate when it is most necessary to provide healthy recommendations and how to integrate nutritional information to reduce the likelihood of recommendation rejection.
Prior food science research highlights that static factors like cultural background [36], as well as personal dynamic factors including stress levels, mood or emotion, may markedly affect food consumption behaviour [33]. However, there has been very little research directly investigating these factors in the food recommendation field. Despite the importance of context in RSs more generally, and its clear applicability in the domain of food, most existing research focuses on single contextual factors, and only considers simple extrinsic factors such as time or location. No research has systematically explored how multiple dynamic factors affect people's eating and recipe rating behaviour.
We bridge these gaps by exploring how various dimensions of dynamic factors, including seasonality, emotional states, busyness, and physical activity, affect people's recipe rating and implied nutritional consumption behaviours. We combine insights gained from analyses of recipe ratings from a large online food portal with a review of possible influential factors discussed in psychology studies. We conducted a large-scale (n=397) crowdsourced experimental study to learn the relationship between various simulated contextual factors and users' subsequent recipe rating and implied nutritional intake behaviour.
Note that a common issue in food recommender systems work is the difficulty in directly measuring users' cooking and eating behaviour, as preparing food requires a significant amount of effort [35]. Biologically speaking, the urge to eat and cook a certain food is usually driven by a liking or preference for a particular food or recipe [43]. As such, a high rating can be considered a proxy for the future intention to consume a certain recipe. The behaviour measured in this research is implied behaviour, and the discussion and implications of this research are based on this assumption.
We examine seven contextual scenarios from four types of contextual factor: "seasonal", "emotional", "busyness" and "physical activity", and compare these with a "context-free" baseline condition. Furthermore, we explore how these contextual factors can be employed to improve recommendation performance. More precisely, we address the following research questions:
• RQ1: Does people's recipe rating behaviour vary among different simulated contextual situations?
• RQ2: To what extent do contextual factors affect people's implied nutritional intake behaviour?
• RQ3: Can integration of these contextual factors improve recommendation performance?
• RQ4: Which contextual factors are the most influential factors when recommending foods?
The novel contributions of our research can be summarised as follows:
• Introduction and provision of a novel large-scale dataset to study context-aware healthy food (recipe) recommender systems.
• In addition to predicting user preferences based on recipe ratings from online recipe websites, our validation process involves gathering evaluations and insights from users in simulated contextual scenarios.
• We are the first to reveal the impact of various dynamic contextual factors on individuals' recipe rating behaviours. Our study also highlights variations in implied nutritional intake behaviours across different contextual situations, offering valuable insights for incorporating health information into healthy food recommendation systems.
• By identifying the most influential factors beneficial to machine-learning-based recommendation algorithms, our research contributes to the development of the next generation of context-aware food recommender systems.

LITERATURE REVIEW
Early work in psychology and food science has demonstrated that there are a wide variety of influential contextual factors that shape food choice [47]. For example, the socialisation and acculturation standards that people have learned, normative family eating patterns, and historical cultural beliefs may lead to different food choices [36]; and changes in mood and general emotional state may regulate eating and vice versa [33,38]. Tangible physical capital such as money and equipment, and intangible capital such as time, cooking skills and knowledge, could all determine varied food choice [46]. Changes in social and broader environments, such as changes of role, being surrounded by different groups or communities [4,49], or changes of season, home or workplace, could likewise provide opportunities and obligations for reconstructing eating relationships and food choices [10,11].
However, few contextual factors have been explored in prior research in the food recommendation domain. Due to a lack of available contextual features in publicly-available recipe rating datasets, researchers often focus on studying and attempting to incorporate only single contextual factors into food RSs. Kusmierczyk et al. [29] found there are clear temporal patterns in online recipe production and uploading behaviour, but the relationship between recipe generation and users' eating behaviour remains unclear. Cavazza et al. [6] found that females are more interested in smaller and more "elegant" meals than males, while Rokicki et al. [39] sought to integrate gender into a collaborative filtering model, demonstrating a small improvement in performance. However, the researchers only built a single gender-aware RS without considering other factors. Location, which is known to be one of the most significant contextual factors in other domains, has also been widely researched in the food domain [7,9,34], but most research attempts to demonstrate the impact of location on eating behaviour rather than focusing on the development of location-aware food recommendation systems.
Trattner et al. [57] investigated the relationship between cooking interests, hobbies and nutritional values of online recipes. They suggest that learning the patterns between a user's hobbies and eating preferences could provide motivating goals for persuasive systems. Recently, Gao et al. [18] proposed a context-aware recommendation system based on graph neural networks, which was tested on two popular food datasets. Due to the challenge of revealing the meaning of hidden neural structures in a deep learning network, the paper does not specify which exact contextual factors contributed to the model. To date, due to the difficulty of duplicating contextual environments in a laboratory setting, various dynamic personal factors, such as emotional state and fluctuations in stress levels, have been little explored.
Another significant challenge exists when considering the balance between nutrition and people's food preferences, particularly when both aspects are integrated into a food recommendation system [12]. Simply considering the accuracy of recommendation results isn't sufficient to build an ideal food RS, as most popular online recipes tend to be less healthy [57]. Most research considers healthy recommendations only but neglects the user's past preferences, or vice versa [19,59]. Ueta et al. [59] developed a recipe retrieval system based on 45 common nutrients, which allows users to easily search for nutritious recipes using natural language to address specific health conditions, but does not take into account the user's past recipe preferences. Harvey et al. [21] designed a study which asked people what factors they thought were important, and found that 17 factors (mainly food-based, health and temporal factors) impacted users' rating choice. However, they did not investigate how these factors interacted with each other and how specific scenarios actually impact food choice and nutritional intake behaviour. More recently, Wang et al. [61] introduced a multimodal health-aware food recommendation system that incorporates word2vec and VGGNet-19 to extract textual and visual features, and demonstrated superior performance compared to traditional recommendation systems. Yet, the recommendations are generated based on the user's health profile and associated tags, rather than considering the user's past search history or preferred recipe choices.
In summary, previous research rarely considers health recommendations and the user's past preferences simultaneously. A further notable aspect is that no research has focused on incorporating contextual factors into food recommendation systems while making healthy recommendations. It is apparent from previous literature that more in-depth research is required in order to better understand what contextual factors are most impactful in determining people's food choices as well as implied nutritional intake behaviours. This is particularly so for contextual factors related to seasonality, emotional status, stress levels and physical activities, which have been suggested in the food science domain to potentially have a significant impact on people's eating behaviour.

METHOD

3.1 Experimental design
The primary aim of the study is to examine how seven dynamic contextual factors, including a hot summer's day, a cold winter's day, happy emotion, sad emotion, a busy and stressful weekday, a relaxing weekend, and after physical activities, along with a context-free generic group, affect people's recipe rating and implied nutritional consumption behaviour. To achieve this, we conducted a large-scale (n=397), between-subjects user study. Participants were randomly assigned to one of eight groups (the seven simulated contexts plus a baseline "context-free" condition), shown material to simulate the given context (detailed below), and then asked to rate recipes under that contextual scenario. Each participant rated 30 recipes from a pool of 75 recipes using a Likert scale from 1 (strongly dislike) to 5 (strongly like). All participants were recruited online using the Prolific crowdsourcing platform; the survey instrument was designed using Qualtrics.
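The assignment step described above can be sketched as follows. This is a minimal illustration only: the paper does not say how the 30 recipes were drawn from the pool or how the survey tool balanced group sizes, so the uniform sampling, the function name `assign_participant` and the numeric recipe ids are all assumptions.

```python
import random

# Abbreviations follow the paper; "G" is the context-free baseline.
CONTEXTS = ["HSD", "CWD", "H", "S", "B", "R", "APA", "G"]


def assign_participant(recipe_pool, rng, n_to_rate=30):
    """Pick one context group and the recipes a participant will rate."""
    context = rng.choice(CONTEXTS)
    # Assumption: recipes are drawn uniformly without replacement.
    recipes = rng.sample(recipe_pool, n_to_rate)
    return context, recipes


rng = random.Random(42)           # fixed seed so the sketch is reproducible
recipe_pool = list(range(1, 76))  # hypothetical ids for the 75 recipes
context, recipes = assign_participant(recipe_pool, rng)
```

Simple random choice does not guarantee equal group sizes; a real survey would more likely use a balanced randomiser.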
The pool of 75 recipes was sampled from Allrecipes.com, one of the most popular online recipe websites, which has previously been used as a data source by various researchers in the food recommendation domain (e.g., [22,40,66]). The recipes were chosen to ensure a distribution across various factors, including seasonality (i.e., winter and summer), healthiness, and recipe category (i.e., main dish, soup, salad, and dessert/snack), and to ensure that both vegetarian and vegan recipes were included.
Recipe seasonality was determined based on the popularity of each recipe (i.e., total number of 4 or 5 ratings) during each season. For example, the number of 4 or 5 ratings given to a recipe during the months of December, January and February is that recipe's winter popularity. Recipes were flagged as being vegetarian and/or vegan based on a manual analysis of their constituent ingredients.
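The winter popularity count described above can be sketched as below. The tuple layout `(recipe_id, rating, date)` and the sample records are hypothetical; only the rule itself (count ratings of 4 or 5 given in December, January or February) comes from the text.

```python
from collections import Counter
from datetime import date

WINTER_MONTHS = {12, 1, 2}


def winter_popularity(ratings):
    """Count ratings of 4 or 5 given in winter months, per recipe.

    `ratings` is an iterable of (recipe_id, rating, rated_on) tuples.
    """
    counts = Counter()
    for recipe_id, rating, rated_on in ratings:
        if rating >= 4 and rated_on.month in WINTER_MONTHS:
            counts[recipe_id] += 1
    return counts


# Made-up records, for illustration only.
sample = [
    ("stew", 5, date(2020, 1, 10)),
    ("stew", 4, date(2020, 12, 3)),
    ("stew", 3, date(2020, 2, 1)),    # rating too low, not counted
    ("salad", 5, date(2020, 7, 15)),  # summer month, not counted
]
popularity = winter_popularity(sample)
```

Summer popularity would follow the same pattern with a different set of months.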
Healthiness was determined using World Health Organisation (WHO) [62], U.S. Food & Drug Administration (FDA) [16] and United Kingdom Food Standards Agency (FSA) standards [17]. These are international standards that have been previously used in this domain [56]. The calculations were based on the detailed rules provided by each standard. For the WHO and FDA health scores, we initially converted the nutrient values in grams provided by Allrecipes.com into percentages of the daily recommended values. Then, we evaluated whether the recipe's nutrient values fell within the standard range. If a particular nutrient value fell within the range, we added 1 to that recipe's score; otherwise, we added nothing. Consequently, the higher the score, the healthier the recipe.
The calculation of the FSA health score is more intricate. We began by computing the total weight of each recipe based on its ingredient list, then divided this by the number of servings to obtain the weight per portion. Subsequently, we calculated the values for fats, saturated fats, sugar, and salt per 100g. Next, we categorised these nutrient values as green, amber, or red based on their corresponding range. If a nutrient value falls within the green range, it is assigned one point. If it falls within the amber range, it receives two points, and if it falls within the red range, it is assigned three points. Here, a higher overall score for a recipe indicates a lower level of healthiness.
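The FSA-style scoring steps above can be sketched as follows. The per-100g cut-offs are an assumption taken from the UK front-of-pack traffic-light guidance for foods; the paper does not list the exact ranges it used. The sketch also assumes nutrient amounts are given per serving, and the example recipe is invented.

```python
# Per-100g cut-offs in grams: (upper bound of green, upper bound of amber).
# Assumed from UK front-of-pack guidance for foods; not stated in the paper.
FSA_THRESHOLDS = {
    "fat": (3.0, 17.5),
    "saturates": (1.5, 5.0),
    "sugar": (5.0, 22.5),
    "salt": (0.3, 1.5),
}


def per_100g(amount_per_serving_g, serving_weight_g):
    """Rescale a per-serving amount to a per-100g amount."""
    return amount_per_serving_g / serving_weight_g * 100.0


def traffic_light_points(value_per_100g, green_max, amber_max):
    """1 point for green, 2 for amber, 3 for red."""
    if value_per_100g <= green_max:
        return 1
    if value_per_100g <= amber_max:
        return 2
    return 3


def fsa_score(nutrients_per_serving_g, total_weight_g, servings):
    """Sum the points over the four nutrients (4 = healthiest, 12 = least healthy)."""
    serving_weight = total_weight_g / servings
    return sum(
        traffic_light_points(
            per_100g(nutrients_per_serving_g[name], serving_weight), *bounds
        )
        for name, bounds in FSA_THRESHOLDS.items()
    )


# Invented recipe: 1600 g in total, 4 servings, nutrient grams per serving.
example = {"fat": 20.0, "saturates": 8.0, "sugar": 6.0, "salt": 1.0}
score = fsa_score(example, total_weight_g=1600.0, servings=4)
```

In this example fat and saturates land in the amber band and sugar and salt in the green band, giving a score of 6.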
The primary objective is to assess the health level of individual recipes; however, established standards like WHO or FDA define ideal daily nutritional intake. As observed in Western societies, the prevalent dietary pattern often consists of three meals a day [31,54]. To account for this, we divided the standard recommended daily nutritional intake by three. This approach provides a more accurate indicator of the nutritional quality of an individual recipe.
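The range-counting score used for the WHO and FDA standards, together with the per-meal scaling just described, might look like the sketch below. The reference intakes and acceptable ranges are placeholder numbers chosen for illustration; they are not the WHO or FDA rules, and all names are hypothetical.

```python
# Placeholder daily reference intakes in grams (NOT real WHO/FDA values).
DAILY_REFERENCE_G = {"fat": 70.0, "sugar": 90.0}

# Placeholder acceptable ranges, as a percentage of the reference intake.
ACCEPTABLE_PCT = {"fat": (20.0, 35.0), "sugar": (0.0, 50.0)}


def percent_of_reference(grams, daily_reference_g, meals_per_day=1):
    """Express an amount as a percentage of the (optionally per-meal) reference."""
    reference = daily_reference_g / meals_per_day
    return grams / reference * 100.0


def range_score(recipe_g, meals_per_day=1):
    """Add 1 for each nutrient whose percentage falls inside its range."""
    score = 0
    for name, (low, high) in ACCEPTABLE_PCT.items():
        pct = percent_of_reference(recipe_g[name], DAILY_REFERENCE_G[name], meals_per_day)
        if low <= pct <= high:
            score += 1
    return score


recipe = {"fat": 7.0, "sugar": 12.0}              # invented amounts, grams per serving
daily_score = range_score(recipe)                 # judged against a whole day's intake
meal_score = range_score(recipe, meals_per_day=3) # judged against one third of it
```

With these placeholder values the same recipe scores 1 against the daily reference and 2 against the per-meal reference, which illustrates why the choice of reference matters when scoring a single recipe.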

Stimuli and manipulation material
In the study, seven contextual situations were simulated, under which participants could rate recipes. To minimise the influence of real-life environmental factors and enhance the evocation of these simulated scenarios, inspiration was drawn from studies conducted by Imani and Montazer [26] and van Strien et al. [60].
To accomplish this, we employed the external emotion stimuli method, as described in previous studies [23,44]. This method primarily relies on source materials such as photos, music, or videos to evoke emotions. In a controlled laboratory setting, videos offer a more effective and immersive means of eliciting specific emotions compared to images or music alone. Furthermore, videos provide a rich blend of visual and auditory stimuli. This abundance of sensory input makes videos the preferred choice for eliciting emotions with higher intensity, efficiency, and accuracy [23,24,50].
In this study, participants were presented with a 22-second video clip designed to engage their cognitive faculties and enhance their immersion in the assigned simulated contextual situation [55]. For instance, the video for a hot summer day featured a beautiful sunny day, a thermometer displaying high temperatures, and a person sweating. In contrast, the video for a cold winter day depicted heavy snowfall and strong winds. Participants in the "context-free" control group were not shown a video. To further immerse participants in their designated contextual scenario, recipe templates (see Figure 1) were designed to align with each scenario. The number of special elements, such as the sun for a hot summer day or weightlifting for after physical activities, was consistent across templates, and the sizes of these elements were nearly identical when inserted into the background. For reference, specific recipe examples can be found in Figure 1.

Procedure
The survey workflow is illustrated in Figure 2. Participants expressed their interest in taking part via the Prolific platform. Prior to commencing the survey, participants were asked to review the information sheet and complete a consent form. Subsequently, participant demographic information was collected, encompassing basic details such as age, gender and ethnic origin.
After completing the demographic questions block, participants encountered the recipe rating block. In this study, participants were randomly assigned to a group (as described above) before providing their ratings. As discussed above, participants in the contextual scenario groups were instructed to watch a 22-second video clip designed to immerse them in that context. Each participant was tasked with rating 30 recipes within their assigned contextual situation block. Finally, participants were asked two questions regarding their reasons for assigning high or low ratings to each recipe.
To maintain a high level of data quality, we implemented both manipulation and attention checks to verify that participants were following the prescribed procedures and paying adequate attention. The manipulation check involved a question where participants had to identify the theme of the video they saw at the start of the questionnaire (e.g., "hot summer day"). There were two types of attention check: one nonsensical recipe (including ingredients such as 5 stones and 3 cups of sand), which should always elicit a negative response; and attention-check recipes, where participants were explicitly instructed to choose a specific response. The attention check questions were presented after participants had rated all 30 recipes. While the text in the attention check recipes remained identical, the background varied to maintain consistency with recipes in different contextual scenarios.
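A filter applying the manipulation check and the nonsensical-recipe check could be sketched as below. The field names, the treatment of the context-free group (which saw no video) and the rule that a rating of 1 or 2 counts as a negative response are all assumptions made for illustration.

```python
def passes_quality_checks(response):
    """Return True if a response passes both exclusion checks."""
    assigned = response["assigned_theme"]
    # Assumption: the context-free group saw no video, so nothing to recall.
    manipulation_ok = assigned is None or response["reported_theme"] == assigned
    # Assumption: a rating of 1 or 2 counts as a negative response.
    nonsense_ok = response["nonsense_recipe_rating"] <= 2
    return manipulation_ok and nonsense_ok


# Invented responses covering one pass and both kinds of failure.
responses = [
    {"assigned_theme": "hot summer day", "reported_theme": "hot summer day",
     "nonsense_recipe_rating": 1},
    {"assigned_theme": "sad emotion", "reported_theme": "happy emotion",
     "nonsense_recipe_rating": 1},
    {"assigned_theme": None, "reported_theme": None,
     "nonsense_recipe_rating": 5},
]
valid = [r for r in responses if passes_quality_checks(r)]
```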

Data collection and participants
G*Power was employed to provide an estimate for an appropriate sample size [15]. Based on the data from previous experimental studies regarding personalisation and user behaviour [28,32], we estimated the effect size to be 0.23. The estimate indicated that 360 participants were needed in this research at a 5% significance level with 90% power. A post hoc analysis of effect size, based on the ANOVA of the mean and variance of each group, yielded an effect size larger than that assumed a priori (Cohen's f = 0.277) [8].
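For readers unfamiliar with the effect size used here, one common descriptive form of Cohen's f for a one-way design is the square root of the ratio of between-group to within-group sums of squares. The sketch below computes that quantity on toy numbers; it is not necessarily the exact procedure the authors followed, and the data are invented.

```python
import math


def cohens_f(groups):
    """Descriptive Cohen's f = sqrt(SS_between / SS_within) for a one-way design."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return math.sqrt(ss_between / ss_within)


# Toy ratings for three groups, for illustration only.
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_value = cohens_f(toy_groups)
```

By Cohen's conventional benchmarks, f values around 0.10, 0.25 and 0.40 are described as small, medium and large.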
We recruited participants between May 2023 and July 2023 exclusively through Prolific. Each participant received a payment of £1.82, in accordance with the current living wage standards. A total of 428 participants were recruited for this study. Participants who failed the manipulation check (19, 4.4%) and the nonsensical item attention check (12, 2.8%) were excluded from the study. After thorough data inspection and cleaning, we retained data from 397 valid individuals.
Among these participants, 212 (53.4%) identified as male and 177 (44.6%) as female, 5 (1.3%) were non-binary or gender diverse, and the remaining 3 (0.7%) preferred not to disclose their gender. The largest age group was 25-34, representing 30.7% of participants, followed by the 35-44 age group at 25.8%, the 45-54 age group at 19.4%, and the 18-24 and 55-64 age groups with similar proportions at 10.8% and 10.3%, respectively. Participants aged 65-74 accounted for 3.3%, with only one participant aged 75 or above. The majority of participants hailed from the UK and the US. Regarding ethnic origin, White participants dominated with 329 (82.9%), followed by Asian or Asian British at 34 (8.6%). The remainder consisted of participants identifying as Black, Black British, Caribbean, or [...]. While participants' home countries were relatively diverse, the majority were currently residing in the UK (351, 88.4%) and the US (44, 11.1%).

User rating prediction model and Feature engineering
The raw CSV data file was downloaded from Qualtrics. Data cleaning and preprocessing were carried out in Python using Jupyter Notebook 6.3.0. According to recent systematic reviews on recommender systems [27,42], tree-based models have been widely employed in model-based recommender systems. Therefore, we aimed to evaluate the performance of tree models on our dataset; XGBoost, a boosting tree model, was chosen for its efficiency and flexibility. Its objective function evaluates model performance based on a set of parameters, while a regularisation term controls model complexity to prevent overfitting [30]. Notably, tree models offer more interpretable explanations than other models, which aligns with our goal of identifying the most influential features in the prediction task. In this study, the task of rating prediction was approached as a regression problem and not as a top-k relevance prediction problem. Consequently, evaluation metrics such as precision@K and nDCG would not be appropriate for evaluation. Instead, Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (R2) are reported to evaluate the model test performance.
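The four regression metrics named above have standard definitions, sketched here in plain Python on invented ratings. Note that R2 becomes negative whenever a model's squared error exceeds that of always predicting the mean of the true ratings.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R2 for paired true/predicted ratings."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # must be non-zero
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Invented true and predicted ratings on the 1-5 scale.
metrics = regression_metrics([1, 2, 3, 4, 5], [2, 2, 3, 4, 4])
```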
After data cleaning and preprocessing, a total of 11,910 ratings were collected, distributed as follows: hot summer day (HSD) 1500 ratings, cold winter day (CWD) 1500 ratings, happy emotion (H) 1470 ratings, sad emotion (S) 1470 ratings, busy weekday (B) 1470 ratings, relaxing weekend (R) 1500 ratings, after physical activities (APA) 1500 ratings, and generic context-free group (G) 1500 ratings. The datasets for each experimental group were divided into training and testing sets using an 80:20 ratio. The target variable is the recipe rating, ranging from 1 to 5, maintaining consistency across each model.
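A per-group 80:20 split can be sketched as follows. The paper does not state whether the split was made over individual ratings or over participants, so splitting shuffled rating rows within each group is an assumption, as are the row layout and the function name. The group sizes in `RATINGS_PER_GROUP` are those reported above.

```python
import random

# Ratings per group as reported in the text.
RATINGS_PER_GROUP = {
    "HSD": 1500, "CWD": 1500, "H": 1470, "S": 1470,
    "B": 1470, "R": 1500, "APA": 1500, "G": 1500,
}


def split_by_group(rows, test_fraction=0.2, seed=0):
    """Shuffle rows within each context group and hold out a test share."""
    rng = random.Random(seed)
    by_group = {}
    for row in rows:
        by_group.setdefault(row["context"], []).append(row)
    train, test = [], []
    for group_rows in by_group.values():
        shuffled = group_rows[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Tiny invented dataset: two groups of ten ratings each.
rows = [{"context": c, "rating": 3} for c in ("HSD", "H") for _ in range(10)]
train, test = split_by_group(rows)
```

Splitting by rating row means the same participant can appear in both sets; a participant-level split would be the stricter alternative.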
A total of 29 features, which include participants' demographic features and recipe nutritional features, were used to train the XGBoost model. While the main focus of the study was on identifying novel dynamic contextual factors, demographic features were included because they have the potential to enhance model performance by providing the trained model with additional reference information.
Two sets of experiments were conducted. The primary objective of the first experiment was to assess the importance of adding contextual features to the model. In this experiment, all 11,910 ratings were utilised. By systematically adding and removing contextual features within the model, the importance of each contextual feature can be determined. The second experiment aimed to identify the most influential individual dynamic contextual factors at the model building level. To achieve this, the dataset was divided into eight groups based on contextual scenario to facilitate model performance comparisons.
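The logic of the first experiment, comparing a model that sees the contextual feature against one that does not, can be illustrated with a deliberately simple stand-in. The sketch below uses a lookup-table mean predictor rather than XGBoost, and the toy ratings are invented to make the effect obvious; it shows only the shape of the comparison, not the authors' pipeline.

```python
def fit_mean_predictor(train_rows, keys):
    """Predict the mean training rating for each combination of `keys`."""
    totals = {}
    for row in train_rows:
        k = tuple(row[name] for name in keys)
        total, count = totals.get(k, (0.0, 0))
        totals[k] = (total + row["rating"], count + 1)
    table = {k: total / count for k, (total, count) in totals.items()}
    global_mean = sum(r["rating"] for r in train_rows) / len(train_rows)
    # Fall back to the global mean for unseen key combinations.
    return lambda row: table.get(tuple(row[name] for name in keys), global_mean)


def mse(rows, predict):
    return sum((r["rating"] - predict(r)) ** 2 for r in rows) / len(rows)


# Invented data: the preferred category flips with the context.
toy = [
    {"context": "HSD", "category": "salad", "rating": 5},
    {"context": "HSD", "category": "soup", "rating": 2},
    {"context": "CWD", "category": "salad", "rating": 2},
    {"context": "CWD", "category": "soup", "rating": 5},
]
train_rows, test_rows = toy * 4, toy  # same pattern in both splits

without_context = fit_mean_predictor(train_rows, keys=("category",))
with_context = fit_mean_predictor(train_rows, keys=("context", "category"))
mse_without = mse(test_rows, without_context)
mse_with = mse(test_rows, with_context)
```

In this toy case the context-blind predictor can only output the category average (3.5), while the context-aware predictor recovers each rating exactly.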

RESULTS
4.1 RQ1-Does people's recipe rating behaviour vary among different simulated contextual situations?
To address RQ1, we used an ANOVA test with the null hypothesis (H0) that there are no significant differences (no variation in means) across the contextual scenarios. Figure 3 shows the rating distributions across the simulated contextual scenarios. Particularly noteworthy is the rating distribution in the generic baseline group (G), which displays low variance, with most ratings falling within the range [2.75, 3.25]. In contrast, the ratings for the 'cold winter day' (CWD) and 'hot summer day' (HSD) scenarios exhibit a considerably more dispersed range. The rating distribution for the busy weekday contextual scenario (B) is notably lower than those of the other scenarios. Surprisingly, the score distributions of the groups experiencing happy and sad emotions appear similar; however, the sad emotion group demonstrated a marginally higher range of mean ratings when compared to the happy emotion group.
The normality of residuals was assessed using the Shapiro-Wilk test (see Table 1). The p-values for seven of the groups were non-significant, providing no evidence of a departure from a normal distribution. However, note that the data for the 'after physical activities' group exhibited a significant departure from normality.
A one-way ANOVA test was conducted to explore whether variations existed in individuals' implied eating behaviours and recipe rating responses across the different contextual scenarios. The result revealed that there were significant differences between the groups (F(7, 592) = 7.564, p ≪ 0.001), allowing us to reject the null hypothesis. As the ANOVA test doesn't test the relationships between each pair of groups, we subsequently conducted multiple pairwise comparisons, employing Tukey's Honest Significant Difference (HSD) test. The outcomes of Tukey's test are shown in Table 2.
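The F statistic of a one-way ANOVA is the ratio of the between-group mean square to the within-group mean square. The sketch below computes it from scratch on toy data, so the numbers are illustrative only; in practice a statistics library would also supply the p-value and the Tukey HSD comparisons.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy ratings for three groups, for illustration only.
f_stat, df_between, df_within = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

As an aside, the reported degrees of freedom (7, 592) are what this formula gives for eight groups of 75 observations each, which would be consistent with a test over per-recipe values within each group; the paper does not state this explicitly.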

RQ2-To what extent do contextual factors affect people's implied nutritional intake behaviour?
The determination of recipe health levels primarily relies on the FSA standard [17] in this section, as this is the most appropriate standard for evaluating a single recipe. The other two standards focus on measuring appropriate daily nutritional intake. In alignment with this standard, a higher FSA score signifies a less healthy recipe, with scores ranging from 4 (extremely healthy recipe) to 12 (extremely unhealthy recipe). After data aggregation, it was observed that recipes with the highest FSA score of 12 were remarkably popular. In fact, such recipes garnered the highest mean ratings in six contextual groups, namely 'hot summer day' (mean rating of 3.21), 'happy emotion' (3.35), 'sad emotion' (3.53), 'busy weekday' (3.57), 'relaxing weekend' (3.75), and 'after physical activities' (3.7), as well as in the 'generic' group (mean rating of 3.89); see Table 3. It is notable that, across most groups, recipes with better health ratings tend to have been given lower scores than their less healthy counterparts.
In the generic group the mean rating remained relatively consistent, with only minor fluctuations (ranging from 3.144 to 3.889), suggesting that, in the absence of contextual factors, individuals' recipe preferences do not differ markedly across recipe health levels. However, in the 'after physical activities' group, healthy recipes were generally preferred over less healthy ones. Conversely, in the 'relaxing weekend' group, unhealthy recipes were more popular. Similarly, both the 'happy' and 'sad emotion' groups show a preference towards unhealthy recipes, particularly in the case of the 'sad emotion' group.
Additionally, we investigated the properties of favoured recipes. We filtered out the most popular recipes, considering only those with ratings of 4 and 5 under each contextual situation. These recipes were then aggregated based on recipe categories (e.g., Main dish, Soup, Salad and Dessert/Snack) and three grouped levels of FSA scores: Low (FSA levels 4, 5 and 6), Medium (FSA levels 7, 8 and 9), and High (FSA levels 10, 11 and 12), after Starke et al. [53] (see Figure 4). We found that during 'cold winter days' and 'after physical activities', people tend to prefer main dishes, most of which fall into the Medium health category. Desserts and snacks are favoured when people are 'feeling sad' and 'feeling happy', with many of these items belonging to the unhealthy (High) food category. Predictably, salads, which are relatively healthy compared to other categories, are preferred during 'hot summer days'. Soups are more popular during 'cold winter days' and when people are feeling sad, suggesting that soups might be an effective comfort food. Most soups fall into the healthy (Low) and general (Medium) health categories. In general, emotional changes may lead to increased consumption of unhealthy recipes.
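The banding and aggregation described above can be sketched as follows. The band boundaries (4-6 Low, 7-9 Medium, 10-12 High) and the rating filter (4 or 5) come from the text; the row layout and the sample rows are hypothetical.

```python
from collections import Counter


def fsa_band(score):
    """Map an FSA score (4-12) to the Low / Medium / High band."""
    if not 4 <= score <= 12:
        raise ValueError("FSA score must be between 4 and 12")
    if score <= 6:
        return "Low"
    if score <= 9:
        return "Medium"
    return "High"


def favoured_counts(rows):
    """Count ratings of 4 or 5 per (context, category, FSA band)."""
    counts = Counter()
    for row in rows:
        if row["rating"] >= 4:
            counts[(row["context"], row["category"], fsa_band(row["fsa"]))] += 1
    return counts


# Invented rows, for illustration only.
rows = [
    {"context": "S", "category": "Dessert/Snack", "fsa": 11, "rating": 5},
    {"context": "S", "category": "Dessert/Snack", "fsa": 10, "rating": 4},
    {"context": "S", "category": "Soup", "fsa": 5, "rating": 3},  # below 4, ignored
    {"context": "HSD", "category": "Salad", "fsa": 6, "rating": 4},
]
counts = favoured_counts(rows)
```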

RQ3-Can integration of these contextual factors improve recommendation performance?
As the task of rating prediction was approached as a regression problem, we employed the XGBoost regression model. Model evaluation involved the calculation of several key metrics, including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (R2). The primary comparison entailed assessing models trained with the inclusion of all available features (contextual scenario feature included) against a baseline model that excluded the contextual scenario features. This comparative analysis aimed to identify the impact of the contextual factors on model performance. Additionally, the top 5 most important features were identified to discern the most influential variables for rating prediction.
The XGBoost model employed a comprehensive set of 29 features encompassing demographic attributes (e.g., age, gender, ethnic origin, physical activity level, survey ID), recipe-related details (e.g., recipe name, category, nutritional information), information regarding cooking (e.g., cooking frequency, skill level), recipe health level (WHO, FDA, FSA and WHO/3, FDA/3), and the contextual scenario information. Since each health standard emphasises a slightly different nutritional aspect, which may potentially impact model performance, all of them were included in the model-building process. As demonstrated in Table 5, the model utilising the complete feature set exhibited superior performance compared to the model without contextual scenario features. Contextual scenario features include the eight simulated contextual situations. The all-feature model yielded an MSE of 1.642, RMSE of 1.281, and MAE of 1.052, all of which outperformed the baseline model (MSE 1.688, RMSE 1.299, and MAE 1.071). The all-feature model achieved a higher R2 score (0.198) compared to the baseline (0.175), signifying a superior goodness of fit.
More importantly, the feature importance analysis revealed that the contextual scenario features were the most significant among the 29 features considered, as evidenced by Figure 5. This underscores the impact of contextual information on both human implied behaviour and algorithmic comprehension thereof. The incorporation of contextual factors in the model enhances its capacity to intelligently discern and uncover the hidden relationships within individuals' preferences in the dataset. This, in turn, leads to more precise rating predictions. Surprisingly, gender emerges as the second most important feature, suggesting potentially substantial differences in rating and dietary preferences between men and women. The importance of the Survey ID feature lies in its role in identifying and distinguishing an individual's implied eating and rating behaviour. This feature is commonly utilised in traditional matrix factorisation recommender systems, and so it comes as no surprise that it ranks as the third most important feature. Notably, the influence of recipe category surpasses that of recipe name within the XGBoost model. This could be attributed to the broader category information reducing model complexity, thereby facilitating more precise weight generation for each leaf node. Online recipe searching and cook book use frequency have the potential to uncover hidden patterns related to an individual's cooking interests and skills.
We further applied the XGBoost model to predict ratings segmented by contextual scenario to investigate which group led to better model performance. The results show that using data from the 'happy emotion' group achieved the highest model performance (MSE=1.628, RMSE=1.276, MAE=1.053, and R2=0.178), as shown in Table 6. Data from 'hot summer day', 'happy emotion', 'sad emotion', and 'busy weekdays' generally performed better than the other groups. When compared with the generic baseline group and considered from a model fitting perspective, the datasets for all simulated contextual scenarios achieved higher R2 values. We note the relatively lower performance of the models using the 'cold winter day', 'relaxing weekend' and 'after physical activities' data compared to other contexts. This may be due to the fact that, even though we specified a single contextual scenario to the participants, their responses may still be influenced by other independent factors in that scenario. For example, during a relaxing weekend, one may also feel happy, and after physical activities, one may feel both tired and energetic simultaneously.
Demographic factors, such as the time spent living in one's current country of residence, gender, home country, and cooking skill level, are frequently among the most important factors in each contextual scenario group, as indicated in Table 7. Remarkably, the FDA health level is the most important factor in the 'sad emotion' and 'busy weekday' groups, the second most important in the 'happy emotion' group, and the fourth most important in the 'relaxing weekend' group. This may indicate that, in these contextual situations, people may unconsciously be even more prone to choosing relatively unhealthy recipes over healthy ones. Salt and saturated fats are the most important features for the 'relaxing weekend' and 'cold winter day' groups, respectively. This may be due to a preference for larger meals during relaxing weekends, leading to higher salt levels. During cold winter days, hearty main dishes are preferred, potentially resulting in higher levels of calories and saturated fats. This analysis also suggests, as posited earlier, that the contextual scenario can act as a moderator of other predictive features in RS models.
The abbreviations are the same as in Figure 3. Under certain contexts, the fitted model performed worse than the null model and achieved a negative R². In such situations, the average rating would be used for rating prediction.
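The fallback to the average rating when R² is negative can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, R² is assumed to be measured on a held-out validation split, and "the average rating" is assumed to mean the mean of the training ratings.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination; negative when worse than the mean predictor."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def predict_with_fallback(train_ratings, val_true, val_pred, new_pred):
    """Return the model's predictions, or the mean training rating if validation R^2 < 0.

    Assumption: the null model predicts the mean of the training ratings.
    """
    if r2_score(val_true, val_pred) < 0:
        mean_rating = sum(train_ratings) / len(train_ratings)
        return [mean_rating] * len(new_pred)
    return list(new_pred)


# Toy data (illustrative only): the fitted model is worse than the mean predictor here.
train_ratings = [2, 3, 4, 3]
bad = predict_with_fallback(train_ratings, [2, 3, 4], [4, 3, 2], [1.0, 5.0])
good = predict_with_fallback(train_ratings, [2, 3, 4], [2, 3, 4], [1.0, 5.0])
print(bad, good)
```

In the first call the validation R² is negative, so every new prediction is replaced by the mean training rating; in the second the model's own predictions are kept.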

DISCUSSION AND IMPLICATIONS
This study investigated whether people's recipe rating and implied eating and nutritional intake behaviour changed under different simulated contextual situations. Additionally, the study examined whether integrating contextual features could improve model-based recommendation performance and identified which features were the most important in each contextual scenario. We found that people's eating preferences and their likelihood of consuming healthy recipes during busy weekdays differ significantly from those in other contextual situations. This difference is supported by the results of a one-way ANOVA, which indicated a significant difference in mean ratings between busy weekdays and situations such as cold winter days, sad emotions, after physical activities, and the generic group. A demanding and busy work schedule during weekdays often leaves individuals with limited time for cooking. This constraint may restrict their freedom to think about and choose preferred food or recipes [37]. In this situation, the primary goal of cooking and eating becomes satisfying hunger, and meals should preferably be prepared and eaten quickly. Consequently, recipes that are easy and quick to make, often involving refined or processed products and other potentially less healthy options, are preferred during busy weekdays. Our findings on distinct implied eating behaviour during busy weekdays align with the research conducted by Pinho et al. [37], which suggests that a hectic lifestyle may lead to reduced consumption of vegetables and home-cooked meals.
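For readers unfamiliar with the test, the F statistic behind a one-way ANOVA compares between-group and within-group variance in ratings. The sketch below is a plain-Python illustration with hypothetical group names and toy ratings; it is not the authors' analysis code and it does not compute a p-value (a statistics library such as SciPy's `scipy.stats.f_oneway` would be used for that in practice).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of rating lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual ratings around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy ratings per contextual group (illustrative values, not study data).
ratings = {
    "busy_weekday": [2, 3, 4],
    "cold_winter_day": [4, 5, 6],
    "generic": [3, 4, 5],
}
f_stat, df_b, df_w = one_way_anova_f(list(ratings.values()))
print(f_stat, df_b, df_w)
```

A large F indicates that mean ratings differ more between groups than would be expected from the spread within groups; a post-hoc test such as Tukey's HSD then identifies which pairs of groups differ.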
In addition, it is worth considering that the varying stress levels associated with a busy lifestyle, as supported by Hyldelund et al. [25], may result in individuals exhibiting a shift in their dietary choices, characterised by a reduction in main meals and an increase in snack consumption.This phenomenon could also explain the observed significant differences, such as those evident when comparing busy weekdays with cold winter days.In the latter case, individuals may gravitate towards carbohydrate-rich options, such as main dishes, potentially contributing to this distinction in eating preferences.
As anticipated, there is a significant difference in recipe preference between 'hot summer days' and 'cold winter days', providing evidence for the seasonality of food preferences over time [51]. Furthermore, the analysis of recipe health levels revealed a notably higher consumption of main dishes in the 'cold winter day' contextual group. This aligns with the findings of Capita and Alonso-Calleja [5], who concluded that both men and women tend to consume more energy during the winter months. Surprisingly, we did not find significant differences between participants in the happy and sad emotion groups. Individuals appeared to exhibit similar food preferences, even under these extremely different emotional situations. Under both of these scenarios, participants showed an increased demand for unhealthy food [3], relative to the baseline condition. However, the lack of a difference may reflect a limitation of the external emotion stimuli method in evoking such strong and polarised emotions.
Perhaps unsurprisingly, individuals prioritised nutritional information and opted for relatively healthier recipes after engaging in physical activities [1]. This may indicate that social norms and health-conscious behaviours play a significant role in influencing food choices after physical exertion.
Beyond its theoretical implications, our study offers a novel perspective on the development of context-aware food recommender systems. Previous research has predominantly focused on location-aware or gender-aware food recommender systems [2,41]. In our algorithmic experiments, contextual features emerged as the most influential of the 29 features considered, leading to improved accuracy in rating predictions. Importantly, the datasets within each contextual scenario group exhibited higher R² values than the baseline group. This is likely because the baseline group encompasses a wide range of random possibilities, making it challenging for the model to discern meaningful patterns. In contrast, the inclusion of contextual features provides the model with a clearer direction for uncovering hidden patterns. Our findings therefore suggest that emotion-aware systems could represent the next generation of food recommender systems. This could also be combined with season- and stress-level awareness.
Previous food recommendation systems have primarily focused on either context-aware or healthy recommendations [41,52]. Our research has demonstrated examples where the contextual scenario acts as a moderator, allowing other features to perform better than they would without the context as a precedent. For example, during busy weekdays, there was a noticeable increase in the consumption of unhealthy food. Addressing how to incorporate nutritional information into recommender systems to encourage healthy eating habits during hectic lifestyles could present a novel approach to balancing the trade-offs involved in healthy food recommendations. To our knowledge, such systems have not yet been proposed in the field of food recommendation. While our results are promising, they warrant validation on a larger, more naturalistic dataset before practical implementation.

LIMITATIONS AND FUTURE WORK
There are several limitations in our study that are worth pointing out. As is common in the food recommendation literature, we take ratings as a proxy for intent to consume. As such, the behaviour measured in this research is implied behaviour: none of the recipes were actually consumed. More in-depth research is needed to investigate whether people's actual behaviour changes under different contextual situations. Despite our efforts to control for the impact of individual contextual factors in our experimental design, the results may still be affected by uncontrolled real-world variables. Our user studies were necessarily somewhat contrived, and simulating emotions is clearly not the same as experiencing them naturally. Manipulating simulated contexts may produce artificial representations, introducing a potential mismatch between the simulated scenarios and the real world. Therefore, a study of real-life contexts is needed to thoroughly confirm the findings presented in this study.
Additionally, our examination of user ratings was based on a limited sample of recipe images (n = 75) and only included four main categories (main dishes, soup, desserts/snacks, and salads). A larger and more diverse set of recipes could lead to more generalisable results. The predictive performance of the XGBoost model was poor in this study, possibly because we did not perform feature selection or hyperparameter optimisation. As the primary aim of this research is to understand and identify the contextual factors that contribute most to recipe rating prediction, the current study represents a preliminary component of a larger project. The complete model training will be reported in the next stage of the research.
For future work, we first plan to expand our study by including information on ingredients and cooking methods to gain a deeper understanding of how participants' food preferences differ at these levels. Second, we intend to develop an emotion-aware post-filtering healthy food recommendation system based on the available dataset. Finally, given that the contextual scenarios investigated are not necessarily mutually exclusive, we intend to investigate how combinations of dynamic contextual factors impact people's eating and nutritional intake behaviour.

Figure 1: Demonstration of the recipe template under each contextual scenario (taking "Apple Pie By Grandma Ople" as an example recipe). Note that the example on the bottom-right is the baseline, context-free condition.

Figure 3: The data distribution across the contextual scenario groups. Abbreviations: HSD: hot summer day; CWD: cold winter day; H: happy; S: sad; B: busy; R: relax; APA: after physical activities; G: generic group.

Figure 5: XGBoost feature importance for the all-feature model.

Table 2: Tukey's Honest Significant Difference (HSD) test results for preference across the 75 recipes.

Table 3: Mean recipe rating by FSA health level for each contextual scenario group. The abbreviations are the same as in Figure 3.

Table 4: Tukey's Honest Significant Difference (HSD) test results for preference for healthy recipes (only statistically significant results are reported). The abbreviations are the same as in Table 2 and Figure 3.

Table 5: Summary and comparison of XGBoost test results for the all-feature model.

Table 6: Summary and comparison of XGBoost test results across the contextual scenario groups.

Table 7: Summary of the top 5 most important features for each contextual scenario model.