FAQs
Other student questions from previous years (unsorted)
Gender Imbalance Effect on t-test
Student question (2021): I wanted to ask a general question related to both the lecture and practical session: how much does gender imbalance influence the results of a t-test? Is there a technique to test for this possible effect and, in this case, how could one account for the imbalance?
Answer given:
In general, an imbalanced dataset can be used for a t-test as long as the assumptions of the test are met and there are enough datapoints in each of the two compared groups (but in theory the test also works with very small sample sizes).
The assumptions of the t-test are that the datapoints are independently and identically distributed following a normal distribution. So as long as there is no evidence for a violation of these assumptions, you can use the test (you will see in the coming weeks how to check whether the assumptions are met or not!).
By the way, the classical t-test assumes equal variances in the two groups. If that assumption is not met and you set var.equal=FALSE in R (which is actually the default in t.test), Welch's version of the test estimates the variance in each group separately and hence there are fewer degrees of freedom left.
Regarding how to counteract data imbalance, one possible option could be to create a balanced dataset by randomly subsampling from the more abundant group. But as I’ve said, it is not needed in this case.
More stuff to think about: it is maybe even more important to think about whether the data is representative of the population that we want to infer something about. I do not know what the gender ratio is in the class, but it could be that the data imbalance is there because one gender was more willing to participate than the other. If that is the case, then there could be a confounding variable that influenced both the outcome (the reaction time) and the willingness to participate. For instance, it could be that in one gender only persons with a fast reaction time participated, while in the other group everyone participated regardless of reaction time; the dataset would then not be representative of the actual class and the t-test result would no longer be valid.
Imbalanced means that the two compared groups consist of an unequal number of datapoints.
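As a runnable sketch of the above (group sizes, means and object names are invented for illustration), a Welch t-test works fine on imbalanced groups:

```r
# Hypothetical reaction-time data: 40 observations in one group, 12 in the other
set.seed(1)
group_a <- rnorm(40, mean = 300, sd = 25)
group_b <- rnorm(12, mean = 315, sd = 25)

# var.equal = FALSE (the default) gives Welch's t-test, which does not
# assume equal variances and adjusts the degrees of freedom accordingly
res <- t.test(group_a, group_b, var.equal = FALSE)
res$p.value    # the test runs despite the imbalance
res$parameter  # Welch degrees of freedom (usually not an integer)
```

The imbalance itself is not a problem for the test; only the assumptions and the sample sizes per group matter.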
How to get extreme values
Student question (2021):
In Unit2 Question3 Exercise1 it is written that it would be complicated to programmatically check whether the extreme values belong to one single individual. I think I have found a way to do so:
```r
extremes <- slice(df, -c(1:252)) # generate empty data frame
for (i in colnames(df)) {
  print(filter(df, df[i] == max(df[i])))
}
```
df is my dataframe.
So my question is whether this is a correct approach and whether it would lead to the correct answer. My second question is: how can I add the output of the statement to the empty data frame I created above? In Python I would solve it with a list or something similar, but I am not quite sure how I should or can do it in R.
Looking forward to an answer.
Answer given:
You found a good start to do it! Here is how you could add it to the dataframe:
```r
extremes <- df %>% slice(0) # empty data frame with the same columns as df
for (i in colnames(df)) {
  max_row <- filter(df, df[i] == max(df[i]))
  extremes <- rbind(extremes, max_row)
}
```
rbind() stands for “row bind” and does just that: it binds rows together. There is also cbind(), for columns.
Note: it is worth pointing out that in this way you “only” get the maximum value of every given variable. However, it is a matter of how extremes are defined: it could be that the max value of a variable is not an extreme because it is perfectly in line with the other values. Similarly, there could be more than one extreme value for a given variable, with respect to the other values. The same goes for small values, as those can be extremes as well.
More info: a common definition of extreme values is: smaller than the first quartile minus 1.5·IQR (IQR = the interquartile range) OR larger than the third quartile plus 1.5·IQR. The outliers in boxplots are for instance calculated in this way. This is not needed for what we are doing in the course and what you did is perfectly appropriate!
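The 1.5·IQR rule can be sketched in a few lines of R (the vector x here is made up for illustration):

```r
# Hypothetical data: 30 is clearly far away from the rest
x <- c(2, 3, 3, 4, 5, 5, 6, 30)

q1  <- quantile(x, 0.25)
q3  <- quantile(x, 0.75)
iqr <- q3 - q1                    # same as IQR(x)

# flag values below the lower or above the upper "fence"
extreme <- x < q1 - 1.5 * iqr | x > q3 + 1.5 * iqr
x[extreme]                        # -> 30
```

This is exactly the rule boxplot() uses to decide which points to draw individually.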
Lecture 3 linear regression Hypothesis and p value
Student question (2021): I am a bit confused… In lecture 3 on slide 31 it is written: “In H0 included is the assumption that the data follow the simple linear regression model.” Then on slide 35 it is written that such a small p-value suggests that it is very unlikely to see such a slope if there were no correlation. But I have heard several times before this course that a small p-value suggests rejecting the Null hypothesis. Therefore I am confused: shouldn’t the Null hypothesis be that there is no correlation, and because of the small p-value we reject it and conclude that there is a correlation?
Can somebody help me?
Answer given: I think that you understood it correctly, but you just got confused by that sentence on slide 31.
What is meant with “In H0 included is the assumption that the data follow the simple linear regression model” is that in addition to H0 there is the assumption that the data can be analyzed with the chosen regression model. It can be analyzed like this if the modelling assumptions (see slide 25 in lecture 3) are met (you will see in the coming weeks how to check whether they are met or not). Only if the modelling assumptions are met does it make sense to test the actual H0.
As you correctly understood, the actual Null hypothesis H0 is that the slope beta = 0. At the same time, if the slope is 0 it also means that the correlation is 0. But for completeness’ sake I also want to mention that there is a difference between the regression slope and the correlation, which is nicely explained here.
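For simple linear regression the two quantities are directly linked by the identity slope = cor(x, y) · sd(y) / sd(x); a small sketch with invented data confirms it:

```r
# Made-up data for illustration only
set.seed(5)
x <- runif(40)
y <- 1 + 2 * x + rnorm(40, sd = 0.3)

slope <- unname(coef(lm(y ~ x))["x"])

# The OLS slope equals the correlation rescaled by the ratio of the
# standard deviations, so slope = 0 if and only if cor(x, y) = 0
all.equal(slope, cor(x, y) * sd(y) / sd(x))  # TRUE
```

This is also why testing "slope = 0" and "correlation = 0" amount to the same null hypothesis in this setting, even though the two numbers themselves differ.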
Plotting Regression line
Student question (2021): Dear BIO144 team, I tried to graph my results using the code below. However, when I tried to draw the regression line using geom_smooth, it gives me 5 different regression lines in different colors, one for each continent (could not upload the graph here). What if I want only one regression line for all my data, and at the same time different colors for the continent points?
```r
ggplot(health_filtered, aes(x=logExp, y=logMort, colour=continent)) +
  geom_point(size=0.8, alpha=0.9) +
  geom_smooth(method="lm", se=FALSE) +
  xlab("Log of Child Mortality") +
  ylab("Log of healthcare Expenditure") +
  ylim(0,3) +
  xlim(1,4) +
  theme_light()
```
Answer given: You have specified the aesthetic mapping colour = continent in the ggplot() function. Any aesthetic mapping specified in the ggplot() function is inherited by all other geoms, i.e. it applies automatically to all other geoms unless we say otherwise. So your geom_smooth is also using the colour = continent aesthetic mapping, and is therefore doing a separate regression for each continent.
If we want different colours for each point, but one regression, then here are two solutions:
- Remove the colour = continent aesthetic mapping from the ggplot function, and put it only in the geom_point function.
OR
- Remove the inherited colour = continent aesthetic mapping from the geom_smooth by adding mapping = aes(colour = NULL), i.e. use `geom_smooth(method="lm", mapping = aes(colour = NULL))`
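The first solution could look like this; the data frame below is a made-up stand-in for health_filtered, just so the sketch is self-contained:

```r
library(ggplot2)

# Hypothetical stand-in for health_filtered (invented values)
set.seed(1)
health_filtered <- data.frame(
  logExp    = runif(50, 1, 4),
  logMort   = runif(50, 0, 3),
  continent = sample(c("Africa", "Asia", "Europe"), 50, replace = TRUE)
)

# colour = continent only in geom_point, so geom_smooth fits ONE line
p <- ggplot(health_filtered, aes(x = logExp, y = logMort)) +
  geom_point(aes(colour = continent), size = 0.8, alpha = 0.9) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_light()
```

Moving the mapping into geom_point keeps the points coloured while the smoother sees only the global x/y mapping.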
Interpreting the table lecture 4 page 29 2021
Student question (2021):
The conclusions I get from interpreting the table is that
- bmi has a strong correlation with bodyfat (steep slope)
- age probably doesn’t have that strong of an influence on bodyfat (only 0.13 slope)
- the p-values all show significance but in the case of age it might not play an important biological role
- the confidence interval of the intercept is rather wide so it could be useful to generate more data points (?)
Did I miss anything or did I get it wrong?
Answer given:
Hi, solid interpretation! Some comments:
- You raise a good point about statistical significance versus effect size (and the corresponding clinical significance, in this case). It can happen that a coefficient is estimated to be significantly different from the null effect, but at the same time the estimated coefficient is so small (i.e. a small effect size) that it does not matter.
- However, it is not always so easy to determine whether an effect size is big or small, and in the case of the covariates age and bmi I argue that both might be important; here is why. At first glance the two estimated slopes are very different, with the one for bmi being roughly 14 times as big as the one for age. Yet, age is a continuous variable and the slope means that for every year that passes the bodyfat percentage is estimated to increase by 0.13. Hence, after roughly 14 years the bodyfat percentage is predicted to increase by 1.82, roughly the same as if the bmi had increased by 1. What I want to say is that 0.13 might not seem much, but it adds up over the years. On the other hand, an increase of 1 in bmi is actually not so little and thus it might be less likely to happen.
Regarding the intercept:
- In general more data would increase the precision of the estimates (i.e. decrease the standard errors), but it is often not possible to have more data.
- Think about what the intercept is in this model. Are we interested in it at all? The answer is “probably not”: it is the bodyfat percentage for when age=0 and bmi=0, both of which are not possible. This is also why the estimated intercept does not make sense (-31 bodyfat percentage?). In this model we are only interested in the estimated slopes.
In any case, good start with interpretation! You could further try to interpret the confidence intervals (in the sense: what do they mean?).
More info: related to the point above, another thing to keep in mind when interpreting coefficients is the units of the variables. For age it’s years, so as I wrote above a one-unit increase in age (i.e. 1 more year) results in an expected increase of bodyfat percentage by 0.13. If however the unit of the variable age was decades instead of years, then the slope would have been estimated to be (about) 10 × 0.13 = 1.3; in this case it would have roughly been the same as for bmi and our first “impression” of its size would have been different.
Overdispersion Index
Student question (2021): In the GSWR book it says only to worry about dispersion if the dispersion index is above 2, but in the mock exam it said that there is overdispersion even though the index was below 2. Does that mean there is always overdispersion if the index is above 1 but you just don’t worry about it?
Answer given: Hi. Yes, if the dispersion parameter is >1 there is overdispersion, but how much larger than 1 it has to be to become problematic is up for debate. I cannot put it better than what is written in the book:
“What if the dispersion index had been 1.2, 1.5, 2.0, or even 10? When should you start to worry about overdispersion? That is a tricky question to answer. One common rule of thumb is that when the index is greater than 2, it is time to start worrying. Like all rules of thumb, this is only meant to be used as a rough guide. In reality, the worry threshold depends on things like sample size and the nature of the overdispersion.”
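A common way to compute such a dispersion index by hand is the sum of squared Pearson residuals divided by the residual degrees of freedom. Here is a sketch with simulated, well-behaved Poisson data (all values invented):

```r
# Simulated count data that genuinely follows a Poisson model
set.seed(42)
x <- runif(100)
y <- rpois(100, lambda = exp(1 + 2 * x))

mod <- glm(y ~ x, family = poisson)

# Dispersion index: Pearson chi-squared statistic / residual df
dispersion <- sum(residuals(mod, type = "pearson")^2) / mod$df.residual
dispersion  # close to 1 here, since the data are not overdispersed
```

With real biological data this index often comes out well above 1, which is when a quasi-Poisson or negative binomial model becomes worth considering.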
IC10 Abalone - quasipoisson and cor
Student question (2021): I was wondering why there’s no AIC when doing the dropterm function for a full model with family=quasipoisson? `dropterm(full_glm_quasi, sorted=TRUE)`
Also I did not quite understand the last line in the sample solution script:
`cor(cbind(x=fitted(noViscera), y=fitted(lm_model)))`
how does this work and what does the result mean?
Answer given: The short answer is this: you might remember that to be calculated, the AIC needs the likelihood. I don’t think that we formally defined the likelihood in this course, so it’s ok if you do not really know what it is. What is important here is that for the GLM with quasi-Poisson the likelihood is no longer defined and thus the AIC cannot be calculated. I do not actually know whether dropterm is useful in this case (note that the exercise does not ask you to use it with the quasi-Poisson model, it only asks you to change the selected Poisson model to a quasi-Poisson model).
Regarding the second question: with
`cor(cbind(x=fitted(noViscera), y=fitted(lm_model)))`
a matrix with two columns is passed to cor, and from the help page of cor we know that in this case the correlations between the columns are calculated (x vs x, x vs y, y vs x and y vs y, hence 4 values). The values on the diagonal of the produced correlation matrix are obviously 1, because we correlated x vs x and y vs y respectively. The values on the off-diagonal are the ones of interest and are identical because cor(x, y) = cor(y, x). As there are only 2 columns in this case, I find it easier to just use cor(fitted(noViscera), fitted(lm_model)), i.e. to pass the 2 vectors separately, in which case only the correlation between the two vectors is calculated. As the correlation between the fitted values of the two models is almost 1 (it is 0.9733156), you rightfully state that there is no practical difference in the predicted values between the two models (but the glm is still the model that should be used because the response is count data).
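A minimal demo of the two calls, with made-up vectors in place of the fitted values:

```r
# Stand-ins for fitted(noViscera) and fitted(lm_model)
x <- c(1, 2, 3, 4, 5)
y <- c(1.1, 1.9, 3.2, 3.9, 5.1)

cor(cbind(x = x, y = y))  # 2x2 matrix: 1 on the diagonal, cor(x, y) twice off it
cor(x, y)                 # the single number of interest
```

Both calls compute the same off-diagonal value; passing the vectors separately just skips the redundant entries.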
Exercise 7c correlation between parameters
Student question (2021): The scatter plot of the Betas indicates a possible correlation between B0 and B1 as well as between B0 and B2. On the other hand, B1 and B2 do not seem to correlate. Can you help me understand why?
Is it correct to interpret this the following way: keeping one point fixed (say B2), then the variation in the intercept will define the variation in B1?
Answer given: This is what I think is going on:
- First, B1 and B2 are not correlated because x1 and x2 are not correlated either, so their coefficients are independent of each other.
- Second, by adding random noise to the response y it can happen that the apparent relation between e.g. x1 and y weakens (slope B1 closer to 0) or strengthens (the opposite). At the same time, because the data (x1) did not change, if the slope (B1) changes then the intercept (B0) changes as well, otherwise the regression line would not fit the data. It is not so easy to explain; maybe it helps if you try to draw it (i.e. add more noise and see what happens to the intercept and the slope). The same goes for B2 and B0.
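A small simulation can make this visible: refit the same model on many noisy copies of y and look at how the coefficient estimates co-vary (all names and values below are invented for illustration):

```r
set.seed(1)
x1 <- runif(50, 0, 10)
x2 <- runif(50, 0, 10)  # generated independently of x1

# 1000 re-fits, each with fresh noise on the same underlying relationship
betas <- t(replicate(1000, {
  y <- 2 + 0.5 * x1 + 0.5 * x2 + rnorm(50, sd = 2)
  coef(lm(y ~ x1 + x2))
}))

round(cor(betas), 2)
# Typical pattern: clear negative correlation of the intercept with each
# slope, near-zero correlation between the two slopes themselves.
```

The negative intercept–slope correlation arises because x1 and x2 take only positive values: tilting a line up forces it down at x = 0.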
Question about Linear Model from Practical 4
Student question (2021):
I have a question about the practical we did in week 4, the one with the milk data set.
So there we used the following linear model:
lm(formula = kcal.per.g ~ neocortex.perc + mass, data = milkdat)
Now if we check the table you gave us in the “useful table” section from Lesson 7, this would constitute Case D: same slope and different intercepts.
Now in the Practical from week 4 one question is:
What is the estimated slope of the relationship between kcal.per.g and mass?
And the answer is -0.0054
Second question is: What is the estimated slope of the relationship between kcal.per.g and neocortex.perc?
And the answer is 0.018
So obviously there’s a contradiction here. Because in the table it says if we use the “+” in the linear model, so a model without interaction, the slopes for the two relationships between explanatory variables and response variable should be the same. But this is not the case here.
So is there a mistake I do in thinking? I would be very glad if you could clear this up!
Answer given:
The “useful table” is based on an ANCOVA (topic of week 7), i.e. there is a continuous variable (density) and a categorical variable (season). The slope is for density and in the case of no interaction (i.e. a “+” between the explanatory variables) the slope associated with density is the same for all values of season. For the categorical variable, k-1 (k being the number of levels of the variable) intercepts are estimated.
In the milk example, neocortex.perc and mass are both continuous variables, meaning that a slope is estimated separately for each. Hence the questions are “What is the estimated slope of the relationship between kcal.per.g and mass?” and “What is the estimated slope of the relationship between kcal.per.g and neocortex.perc?”
Note that an interaction between two continuous variables would also be possible. It would mean that the slope of one continuous variable on the response variable changes as the values of a second continuous variable change.
Follow-up question:
So does this mean that if I have both continuous variables, if I do a linear model with “+” (no interaction) in the summary table I get one intercept for the alphabetically first explanatory variable, then the 2 slopes of the 2 explanatory variables? And in the case of ANCOVA (hence one continuous variable and one categorical variable) I get the intercept for the alphabetically first explanatory variable, and then the slope which is the same, and then the difference between the first intercept (the reference) and the intercept of the 2nd explanatory variable (alphabetically second)? Is this correct?
Answer given:
Almost… The intercept is not for the continuous variables. It would be what you say it is if you had a categorical variable. I suggest that you try to write down the model to help you interpret the summary table until you’re more confident with it. In this case, with y = kcal.per.g, x1 = neocortex.perc and x2 = mass, the model is:
y_i = \beta_0 + \beta_1 x_i^{(1)} + \beta_2 x_i^{(2)} + \epsilon_i
So you are estimating 1 intercept (beta0) and 2 slopes (one for each explanatory variable, beta1 and beta2). Note that the intercept is basically the value of y_i (minus the error) when both x1=0 and x2=0, that is when both neocortex.perc and mass are 0.
You could try to write down the model for the case where you also have a categorical variable.
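A sketch of fitting and reading this model in R, using simulated stand-in data (the milk dataset is not reproduced here; the coefficients used for simulation are invented):

```r
# Invented stand-in data with the same variable names as the milk example
set.seed(7)
neocortex.perc <- runif(30, 55, 75)
mass           <- runif(30, 1, 50)
kcal.per.g     <- 0.2 + 0.018 * neocortex.perc - 0.005 * mass + rnorm(30, sd = 0.05)

fit <- lm(kcal.per.g ~ neocortex.perc + mass)
coef(fit)
# First entry "(Intercept)" is beta0 (the value when both predictors are 0),
# then one slope per continuous explanatory variable (beta1, beta2).
```

With two continuous predictors there is always exactly one intercept and one slope per variable, regardless of alphabetical order.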
ANOVA vs Multiple linear Regression
Student question (2021):
Hello, I’m a bit confused about when I have to test with an ANOVA and when with a multiple linear regression. For example: the hybrid maize yield example in lecture 6 compares the differences of the means of the groups (categorical variable). The earthworm example also has a categorical (explanatory) variable, and there we test with a multiple linear regression.
(We convert the categorical variable into a dummy variable, but could I still also test there with an ANOVA?)
Answer given:
Hi. ANOVA and (multiple) linear regression are both linear models that investigate the relation between explanatory variables and a response variable. In fact, ANOVA can be seen as a special case of linear regression in which all explanatory variables are categorical.
In general, if there is a categorical variable (for instance, in an ANOVA) we do not directly look at the summary() table (which gives coefficient estimates and corresponding t-test results) but we look at the anova() table, which gives the estimated SS, MS and corresponding F-test results. What is the difference? In the summary() table each coefficient is separately tested with a t-test against the Null hypothesis that it is 0. For a categorical variable with k levels this means k-1 tests (k-1 dummy variables), and if k is large we run into the multiple comparison problem. So, instead of testing each level of that categorical variable separately we run 1 single test (the F-test) that tells us whether at least 2 levels are different from each other (see slide 17 lecture 6). Afterwards we can look at the estimated coefficients with summary(). For continuous variables we can directly look at the summary() output. In the earthworm example there were two explanatory variables, 1 categorical (Gattung) and 1 continuous (Magenumf). For the former you first look at the anova() table; for the latter you can directly look at the summary() table.
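This workflow can be sketched with simulated stand-in data; the variable names mimic the earthworm example, but all values are invented:

```r
# Invented data: one categorical (3 levels) and one continuous predictor
set.seed(3)
Gattung     <- factor(rep(c("L", "N", "Oc"), each = 20))
Magenumf    <- runif(60, 1, 5)
log.gewicht <- c(0, 0.5, 0.6)[as.integer(Gattung)] +
               0.4 * Magenumf + rnorm(60, sd = 0.2)

mod <- lm(log.gewicht ~ Gattung + Magenumf)

anova(mod)    # ONE F-test per variable: Gattung tested as a whole (2 df)
summary(mod)  # k-1 = 2 dummy coefficients for Gattung, 1 slope for Magenumf
```

The anova() table answers "does Gattung matter at all?" in a single test; summary() then shows which levels differ from the reference.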
Self test question 8 Spread and shape of distributions
Student question (2021):
I find question 8 of “Spread and shape of distributions” in the self test not well formulated. The tail is the thinner part of the curve, so if there is a long tail of high values it could mean that the data are more concentrated in the lower part and therefore have a mean lower than the median.
Answer given:
Hi! I can understand that it can be confusing. In statistics, what is meant with “long tail of high values” (sometimes also called “fat tail” or “heavy tail”) is that there is a bigger probability of getting (very) large values when compared to the reference distribution (often a normal or an exponential distribution). See for instance here. So the sentence “A distribution of data with a long tail of high values” always means that high values are more frequent than they would be in the distribution of reference.
As you correctly identified, the mean is not a robust measure as its value is influenced by extreme values in the data, while the median is more robust. So in the case of a distribution of data with a long tail of high values, the mean is bigger than the median.
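A quick check of this in R with a right-skewed distribution (the exponential has a long tail of high values):

```r
set.seed(10)
x <- rexp(10000, rate = 1)  # exponential: long right tail

mean(x) > median(x)  # TRUE: the tail pulls the mean above the median
```

For the exponential with rate 1 the theoretical mean is 1 while the median is log(2) ≈ 0.69, so the gap is not just sampling noise.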
ggpair() graph exclusions
Student question (2021):
Dear BIO144 teammates. In unit 2, exercise 1, question 7, we use ggpairs() to look at relationships among variables. However, as you have all seen, it gives us all the graphs. This has some cons: some of them are not needed for answering this question, and the graphs are so small that it is hard to see the relationships. Therefore I was wondering: is there a way to exclude all the graphs other than the ones we need (the ones that show the relationship of bodyfat with all other variables)?
Answer given:
Hi! You could do it like this:
```r
bodyfat_dataset %>%
  select(bodyfat, age, abdomen, height, weight) %>%
  ggpairs()
```
Lecture4 page32
Student question (2021):
Could someone please look at our answers for the questions on this page? we are not sure if our answers are correct and which other interpretations are possible.
- x1 is an important explanatory variable because if only x1 is used, then R2 and the adjusted R2 are high and the p-value is small (which means that the slope for x1 differs significantly from zero?)
- same answer for x2
- In this model we only need x1 or x2, not both. Reason: the R2 does not become much higher if we use both compared to the situation where we only use one of them. x1 and x2 are also positively correlated: if x1 increases, then x2 will increase too.
- Interpretation: There is a positive correlation between y and each of the two x’s: if x1 or x2 increases, then the catheter length will increase too.
Thank you very much in advance for your answer.
Answer given:
- I think that your answers for questions 1 and 2 are fine (but see the answer to question 3, which is relevant here as well).
- Question 3: you will see this later in the course (so for now it is completely fine to just answer these questions as well as you can, and if you want to you can ask, like you did), but the answer to this question is less straightforward and depends on our goals. In general, in regression there are 2 goals: to predict and to explain.
- To predict means that we want to be able to predict the response variable y as well as we can, and we do not really care how we achieve this (i.e. which variables we use); we might just look at the adjusted r-squared value and pick the model with the highest value. There is a mistake on slide 31 and the adjusted r-squared for the model with both x1 and x2 is not shown, but it is 0.76. So in this case we would probably just take the model with just x2 (adjusted r-squared: 0.78).
- To explain means that we are interested in the relation between the explanatory variables and the response variable (e.g. what does a unit increase in x1 mean for y?). In this context, we can for instance fit two separate models for the two explanatory variables, or if we are only interested in one of the two, we just use that one. As I said, you will see this again later in the course.
- Question 4: with this new information and what I already wrote above please try again on your own to interpret the model with both x1 and x2 (e.g. are the slopes estimated to be significant?).
I hope this clears things up a little bit. You will see that things will get clearer as you progress through the course as these things will come up more than once. But feel free to ask for further clarifications.
Anova Degrees of Freedom
Student question (2021):
Hello, I am quite confused about the degrees of freedom in ANOVA. In the BC reading it says that the degrees of freedom are the number of groups one has. For the one-way ANOVA example in the lecture this would mean 4, so 20-4 = 16. However, in the slides it says the degrees of freedom are n-1, which would result in 19. When should I use which method, or in other words, when asked for the degrees of freedom of an ANOVA, which number would be expected in an exam? Thanks for your help!
Answer given: There is more than one type of degree of freedom involved, so first some theory. If you look at slide 18 of lecture 6, you can see that it is the total variability SS_{total} that has n-1 degrees of freedom (i.e. we need 1 degree of freedom to calculate it). In ANOVA, we partition this total variability into the variability explained by the model, SS_{between~groups}, and the residual variability, SS_{within~groups}. That is:

SS_{total} = SS_{between~groups} + SS_{within~groups}

To calculate the explained variability we need g-1 degrees of freedom (with g the number of groups), which leaves us with n-1-(g-1) = n-g degrees of freedom for the residual variability. We then test with the F-test whether the explained variability is significant. Under H0 the calculated test statistic

F = (SS_{between~groups}/(g-1)) / (SS_{within~groups}/(n-g))

follows an F-distribution with degrees of freedom g-1 and n-g, i.e. F ~ F_{g-1, n-g}, so that’s why we need those degrees of freedom.
Now, notice that the number of degrees of freedom used for the explained variability is g-1, i.e. one less than there are groups. In addition to this, the intercept of the model is estimated as well, which uses up another degree of freedom. Hence the total number of degrees of freedom used by the ANOVA model is g-1+1 = g, and the remaining degrees of freedom are n-g (in your example 20-4 = 16). Note that the intercept does not come into play in the variance decomposition for the F-test, hence it is not listed in the ANOVA table on slide 18.
I hope this clears things up a little.
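The 20-4 = 16 example can be reproduced with simulated data, using 4 groups of 5 observations each (group names and means are made up):

```r
# n = 20 observations in g = 4 groups
set.seed(2)
group <- factor(rep(c("A", "B", "C", "D"), each = 5))
y     <- rnorm(20, mean = c(10, 12, 11, 13)[as.integer(group)])

aov_tab <- anova(lm(y ~ group))
aov_tab
# Df column: g-1 = 3 for group (explained), n-g = 16 for the Residuals
```

Note that the n-1 = 19 total degrees of freedom do not appear as a row here; they are simply the sum 3 + 16.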
Unit 10 Abalone Age
Student question (2021):
I have a question concerning the interpretation of the data and I was a bit confused with the terms of over-dispersion and under-dispersion. In the interpretation of the data it says:
“So, interestingly, and rather unusually for biological data, the data is under-dispersed” and “This is another example of a model being anti-conservative.”
And in the summary table we can see quite low p-values.
But in the lecture slides (lecture 10, slide 38) we learned that “When there is unaccounted over-dispersion, the p-values that are calculated are usually too small!” And on slide 41, which talks about under-dispersion, it says “In that case, your p-values are usually too large, that is, the results are conservative”
So I don’t understand the correct connection between over/under-dispersion, low/high p-values and (anti-)/conservative.
Answer given:
It is just a matter of slightly unlucky placement: the sentence “This is another example of a model being anti-conservative.” comes after a horizontal line (denoting a new topic/paragraph/etc.) and is preceded by the sentence “So the poisson glm had fewer significant terms than the lm.” (Note: this refers to the old version on openedx, and not to the version on Olat.) Hence it refers to that sentence and has nothing to do with whether the model is over- or underdispersed. In fact, there are many reasons why something (a p-value, a confidence interval, etc.) can be (anti-)conservative, one being that a wrong model is used: the lm is the wrong model (because it’s count data) and it produced smaller p-values than the glm, thus in this case (!) the lm is anti-conservative. What is written about the dispersion parameter in the slides is correct.
Chr vs Factors
Student question (2021):
When I import my data, I often have “chr” rather than “factor”. This was the case for country and continent in the healthcare_financing.csv.
Should I do something specific to get the data directly as factors? If not, is there a way to change them all at once, rather than one after the other using as.factor?
Answer given:
As of R 4.0.0, character variables are read into R as characters instead of factors. Ahead of this course it was decided to keep it like this because linear models can be fitted with both types. So it is suggested that you do not change the variables to factors (in fact, the only time you might/will want to change them is when you want to change the levels within a factor… and for now at least this is not needed!). If you nevertheless want to change characters to factors, it might actually be good to do that one variable at a time, as you will then only convert what you need to convert and it gives you more control over it.
PS: if you really want to convert all characters to factors, you can do this (but again, no need to do it!)
`dd <- dd %>% mutate(across(where(is_character), as_factor))`
Changing reference in earthworm video example
Student question (2021):
In the earthworm analysis of the correlation between log.gewicht and Gattung: is it correct to think that if the reference Gattung had been “N”, which seems to have a similar mean to Oc, then the p-value in the linear regression model would only have been significant for L and would not have been significant for Oc? I am assuming that the means for N and Oc are statistically similar.
Answer given:
Yes, exactly. You can change the reference level with the following code, then run the model with that new Gattung variable, and check the summary table. (You must install the forcats package to use the fct_relevel function.)
```r
library(forcats)
dd <- dd %>%
  mutate(Gattung_refN_ = fct_relevel(Gattung, "N", after = 0))
```
Decomposing R^2
Student question (2021):
I have a question concerning the decomposition of R^2. In the lecture we learned that we should calculate the relative importance with the package relaimpo and the function calc.relimp. However, in the IC material, we learned how to calculate it more manually (R^2 of model_both - R^2 of model_weight). These two methods do not produce the same result so when should we use which method? Or did I understand something incorrectly in general?
Answer given:
There is more than one way to calculate relative importances of variables, and the various methods differ in the produced results. If you are asked to calculate them, you will be told which approach to use.
