Control Flow

最終更新日:2024-10-12 | ページの編集

所要時間: 65分

概要

質問

  • How can I make data-dependent choices in R?
  • How can I repeat operations in R?

目的

  • Write conditional statements with if...else statements and ifelse().
  • Write and understand for() loops.

コードを書く際、実行の流れを制御する必要がよくあります。 これは、ある条件、または一連の条件が満たされたときに実行されるようにすればできます。 あるいは、決まった回数実行されるよう設定することもできます。

There are several ways you can control flow in R. For conditional statements, the most commonly used approaches are the constructs:

R

~~~ # if if (condition is true) {
  perform action
} # if ... else if (condition is true) { # 条件が満たされた場合 アクションを行う } else { # つまり、条件が満たされなかった場合 別のアクションを行う } ~~~

例えばRに、もし変数 x が特定の値を持っていた場合、メッセージを表示させたいとします。

R

x <- 8

if (x >= 10) {
  print("x is greater than or equal to 10")
}

x

出力

[1] 8

The print statement does not appear in the console because x is not greater than 10. To print a different message for numbers less than 10, we can add an else statement.

R

x <- 8

if (x >= 10) {
  print("x is greater than or equal to 10")
} else {
  print("x is less than 10")
}

出力

[1] "x is less than 10"

else if を使うと、複数の条件を試すこともできます。

R

x <- 8

if (x >= 10) {
  print("x is greater than or equal to 10")
} else if (x > 5) {
  print("x is greater than 5, but less than 10")
} else {
  print("x is less than 5")
}

出力

[1] "x is greater than 5, but less than 10"

Important: when R evaluates the condition inside if() statements, it is looking for a logical element, i.e., TRUE or FALSE. This can cause some headaches for beginners. 例えば:

R

x  <-  4 == 3
if (x) {
  "4 equals 3"
} else {
  "4 does not equal 3"
}

出力

[1] "4 does not equal 3"

ここで見られるように、ベクトル x が FALSE であるため、不等号のメッセージが表示されました。

R

x <- 4 == 3
x

出力

[1] FALSE

チャレンジ1

Use an if() statement to print a suitable message reporting whether there are any records from 2002 in the gapminder dataset. Now do the same for 2012.

We will first see a solution to Challenge 1 which does not use the any() function. We first obtain a logical vector describing which element of gapminder$year is equal to 2002:

R

gapminder[(gapminder$year == 2002),]

Then, we count the number of rows of the data.frame gapminder that correspond to the 2002:

R

rows2002_number <- nrow(gapminder[(gapminder$year == 2002),])

The presence of any record for the year 2002 is equivalent to the request that rows2002_number is one or more:

R

rows2002_number >= 1

Putting all together, we obtain:

R

if(nrow(gapminder[(gapminder$year == 2002),]) >= 1){
   print("Record(s) for the year 2002 found.")
}

All this can be done more quickly with any(). The logical condition can be expressed as:

R

if(any(gapminder$year == 2002)){
   print("Record(s) for the year 2002 found.")
}

次のような警告メッセージをもらった人はいますか?

エラー

Error in if (gapminder$year == 2012) {: the condition has length > 1

The if() function only accepts singular (of length 1) inputs, and therefore returns an error when you use it with a vector. The if() function will still run, but will only evaluate the condition in the first element of the vector. Therefore, to use the if() function, you need to make sure your input is singular (of length 1).

Tip: Built in ifelse() function

R accepts both if() and else if() statements structured as outlined above, but also statements using R’s built-in ifelse() function. This function accepts both singular and vector inputs and is structured as follows:

R

# ifelse function
ifelse(condition is true, perform action, perform alternative action)

where the first argument is the condition or a set of conditions to be met, the second argument is the statement that is evaluated when the condition is TRUE, and the third statement is the statement that is evaluated when the condition is FALSE.

R

y <- -3
ifelse(y < 0, "y is a negative number", "y is either positive or zero")

出力

[1] "y is a negative number"

ヒント:any()all()

any() 関数は、ベクトルの中に少なくとも1つ TRUE の値がある場合、 TRUE を返し、 そうでない場合は、 FALSE を返します。 これは、 %in% 演算子でも同様に使えます。 関数 all() は、その名前が示唆しているように、ベクトル内の全ての値が TRUE である時のみ、 TRUE となります。

繰り返し行う処理


If you want to iterate over a set of values, when the order of iteration is important, and perform the same operation on each, a for() loop will do the job. We saw for() loops in the shell lessons earlier. This is the most flexible of looping operations, but therefore also the hardest to use correctly. In general, the advice of many R users would be to learn about for() loops, but to avoid using for() loops unless the order of iteration is important: i.e. the calculation at each iteration depends on the results of previous iterations. If the order of iteration is not important, then you should learn about vectorized alternatives, such as the purrr package, as they pay off in computational efficiency.

for() ループの基本構造は:

R

for (iterator in set of values) {
  do a thing
}

例えば:

R

for (i in 1:10) {
  print(i)
}

出力

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

1:10 の部分は、ベクトルをその場で作るものです。 他のベクトルの中身を繰り返すこともできます。

for() ループを、もうひとつの for() ループと入れ子となる形にすれば、 2つ同時に繰り返すこともできます。

R

for (i in 1:5) {
  for (j in c('a', 'b', 'c', 'd', 'e')) {
    print(paste(i,j))
  }
}

出力

[1] "1 a"
[1] "1 b"
[1] "1 c"
[1] "1 d"
[1] "1 e"
[1] "2 a"
[1] "2 b"
[1] "2 c"
[1] "2 d"
[1] "2 e"
[1] "3 a"
[1] "3 b"
[1] "3 c"
[1] "3 d"
[1] "3 e"
[1] "4 a"
[1] "4 b"
[1] "4 c"
[1] "4 d"
[1] "4 e"
[1] "5 a"
[1] "5 b"
[1] "5 c"
[1] "5 d"
[1] "5 e"

We notice in the output that when the first index (i) is set to 1, the second index (j) iterates through its full set of indices. Once the indices of j have been iterated through, then i is incremented. This process continues until the last index has been used for each for() loop.

結果を表示させずに、ループの結果を新しいオブジェクトに書き込むこともできます。

R

output_vector <- c()
for (i in 1:5) {
  for (j in c('a', 'b', 'c', 'd', 'e')) {
    temp_output <- paste(i, j)
    output_vector <- c(output_vector, temp_output)
  }
}
output_vector

出力

 [1] "1 a" "1 b" "1 c" "1 d" "1 e" "2 a" "2 b" "2 c" "2 d" "2 e" "3 a" "3 b"
[13] "3 c" "3 d" "3 e" "4 a" "4 b" "4 c" "4 d" "4 e" "5 a" "5 b" "5 c" "5 d"
[25] "5 e"

このアプローチが役に立つこともありますが、‘結果を太らせる’ (結果のオブジェクトを 徐々に積み上げる)と、演算する上で非効率になります。 ゆえに、多くの値の間を繰り返すときは避けましょう。

ヒント:結果を太らせないようにしましょう

One of the biggest things that trips up novices and experienced R users alike, is building a results object (vector, list, matrix, data frame) as your for loop progresses. Computers are very bad at handling this, so your calculations can very quickly slow to a crawl. It’s much better to define an empty results object before hand of appropriate dimensions, rather than initializing an empty object without dimensions. So if you know the end result will be stored in a matrix like above, create an empty matrix with 5 row and 5 columns, then at each iteration store the results in the appropriate location.

よりよい方法は、(空の)出力オブジェクトを、値を埋める前に宣言することです。 この例では、より複雑に見えますが、より効率的です。

R

output_matrix <- matrix(nrow = 5, ncol = 5)
j_vector <- c('a', 'b', 'c', 'd', 'e')
for (i in 1:5) {
  for (j in 1:5) {
    temp_j_value <- j_vector[j]
    temp_output <- paste(i, temp_j_value)
    output_matrix[i, j] <- temp_output
  }
}
output_vector2 <- as.vector(output_matrix)
output_vector2

出力

 [1] "1 a" "2 a" "3 a" "4 a" "5 a" "1 b" "2 b" "3 b" "4 b" "5 b" "1 c" "2 c"
[13] "3 c" "4 c" "5 c" "1 d" "2 d" "3 d" "4 d" "5 d" "1 e" "2 e" "3 e" "4 e"
[25] "5 e"

ヒント:while ループ

時には、ある条件が満たされるまで繰り返す必要があります。 これは、 while() ループを使えばできます。

R

while(this condition is true){
  do a thing
}

R will interpret a condition being met as “TRUE”.

while(this condition is true) \~\~\~ 例として、このwhileループは 一様分布(&#x60;runif()&#x60; 関数)から0.1よりも小さい数を得るまで、 0から1の間で乱数を生成します。

R

z <- 1
while(z > 0.1){
  z <- runif(1)
  cat(z, "\n")
}

while() loops will not always be appropriate. You have to be particularly careful that you don’t end up stuck in an infinite loop because your condition is always met and hence the while statement never terminates.

チャレンジ2

Compare the objects output_vector and output_vector2. Are they the same? If not, why not? How would you change the last block of code to make output_vector2 the same as output_vector?

We can check whether the two vectors are identical using the all() function:

R

all(output_vector == output_vector2)

However, all the elements of output_vector can be found in output_vector2:

R

all(output_vector %in% output_vector2)

and vice versa:

R

all(output_vector2 %in% output_vector)

therefore, the element in output_vector and output_vector2 are just sorted in a different order. This is because as.vector() outputs the elements of an input matrix going over its column. Taking a look at output_matrix, we can notice that we want its elements by rows. The solution is to transpose the output_matrix. We can do it either by calling the transpose function t() or by inputting the elements in the right order. The first solution requires to change the original

R

output_vector2 <- as.vector(output_matrix)

into

R

output_vector2 <- as.vector(t(output_matrix))

The second solution requires to change

R

output_matrix[i, j] <- temp_output

into

R

output_matrix[j, i] <- temp_output

チャレンジ3

gapminder データを大陸ごとにループし、平均余命が50歳以上かどうかを表示する スクリプトを書きましょう。

Step 1: We want to make sure we can extract all the unique values of the continent vector

R

gapminder <- read.csv("data/gapminder_data.csv")
unique(gapminder$continent)

Step 2: We also need to loop over each of these continents and calculate the average life expectancy for each subset of data.gapminder <- read.csv("data/gapminder\_data.csv") unique(gapminder$continent) \~\~\~ {: .language-r} 手順2 :これらの大陸のそれぞれにループをし、その &#x60;部分集合&#x60; データごとに平均余命を出す必要があります。

  1. それは次のようにすればできます: 1.
  2. ‘大陸(continent)’ の固有の値のそれぞれについてループする 2.
  3. Return the calculated life expectancy to the user by printing the output:

R

for (iContinent in unique(gapminder$continent)) {
  tmp <- gapminder[gapminder$continent == iContinent, ]
  cat(iContinent, mean(tmp$lifeExp, na.rm = TRUE), "\n")
  rm(tmp)
}

Step 3: The exercise only wants the output printed if the average life expectancy is less than 50 or greater than 50. ゆえに、結果を表示させる前に if 条件をつけて、演算された平均余命が基準値以上か、基準値未満かを判別し、結果によって正しい出力を表示させる必要があります。 これを踏まえて、上の (3) を修正する必要があります: 3a.

3a. If the calculated life expectancy is less than some threshold (50 years), return the continent and a statement that life expectancy is less than threshold, otherwise return the continent and a statement that life expectancy is greater than threshold:

R

thresholdValue <- 50

for (iContinent in unique(gapminder$continent)) {
   tmp <- mean(gapminder[gapminder$continent == iContinent, "lifeExp"])

   if (tmp < thresholdValue){
       cat("Average Life Expectancy in", iContinent, "is less than", thresholdValue, "\n")
   } else {
       cat("Average Life Expectancy in", iContinent, "is greater than", thresholdValue, "\n")
   } # end if else condition
   rm(tmp)
} # end for loop

チャレンジ4

チャレンジ3のスクリプトをそれぞれの国ごとにループする形に直してください。 今回は、平均余命は50歳未満か、50歳以上70歳未満か、70歳以上かを 表示しましょう。

We modify our solution to Challenge 3 by now adding two thresholds, lowerThreshold and upperThreshold and extending our if-else statements:

R

チャレンジ4の解答 チャレンジ3の解答を、 `lowerThreshold``upperThreshold` の2つの基準値を加え、if-else 宣言を拡張する形で修正します: ~~~ lowerThreshold <- 50 upperThreshold <- 70 for( iCountry in unique(gapminder$country) ){ tmp <- mean(subset(gapminder, country==iCountry)$lifeExp) if(tmp < lowerThreshold){ cat("Average Life Expectancy in", iCountry, "is less than", lowerThreshold, "\\n") } else if(tmp lowerThreshold && tmp < upperThreshold){ cat("Average Life Expectancy in", iCountry, "is between", lowerThreshold, "and", upperThreshold, "\\n") } else{ cat("Average Life Expectancy in", iCountry, "is greater than", upperThreshold, "\\n") } rm(tmp) } ~~~ {: .language-r}

チャレンジ5 - 上級

Write a script that loops over each country in the gapminder dataset, tests whether the country starts with a ‘B’, and graphs life expectancy against time as a line graph if the mean life expectancy is under 50 years.

We will use the grep() command that was introduced in the Unix Shell lesson to find countries that start with “B.” Lets understand how to do this first. Following from the Unix shell section we may be tempted to try the following

R

grep("^B", unique(gapminder$country))

But when we evaluate this command it returns the indices of the factor variable country that start with “B.” To get the values, we must add the value=TRUE option to the grep() command:

R

grep("^B", unique(gapminder$country), value = TRUE)

We will now store these countries in a variable called candidateCountries, and then loop over each entry in the variable. Inside the loop, we evaluate the average life expectancy for each country, and if the average life expectancy is less than 50 we use base-plot to plot the evolution of average life expectancy using with() and subset():

R

thresholdValue <- 50
candidateCountries <- grep("^B", unique(gapminder$country), value = TRUE)

for (iCountry in candidateCountries) {
    tmp <- mean(gapminder[gapminder$country == iCountry, "lifeExp"])

    if (tmp < thresholdValue) {
        cat("Average Life Expectancy in", iCountry, "is less than", thresholdValue, "plotting life expectancy graph... \n")

        with(subset(gapminder, country == iCountry),
                plot(year, lifeExp,
                     type = "o",
                     main = paste("Life Expectancy in", iCountry, "over time"),
                     ylab = "Life Expectancy",
                     xlab = "Year"
                     ) # end plot
             ) # end with
    } # end if
    rm(tmp)
} # end for loop

まとめ

  • Use if and else to make choices.
  • Use for to repeat operations.