How can I randomly shuffle the order of two columns simultaneously 1000 times using simulated values?

Answer 1

In every case below, | column -t is appended only to align the output visually.

1) Create two columns named "simulation1" and "simulation2" that contain random numbers.

$ cat tst.awk
BEGIN { srand(seed) }
{ print $0, r(), r() }
function r() { return rand() * 100001 / 1000 }

$ awk -f tst.awk file | column -t
231   0.12    85.5574  23.7444
432   0.32    23.558   65.5853
11    0.0003  59.2486  50.3799
134   0.33    27.8248  45.7872
2334  0.553   45.7947  13.1887
12    0.33    51.6042  99.55
100   0.331   88.0281  17.4515
1008  1.6     1.37974  65.5945
223   -0.81   14.6773  97.6476
998   -3.001  87.888   31.97

2) Then sort the "ID" and "pheno" columns by the values in the "simulation1" column.

$ awk -f tst.awk file | sort -k3,3n | column -t
1008  1.6     1.37974  65.5945
223   -0.81   14.6773  97.6476
432   0.32    23.558   65.5853
134   0.33    27.8248  45.7872
2334  0.553   45.7947  13.1887
12    0.33    51.6042  99.55
11    0.0003  59.2486  50.3799
231   0.12    85.5574  23.7444
998   -3.001  87.888   31.97
100   0.331   88.0281  17.4515
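
The n modifier in -k3,3n matters here: without it, sort compares the third field as text, so 14.6773 would be placed before 9.5. A minimal sketch with three made-up rows (the IDs and values are illustrative only, not taken from the file above):

```shell
# Three hypothetical rows: ID, pheno, simulation1
rows='a 0.1 9.5
b 0.2 14.6773
c 0.3 1.37974'

# Text comparison on field 3: "14.6773" sorts before "9.5"
text_order=$(printf '%s\n' "$rows" | LC_ALL=C sort -k3,3 | awk '{print $1}' | tr '\n' ' ')

# Numeric comparison on field 3: 1.37974 < 9.5 < 14.6773
num_order=$(printf '%s\n' "$rows" | LC_ALL=C sort -k3,3n | awk '{print $1}' | tr '\n' ' ')

echo "text:    $text_order"
echo "numeric: $num_order"
```

LC_ALL=C is set only so that the text comparison is byte-wise and reproducible across locales.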

3) Then calculate the mean "pheno" for the first 40% of the rows.

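$ cat tst2.awk
{ vals[NR] = $2 }                  # remember each pheno value in input order
END {
    max = int(NR * 40 / 100)       # whole number of rows in the first 40%
    for (i=1; i<=max; i++) {
        sum += vals[i]
    }
    if (max > 0) print sum / max   # mean pheno over those rows
}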

$ awk -f tst.awk file | sort -k3,3n | awk -f tst2.awk
0.36
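
As a sanity check on that result: 40% of the 10 rows is 4 rows, and the first four rows of the sorted output in step 2 have pheno values 1.6, -0.81, 0.32 and 0.33, so the mean is 1.44 / 4 = 0.36. The same arithmetic as a one-liner (the variable name mean is used only for this sketch):

```shell
# pheno values of the first 4 rows after sorting by simulation1,
# copied by hand from the step 2 output
mean=$(printf '%s\n' 1.6 -0.81 0.32 0.33 | awk '{ sum += $1 } END { print sum / NR }')
echo "$mean"
```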

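Hopefully you can figure out the rest. Above, awk was given the same seed on every invocation so that the output stays the same, which makes it easier to follow the calculation step by step. To have tst.awk generate different random numbers on each invocation, change the call to awk -v seed="$RANDOM" -f tst.awk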
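
One possible sketch of the remaining steps. It assumes, based on the sim1/sim2/diff layout of the bash script further down rather than on anything stated in this answer, that each iteration should also take the mean for the rows sorted by simulation2 and report the difference between the two means. The temporary directory, the helper name mean40, the header line, and the small iteration count are all illustrative; the two awk scripts are inlined so that the block runs on its own.

```shell
#!/bin/bash
# Sketch only: repeats steps 1-3 with a fresh seed on each iteration.
tmp=$(mktemp -d)

# The ten ID/pheno rows shown above (no header line)
cat > "$tmp/file" <<'EOF'
231 0.12
432 0.32
11 0.0003
134 0.33
2334 0.553
12 0.33
100 0.331
1008 1.6
223 -0.81
998 -3.001
EOF

# Mean of column 2 over the first 40% of the rows read from stdin
mean40() {
    awk '{ vals[NR] = $2 }
         END { max = int(NR * 40 / 100)
               for (i = 1; i <= max; i++) sum += vals[i]
               if (max > 0) print sum / max }'
}

nits=5   # use 1000 for the full run
results=$(
    i=1
    while [ "$i" -le "$nits" ]; do
        # Step 1: append simulation1 and simulation2, seeded differently each time
        awk -v seed="$RANDOM" '
            function r() { return rand() * 100001 / 1000 }
            BEGIN { srand(seed) }
            { print $0, r(), r() }' "$tmp/file" > "$tmp/sim"
        # Steps 2-3, once per simulated column
        m1=$(sort -k3,3n "$tmp/sim" | mean40)
        m2=$(sort -k4,4n "$tmp/sim" | mean40)
        awk -v i="$i" -v a="$m1" -v b="$m2" \
            'BEGIN { printf "%d\t%.4f\t%.4f\t%.4f\n", i, a, b, b - a }'
        i=$((i + 1))
    done
)
printf 'iter\tsim1\tsim2\tdiff\n%s\n' "$results"
rm -rf "$tmp"
```

Because the seed changes on every iteration, the numbers differ from run to run; pass a fixed seed instead of "$RANDOM" while debugging.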

Answer 2

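Updated script: bc does not work well with a leading sign on numbers, so domath now uses awk instead.

It also now shuffles the array indices with shuf on each iteration, because working with a fixed array is simpler.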

#!/bin/bash

function domath {
    #do the math using the 4 indices into the pheno array
    awk '{print ($1+$2+$3+$4)/4}' <<<"${ph[$1]} ${ph[$2]} ${ph[$3]} ${ph[$4]}"
}

function iterate {
    #randomise the indices and get the first 4
    shuf -e 0 1 2 3 4 5 6 7 8 9 | head -n 4
}

#number of iterations
nits=100

#read the pheno values into an array
ph=($(tail -n +3 data | awk '{print $2}'))


echo -e row'\t'sim1'\t'sim2'\t'diff
for (( row=1; row<=$nits; row++ )); do
    #calculate simulation1 
    first=$(printf "%+.3f" $(domath $(iterate)))
    #calculate simulation 2
    second=$(printf "%+.3f" $(domath $(iterate)))
    #calculate the difference
    diff=$(printf "%+.3f" $(awk '{print $2-$1}' <<<"$first $second"))
    #and print
    echo -e $row'\t'$first'\t'$second'\t'$diff
done
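
The index list 0 1 2 ... 9 in iterate ties the script to a data file with exactly ten rows, and the 4 in head -n 4 is 40% of ten. A hedged variant that derives both from the array length, assuming GNU shuf with its -i and -n options is available. The pheno values are inlined here (copied from the sample output in the first answer) instead of being read from data, and the names n, k, picked and sample_mean are introduced only for this sketch:

```shell
#!/bin/bash
# Pheno values inlined for illustration; the script above reads them from "data"
ph=(0.12 0.32 0.0003 0.33 0.553 0.33 0.331 1.6 -0.81 -3.001)

n=${#ph[@]}            # number of rows
k=$(( n * 40 / 100 ))  # size of the 40% sample (integer division)

iterate() {
    # k distinct random indices in the range 0..n-1
    shuf -i "0-$(( n - 1 ))" -n "$k"
}

domath() {
    # mean of the pheno values at the given indices
    for idx in "$@"; do
        echo "${ph[$idx]}"
    done | awk '{ sum += $1 } END { if (NR > 0) print sum / NR }'
}

picked=$(iterate)
# $picked is deliberately unquoted so that each index becomes its own argument
sample_mean=$(printf '%+.3f' "$(domath $picked)")
echo "indices: $(echo $picked)  mean: $sample_mean"
```

With this shape, domath no longer needs exactly four arguments, so the sample size follows the number of rows in the input.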
