How do I remove rows from a file based on the SD of one of its columns?

Answer 1

$ awk 'NR==FNR{ s+=$3; ss+=$3^2; nr=NR; next }
       FNR==1 { mean=s/nr; sd=sqrt(ss/nr-mean^2) }
       $3> mean+(0.5*sd)' infile infile
B y 100
D ua 80
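
For readers less familiar with awk, the command above reads `infile` twice. On the first pass (`NR==FNR`) it accumulates the sum and the sum of squares of column 3. At the first line of the second pass it derives the mean and the population SD as sqrt(ss/n - mean^2), and it then prints every row whose third field exceeds mean + 0.5*sd. Below is a rough Python sketch of the same logic, not a drop-in replacement. It assumes whitespace-separated fields and uses the four sample rows that appear elsewhere in this thread; the names `filter_rows`, `rows` and `k` are illustrative only.

```python
import math

def filter_rows(rows, k=0.5):
    """Keep rows whose third field exceeds mean + k * population SD."""
    values = [float(r.split()[2]) for r in rows]
    n = len(values)
    s = sum(values)                      # awk: s += $3
    ss = sum(v * v for v in values)      # awk: ss += $3^2
    mean = s / n
    sd = math.sqrt(ss / n - mean ** 2)   # population SD (divides by n)
    return [r for r, v in zip(rows, values) if v > mean + k * sd]

# Sample rows assumed from the input used elsewhere in this thread
rows = ["A x 50", "B y 100", "C q 34", "D ua 80"]
print("\n".join(filter_rows(rows)))
```

With these rows the mean is 66 and the population SD is about 25.65, so the cutoff is about 78.8 and only the B and D rows pass, matching the awk output.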

Answer 2

I would suggest doing this in Python:

import pandas as pd

# Read the space-delimited input file into a pandas DataFrame (no header row)
df = pd.read_csv('file', delimiter=' ', header=None)

# Calculate the mean and standard deviation of the third column.
# Note: pandas' std() returns the sample SD (ddof=1); pass ddof=0 for the
# population SD that the awk answer computes.
mean = df[2].mean()
sd = df[2].std()

# Filter rows based on the condition
filtered = df[df[2] > mean + 0.5 * sd]

# Write the filtered DataFrame to an output file
filtered.to_csv('outfile', sep=' ', header=False, index=False)

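This code reads the file into a pandas DataFrame, calculates the mean and standard deviation of the third column, filters the rows based on the condition, and finally writes the filtered DataFrame to an output file.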

If you are new to Python, you can copy and paste the following commands into an online Python code editor.

## Prepare your input file
lines = ['A x 50', 'B y 100', 'C q 34', 'D ua 80']

with open('file', 'w') as f:
    for line in lines:
        f.write(line + '\n')

## Remove rows from the input based on the SD of its third column
import pandas as pd

# Read the space-delimited input (no header row)
df = pd.read_csv('file', delimiter=' ', header=None)

# Calculate the mean and standard deviation of the third column
mean = df[2].mean()
sd = df[2].std()  # sample SD (ddof=1) by default

# Filter rows based on your condition
fileout = df[df[2] > mean + 0.5 * sd]

# Print the output
print(fileout)

The first block of commands prepares the input data; the second block filters it and prints the result.
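
One thing worth checking before relying on either answer is which SD convention you want. The awk command divides by n (population SD), while pandas' `std()` divides by n - 1 by default (sample SD). The sketch below compares the two conventions on the four sample rows, using only the standard library's `statistics` module rather than pandas. The helper name `keep` is illustrative, and the 0.5 multiplier is taken from the answers above.

```python
import statistics

rows = ["A x 50", "B y 100", "C q 34", "D ua 80"]
values = [float(r.split()[2]) for r in rows]
mean = statistics.mean(values)

def keep(sd, k=0.5):
    """Rows whose third field exceeds mean + k * sd."""
    return [r for r, v in zip(rows, values) if v > mean + k * sd]

# pstdev divides by n, like the awk command
population = keep(statistics.pstdev(values))
# stdev divides by n - 1, like the pandas default
sample = keep(statistics.stdev(values))

print(population)
print(sample)
```

On this input the population SD gives a cutoff of about 78.8 and keeps the B and D rows, whereas the sample SD raises the cutoff to about 80.8 and keeps only the B row. If the pandas result should match the awk output, `std(ddof=0)` is the option to use.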
