Experiments and examples

Answer 1

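This was raised on the Austin Group mailing list in March 2012. Here is the final message on the subject, by Geoff Clare of the Austin Group (the body that maintains POSIX), who is also the one who raised the issue in the first place. It is copied from the gmane NNTP interface: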

Date: Fri, 16 Mar 2012 17:09:42 +0000
From: Geoff Clare
To: austin-group-l
Newsgroups: gmane.comp.standards.posix.austin.general
Subject: Re: Strange addressing issue in sed

Stephane Chazelas wrote, on 16 Mar 2012:
>
> 2012-03-16 15:44:35 +0000, Geoff Clare:
> > I've been alerted to an odd behaviour of sed on certified UNIX
> > systems that doesn't seem to match the requirements of the
> > standard.  It concerns an interaction between the 'n' command
> > and address matching.
> > 
> > According to the standard, this command:
> > 
> > printf 'A\nB\nC\nD\n' | sed '1,3s/A/B/;1,3n;1,3s/B/C/'
> > 
> > should produce the output:
> > 
> > B
> > C
> > C
> > D
> > 
> > GNU sed does produce this, but certified UNIX systems produce this:
> > 
> > B
> > B
> > C
> > D
> > 
> > However, if I change the 1,3s/B/C/ to 2,3s/B/C/ then they produce
> > the expected output (tested on Solaris and HP-UX).
> > 
> > Is this just an obscure bug from common ancestor code, or is there
> > some legitimate reason why this address change alters the behaviour?
> [...]
> 
> I suppose the idea is that for the second 1,3cmd, line "1" has
> not been seen, so the 1,3 range is not entered.

Ah yes, now it makes sense, and it looks like the standard does
require this slightly strange behaviour, given how the processing
of the "two addresses" case is specified:

    An editing command with two addresses shall select the inclusive
    range from the first pattern space that matches the first address
    through the next pattern space that matches the second.  (If the
    second address is a number less than or equal to the line number
    first selected, only one line shall be selected.) Starting at the
    first line following the selected range, sed shall look again for
    the first address. Thereafter, the process shall be repeated.

It's specified this way because the addresses can be BREs, but if
the same matching process is applied to the line numbers (even though
they can only match at most once), then the 1,3 range on that last
command is never entered.

-- 
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England

And here is the relevant part of the rest of my message, the one Geoff was quoting:

I suppose the idea is that for the second 1,3cmd, line "1" has
not been seen, so the 1,3 range is not entered.

Same idea as in

printf '%s\n' A B C | sed -n '1d;1,2p'

whose behavior differ in traditional (heirloom toolchest at
least) and GNU.

It's unclear to me whether POSIX wants one behavior or the
other.
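A toy model shows how the literal reading produces the "certified UNIX" output. The Python sketch below is an assumption-laden illustration, not code from any sed. Its assumptions:

- Each command keeps its own range state.
- A numeric first address matches only the line with exactly that number, the same test a BRE would get.
- Only the s and n commands are modeled.
- s is reduced to a plain-string replacement of the first occurrence.

The names RangeCmd and run are hypothetical.

```python
# Toy model (hypothetical, not any real sed's source): per-command range
# state with the quoted POSIX two-address rule applied literally.
# Only `s` and `n` are modeled; `s` is a plain-string replacement.


class RangeCmd:
    """A command with a numeric 'first,second' address pair."""

    def __init__(self, first, second, op, *args):
        self.first, self.second = first, second
        self.op, self.args = op, args
        self.active = False  # each command keeps its own range state

    def selects(self, lineno):
        if not self.active:
            # Numeric first address tested like a BRE: it matches only the
            # line with exactly that number, so a skipped start line means
            # the range is never entered.
            if lineno != self.first:
                return False
            # Second address <= the line first selected: one line only.
            self.active = self.second > lineno
            return True
        if lineno == self.second:  # range closes on the line matching it
            self.active = False
        return True


def run(script, lines):
    """Run the script over lines with autoprint on; return the output."""
    out, pos = [], 0
    while pos < len(lines):
        space, pos = lines[pos], pos + 1
        lineno = pos  # number of the line now in the pattern space
        for cmd in script:
            if not cmd.selects(lineno):
                continue
            if cmd.op == "s":
                old, new = cmd.args
                space = space.replace(old, new, 1)
            elif cmd.op == "n":
                if pos == len(lines):
                    break  # no next line: branch to end of script
                out.append(space)  # autoprint, then read the next line
                space, pos = lines[pos], pos + 1
                lineno = pos
        out.append(space)  # end-of-cycle autoprint
    return out


ABCD = ["A", "B", "C", "D"]
# sed '1,3s/A/B/;1,3n;1,3s/B/C/'
print(run([RangeCmd(1, 3, "s", "A", "B"), RangeCmd(1, 3, "n"),
           RangeCmd(1, 3, "s", "B", "C")], ABCD))
# sed '1,3s/A/B/;1,3n;2,3s/B/C/'
print(run([RangeCmd(1, 3, "s", "A", "B"), RangeCmd(1, 3, "n"),
           RangeCmd(2, 3, "s", "B", "C")], ABCD))
```

The script prints ['B', 'B', 'C', 'D'] and then ['B', 'C', 'C', 'D']. In the first script, n has already replaced line 1 by the time the third command runs. That command therefore only ever sees lines 2 and 4, and its 1,3 range is never entered. With 2,3 the range is entered on line 2, which gives the output Geoff initially expected.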

So (according to Geoff) POSIX is clear that the GNU behaviour is non-compliant.

It's true that the GNU behaviour is less consistent: compare seq 10 | sed -n '1d;1,2p' with seq 10 | sed -n '1d;/^1$/,2p', even if it is arguably less "strange".
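A Python sketch of sed -n 'Xd;FIRST,SECONDp' makes the inconsistency concrete. It compares two rules:

- The literal rule: a line-number first address matches by equality.
- A relaxed rule, assumed here for illustration: such an address opens the range on the first surviving line numbered >= it, and the range never reopens afterwards.

The names range_print and matches are hypothetical. Deleted lines are passed as a set, since every implementation deletes the same lines in these examples. This is a model, not GNU's source.

```python
import re


def matches(addr, n, line):
    """A line-number address matches by equality, a string as a regex."""
    if isinstance(addr, int):
        return n == addr
    return re.search(addr, line) is not None


def range_print(lines, deleted, first, second, at_least=False):
    """Model sed -n 'Xd;FIRST,SECONDp', where `deleted` holds X's lines.

    at_least=True switches on the hypothetical relaxed rule: a line-number
    FIRST opens the range on the first surviving line numbered >= FIRST,
    and that range never reopens once it has fired.
    """
    out, active, fired = [], False, False
    for n, line in enumerate(lines, 1):
        if n in deleted:
            continue  # d ends the cycle before the range command runs
        # Inside the range, SECOND is tested on lines after the opening one.
        if active:
            out.append(line)
            active = not matches(second, n, line)
            continue
        if at_least and isinstance(first, int):
            opens = n >= first and not fired
        else:
            opens = matches(first, n, line)
        if opens:
            fired = True
            out.append(line)
            # A numeric SECOND <= the opening line selects that line only.
            active = not (isinstance(second, int) and second <= n)
    return out


seq10 = [str(i) for i in range(1, 11)]
print(range_print(seq10, {1}, 1, 2))                     # 1d;1,2p, literal rule
print(range_print(seq10, {1}, 1, 2, at_least=True))      # 1d;1,2p, relaxed rule
print(range_print(seq10, {1}, "^1$", 2, at_least=True))  # 1d;/^1$/,2p
```

The three calls print [], ['2'] and []. Under the relaxed rule, 1,2 and /^1$/,2 stop behaving alike, even though line 1 is the only line of seq 10 output that either first address can match.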

Nobody wants to report this to the GNU folks as a bug, and I'm not sure it should even be considered one. Probably the best option would be to update the POSIX spec to allow both behaviours, making it clear that you can't rely on either.

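Edit: looking at the original sed implementation in Unix V7 from the late 1970s, it seems that this behaviour for numeric addresses was not intended, or at least not fully thought through.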

Conversely, with Geoff's reading of the spec (and my original explanation of why it happens), in:

seq 5 | sed -n '3d;1,3p'

lines 1, 2, 4 and 5 should be output. This time it is the end address that the 1,3p ranged command never encounters, just as in seq 5 | sed -n '3d;/1/,/3/p'.

Yet that doesn't happen in the original implementation, nor in any other implementation I tried (busybox sed returns lines 1, 2 and 4, which looks like a bug).

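If you look at the UNIX v7 code, it checks for the case where the current line number is greater than the (numeric) end address, and leaves the range then. The fact that it doesn't do the same for the start address looks more like an oversight than an intended design.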
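The difference between the literal reading and a V7-style end check can be sketched in Python for line-number addresses only. The name range_print is hypothetical, and this is not the V7 source. Whether the line that overshoots the end address is itself selected is an assumption; this sketch does not select it.

```python
def range_print(lines, deleted, first, second, v7_end_check=False):
    """Model sed -n 'Xd;FIRST,SECONDp' with line-number addresses only.

    v7_end_check=True adds a hypothetical V7-style test: while in the
    range, a line numbered past the end address closes the range instead
    of being selected. Without it, the quoted POSIX rule is applied
    literally and the range stays open until line `second` itself is seen.
    """
    out, active = [], False
    for n, line in enumerate(lines, 1):
        if n in deleted:
            continue  # d ends the cycle before the range command runs
        if active:
            if v7_end_check and n > second:
                active = False  # overshot the end address: leave the range
                continue
            out.append(line)
            active = n != second  # close on the line matching the end address
        elif n == first:
            out.append(line)
            active = second > n  # end address <= start line: one line only
    return out


seq5 = [str(i) for i in range(1, 6)]
print(range_print(seq5, {3}, 1, 3))                     # literal POSIX reading
print(range_print(seq5, {3}, 1, 3, v7_end_check=True))  # with the V7-style check
```

The two calls print ['1', '2', '4', '5'] and ['1', '2'].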

What that means is that, in this respect, current implementations don't actually follow that interpretation of the POSIX spec.

Another confusing behaviour of the GNU implementation:

$ seq 5 | sed -n '2d;2,/3/p'
3
4
5

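Since line 2 was skipped, the 2,/3/ range is entered on line 3 (the first line whose number is >= 2). But because it is that line that made us enter the range, it is not checked against the end address. It gets worse with busybox sed: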

$ seq 10 | busybox sed -n '2,7d; 2,3p'
8

Since lines 2 to 7 were deleted, line 8 is the first line whose number is >= 2, so the 2,3 range is entered then!
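Both outputs above can be reproduced by one rule. It is sketched below in Python as an assumption, not taken from either code base:

- A line-number first address opens the range on the first surviving line numbered >= it.
- A numeric end address <= that line limits the range to one line.
- A regex end address is only tested from the following line.

The name range_print is hypothetical. GNU and busybox do not behave identically on every input, so treat this as a model of these two examples only.

```python
import re


def range_print(lines, deleted, first, second):
    """Model sed -n 'Xd;FIRST,SECONDp' under a hypothetical relaxed rule.

    A line-number FIRST opens the range on the first surviving line
    numbered >= FIRST, and (an assumption) that range never reopens.
    SECOND is a line number or a regex.
    """
    out, active, opened = [], False, False
    for n, line in enumerate(lines, 1):
        if n in deleted:
            continue  # d ends the cycle before the range command runs
        if active:
            out.append(line)
            if isinstance(second, int):
                active = n != second
            else:
                active = re.search(second, line) is None
        elif not opened and n >= first:
            opened = True
            out.append(line)  # the opening line is not tested against a regex end
            # A numeric end address <= the opening line selects one line only.
            active = not (isinstance(second, int) and second <= n)
    return out


print(range_print([str(i) for i in range(1, 6)], {2}, 2, "3"))             # 2d;2,/3/p
print(range_print([str(i) for i in range(1, 11)], set(range(2, 8)), 2, 3))  # 2,7d;2,3p
```

It prints ['3', '4', '5'] and ['8'], matching the two outputs shown above.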
