How do I extract div content using grep?

Answer 1

Use grep -A:

$ grep -A 2 'class="col-6"' test.html | sed -n 2p
        <p>One of three columns</p>

From man grep:

-A NUM, --after-context=NUM
    Print NUM lines of trailing context after matching lines.

Or use awk:

$ awk '/class="col-6"/{getline; print $0}' test.html
        <p>One of three columns</p>

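Note: both commands only work if the structure is exactly the same as in the test input, with the <p> on the line directly after the matching <div>. Generally speaking, I would always prefer a proper XML/HTML parser.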

For example, Python with BeautifulSoup:

$ python3 -c '
from bs4 import BeautifulSoup
with open("test.html") as fp:
    # Name the parser explicitly so bs4 does not have to guess one
    soup = BeautifulSoup(fp, "html.parser")
print(soup.find_all("div", class_="col-6")[0].find_all("p")[0])'
<p>One of three columns</p>
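
If bs4 is not installed, the same idea can be sketched with Python's standard-library html.parser. This is only a rough illustration, not a drop-in replacement. The input string and the names next_line_after_match and ColParagraphs are made up for this example. The input deliberately puts the <p> on the same line as the <div>, which is the kind of layout change that breaks the line-based commands above.

```python
from html.parser import HTMLParser

# Hypothetical input: same content as the example, but all on one line.
HTML = '<div class="row"><div class="col-6"><p>One of three columns</p></div></div>'


def next_line_after_match(text, needle):
    """Line-based approach, analogous to the awk one-liner:
    return the line that follows the first line containing needle."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if needle in line:
            return lines[i + 1] if i + 1 < len(lines) else None
    return None


class ColParagraphs(HTMLParser):
    """Collect the text of every <p> nested inside a <div> whose class list contains col-6."""

    def __init__(self):
        super().__init__()
        self.div_stack = []   # one bool per open <div>: is it a col-6 div?
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            self.div_stack.append("col-6" in classes)
        elif tag == "p" and any(self.div_stack):
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()
        elif tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data


parser = ColParagraphs()
parser.feed(HTML)
parser.close()

print(next_line_after_match(HTML, 'class="col-6"'))  # None: no line follows the match
print(parser.paragraphs)                             # ['One of three columns']
```

The parser-based version does not care about line breaks or indentation, and it also matches a div that carries additional classes, such as class="col-6 offset-1".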

Or use xmlstarlet like this:

$ xmlstarlet sel -t -m '//div[@class="col-6"]' -c './p' -n test.html
<p>One of three columns</p>
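
A similar XPath-style query can be sketched with Python's standard-library xml.etree.ElementTree, which supports a limited XPath subset. The DOC string below is a made-up stand-in for test.html, since the actual file is not shown here. Like xmlstarlet, this assumes the input is well-formed XML, which real-world HTML often is not.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for test.html; it must be well-formed XML to parse.
DOC = """<html><body>
<div class="row">
    <div class="col-6">
        <p>One of three columns</p>
    </div>
</div>
</body></html>"""

root = ET.fromstring(DOC)

# [@class='col-6'] compares the whole attribute value, as in the xmlstarlet query.
matches = root.findall(".//div[@class='col-6']/p")

for p in matches:
    p.tail = None  # drop trailing whitespace so only the element is serialized
    print(ET.tostring(p, encoding="unicode"))  # <p>One of three columns</p>
```

Because the predicate is an exact string comparison, a div with class="col-6 offset-1" would not match here. That is one case where an HTML-aware parser is the more forgiving choice.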
