<div class='date'></div><ul>...</ul>
다음과 같은 수천 개의 코드 블록이 포함된 HTML 파일이 있습니다 .
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<div class="date">Wed May 23 2018</div>
<ul>
<li>
Do laundry
<ul>
<li>
Get coins
</li>
</ul>
</li>
<li>
Wash the dishes
</li>
</ul>
<div class='date'>Thu May 24 2018</div>
<ul>
<li>
Solve the world's hunger problem
<ul>
<li>
Don't tell anyone
</li>
</ul>
</li>
<li>
Get something to wear
</li>
</ul>
<div class='date'>Fri May 25 2018</div>
<ul>
<li>
Modify the website according to GDPR
</li>
<li>
Watch YouTube
</li>
</ul>
</body>
</html>
각 요소 <div>
와 해당 <ul>
요소는 특정 날짜에 따라 다릅니다. 블록은 <div class='date'></div><ul>...</ul>
오름차순으로 정렬됩니다. 즉, 최신 날짜가맨 아래파일. 최신 날짜가 되도록 내림차순으로 정렬할 계획입니다.맨 위파일의 내용은 다음과 같습니다.
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<div class='date'>Fri May 25 2018</div>
<ul>
<li>
Modify the website according to GDPR
</li>
<li>
Watch YouTube
</li>
</ul>
<div class='date'>Thu May 24 2018</div>
<ul>
<li>
Solve the world's hunger problem
<ul>
<li>
Don't tell anyone
</li>
</ul>
</li>
<li>
Get something to wear
</li>
</ul>
<div class="date">Wed May 23 2018</div>
<ul>
<li>
Do laundry
<ul>
<li>
Get coins
</li>
</ul>
</li>
<li>
Wash the dishes
</li>
</ul>
</body>
</html>
올바른 도구가 무엇인지 잘 모르겠습니다. 쉘 스크립트입니까? 응 awk
? 파이썬인가요? 더 빠르고 편리한 것이 또 있을까요?
답변1
확장하다Python
해결책:
sort_html_by_date.py
스크립트:
from bs4 import BeautifulSoup
from datetime import datetime
with open('input.html') as html_doc: # replace with your actual html file name
soup = BeautifulSoup(html_doc, 'lxml')
divs = {}
for div in soup.find_all('div', 'date'):
divs[datetime.strptime(div.string, '%a %B %d %Y')] = \
str(div) + '\n' + div.find_next_sibling('ul').prettify()
soup.body.clear()
for el in sorted(divs, reverse=True):
soup.body.append(divs[el])
print(soup.prettify(formatter=None))
용법:
python sort_html_by_date.py
산출:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<div class="date">Fri May 25 2018</div>
<ul>
<li>
Modify the website according to GDPR
</li>
<li>
Watch YouTube
</li>
</ul>
<div class="date">Thu May 24 2018</div>
<ul>
<li>
Solve the world's hunger problem
<ul>
<li>
Don't tell anyone
</li>
</ul>
</li>
<li>
Get something to wear
</li>
</ul>
<div class="date">Wed May 23 2018</div>
<ul>
<li>
Do laundry
<ul>
<li>
Get coins
</li>
</ul>
</li>
<li>
Wash the dishes
</li>
</ul>
</body>
</html>
사용된 모듈:
아름다운 수프-https://www.crummy.com/software/BeautifulSoup/bs4/doc/
약속 시간 -https://docs.python.org/3.3/library/datetime.html#module-datetime