HTML에서 값을 추출하는 Bash 스크립트

HTML에서 값을 추출하는 Bash 스크립트

HA7NET 1wire 디바이스 서버에서 카운터 값을 추출하려고 하는데 sed나 awk, bash 스크립트에 익숙하지 않아서 문제가 발생합니다.

스크립트는 카운터 ID가 포함된 배열을 제공합니다.

#!/bin/sh
Counters=$(curl -q "http://192.168.70.21/1Wire/Search.html?FamilyCode=1D" 2>/dev/null | sed --silent -e 's/.*<INPUT.*NAME="Address_\(.*\)".*VALUE="\(.*\)".*./\2/p')

# iterating by for to see the array.
for x in $Counters; do echo $x; done;

결과에는 다음과 같이 장치가 나열됩니다.

    D90000000C8A9A1D
    C00000000C8C9D1D
    2D0000000EE97D1D

이제 이 배열을 사용하여 카운터의 실제 판독값을 얻기 위해 다른 컬 요청을 만들고 싶습니다. 두 개의 카운터(장치당 2개의 카운터 A, B)에서 판독값을 가져오는 URL은 다음과 같이 한 번에 모든 장치를 읽도록 확장될 수 있습니다.

curl -q "http://192.168.70.21/1Wire/ReadCounter.html?Address_Channel_Array={D90000000C8A9A1D,A},{D90000000C8A9A1D,B},{C00000000C8C9D1D,A},{C00000000C8C9D1D,B},2D0000000EE97D1D,A},{2D0000000EE97D1D,B}" 

그리고 생성된 HTML 페이지는 다음과 같습니다.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><!-- InstanceBegin template="/Templates/1WireReply.dwt" codeOutsideHTMLIsLocked="false" -->
<head>
<!-- InstanceBeginEditable name="doctitle" -->
<title>Read Counter Reply</title><!-- InstanceEndEditable -->
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<!-- InstanceBeginEditable name="head" --><!-- InstanceEndEditable -->
<style type="text/css">
<!--
@import url("/eds.css");
-->
</style>
<!-- InstanceParam name="pagePreprocessor" type="text" value="preProcessReadCounter" --><!-- InstanceParam name="functionname" type="text" value="Read Counter" --><!-- InstanceParam name="nextpage" type="text" value="PgReadCounterResult" --><!-- InstanceParam name="enctype" type="text" value="application/x-www-form-urlencoded" --><!-- InstanceParam name="name" type="text" value="Read Counter Result" -->
</head><body>
<table width="100%" border="0" cellspacing="0" cellpadding="0" bgcolor="#EEEEEE">
<tr>
<td class="title" colspan="2"><h1>&nbsp;</h1><h1 class="title">Embedded Data Systems</h1><a class="title" href="http://www.embeddeddatasystems.com">http://www.embeddeddatasystems.com</a></td></tr><tr class="spacer">
<td><H2 class="spacer">Read Counter Reply</h2></td><td><p class="spacer">HA7Net: 1.0.0.22</p></td></tr><tr>
<td colspan="2"><FORM METHOD="POST" ACTION="/Forms/ReadCounterResult_1" name="Read Counter Result"><table name="Exceptions" ID="Exceptions">
<tr>
<td><INPUT CLASS="HA7Value" NAME="Exception_Code_0" ID="Exception_Code_0" TYPE="hidden" VALUE="0" Size="5" disabled></td><td><INPUT CLASS="HA7Value" NAME="Exception_String_0" ID="Exception_String_0" TYPE="hidden" VALUE="None" Size="5" disabled></td></tr></table><!-- InstanceBeginEditable name="WorkArea" -->
<table name="Counter" id="Counter">
<tr><td colspan=1>Address</td><td colspan=1>Count</td><td colspan=1>Status</td></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_0" ID="Address_0" TYPE="text" VALUE="D90000000C8A9A1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_0" ID="Count_0" TYPE="text" VALUE="240155653"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_0" ID="Device_Exception_0" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_0" ID="Device_Exception_Code_0" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_1" ID="Address_1" TYPE="text" VALUE="D90000000C8A9A1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_1" ID="Count_1" TYPE="text" VALUE="48719610"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_1" ID="Device_Exception_1" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_1" ID="Device_Exception_Code_1" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_2" ID="Address_2" TYPE="text" VALUE="C00000000C8C9D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_2" ID="Count_2" TYPE="text" VALUE="0"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_2" ID="Device_Exception_2" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_2" ID="Device_Exception_Code_2" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_3" ID="Address_3" TYPE="text" VALUE="C00000000C8C9D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_3" ID="Count_3" TYPE="text" VALUE="1"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_3" ID="Device_Exception_3" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_3" ID="Device_Exception_Code_3" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_4" ID="Address_4" TYPE="text" VALUE="2D0000000EE97D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_4" ID="Count_4" TYPE="text" VALUE="1973018"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_4" ID="Device_Exception_4" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_4" ID="Device_Exception_Code_4" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_5" ID="Address_5" TYPE="text" VALUE="2D0000000EE97D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_5" ID="Count_5" TYPE="text" VALUE="17260345"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_5" ID="Device_Exception_5" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_5" ID="Device_Exception_Code_5" TYPE="hidden" VALUE="0"></tr>
</table><!-- InstanceEndEditable -->
<table name="Statistics" ID="Statistics">

 

추가 사용을 위해 각 카운터의 값을 파일이나 변수로 추출하고 싶습니다.

나는 owfs를 사용하는 것이 가능하다는 것을 알고 있지만 그렇게 할 수 있는 유연성을 원합니다.

답변1

HTML은 정규 언어가 아니기 때문에 가장 먼저 주목해야 할 점은 정규식을 사용하여 HTML을 구문 분석하는 것은 광기의 첫 번째 선택 티켓이라는 것입니다. 즉, 씹어보려고 하는 HTML처럼 보입니다.~해야 한다매우 간단한 추출에 적합합니다. 추출하려는 데이터는 다음 행에서 나오는 것으로 보입니다.

<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_0" ID="Address_0" TYPE="text" VALUE="D90000000C8A9A1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_0" ID="Count_0" TYPE="text" VALUE="240155653"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_0" ID="Device_Exception_0" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_0" ID="Device_Exception_Code_0" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_1" ID="Address_1" TYPE="text" VALUE="D90000000C8A9A1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_1" ID="Count_1" TYPE="text" VALUE="48719610"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_1" ID="Device_Exception_1" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_1" ID="Device_Exception_Code_1" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_2" ID="Address_2" TYPE="text" VALUE="C00000000C8C9D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_2" ID="Count_2" TYPE="text" VALUE="0"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_2" ID="Device_Exception_2" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_2" ID="Device_Exception_Code_2" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_3" ID="Address_3" TYPE="text" VALUE="C00000000C8C9D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_3" ID="Count_3" TYPE="text" VALUE="1"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_3" ID="Device_Exception_3" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_3" ID="Device_Exception_Code_3" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_4" ID="Address_4" TYPE="text" VALUE="2D0000000EE97D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_4" ID="Count_4" TYPE="text" VALUE="1973018"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_4" ID="Device_Exception_4" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_4" ID="Device_Exception_Code_4" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_5" ID="Address_5" TYPE="text" VALUE="2D0000000EE97D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_5" ID="Count_5" TYPE="text" VALUE="17260345"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_5" ID="Device_Exception_5" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_5" ID="Device_Exception_Code_5" TYPE="hidden" VALUE="0"></tr>

따라서 우리는 아마도 sed다음과 같이 할 수 있을 것입니다:

$ sed -n '/HA7Value.*Address_/{ s/VALUE="/%%%/;s/^.*%%%//; s/".*//; p; }' input.html
D90000000C8A9A1D
D90000000C8A9A1D
C00000000C8C9D1D
C00000000C8C9D1D
2D0000000EE97D1D
2D0000000EE97D1D

이에 대해 자세히 설명하면 다음과 같습니다.

/HA7Value.*Address_/ # Only run on lines that match this expression
{                    # Begin code block
  s/VALUE="/%%%/     # Replace (only) the first 'VALUE="' with a special marker
  s/^.*%%%//         # Delete everything up to that marker
  s/".*//            # Delete from the first '"' to the end of the line
  p                  # Print what's left
}                    # End code block

관련 정보