Listing named destinations in a PDF

Question 1

Poppler's pdfinfo command-line utility lists the page number, position, and name of every named destination in a PDF. Poppler 0.58 or later is required.

$ pdfinfo -dests input.pdf
Page  Destination                 Name
   1 [ XYZ null null null      ] "F1"
   1 [ XYZ  122  458 null      ] "G1.1500945"
   1 [ XYZ   79  107 null      ] "G1.1500953"
   1 [ XYZ   79   81 null      ] "G1.1500954"
   1 [ XYZ null null null      ] "P.1"
   2 [ XYZ null null null      ] "L1"
   2 [ XYZ null null null      ] "P.2"
(...)
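
If you need this listing inside a script, the text output above can be parsed. Below is a minimal sketch, assuming pdfinfo is on your PATH and that the row layout matches the sample shown here; `parse_dests` and `list_dests` are hypothetical helper names, not part of Poppler.

```python
import re
import subprocess

# One row of `pdfinfo -dests` output: page number, bracketed destination, quoted name.
ROW = re.compile(r'^\s*(\d+)\s+\[\s*(.*?)\s*\]\s+"(.*)"\s*$')


def parse_dests(text):
    """Return a list of (page, destination_fields, name) tuples."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # the header line and anything unrecognized are skipped
            page, dest, name = match.groups()
            rows.append((int(page), dest.split(), name))
    return rows


def list_dests(pdf_path):
    """Run pdfinfo (assumed to be installed) and parse its named destinations."""
    result = subprocess.run(
        ["pdfinfo", "-dests", pdf_path],
        capture_output=True, text=True, check=True,
    )
    return parse_dests(result.stdout)
```

Calling `list_dests("input.pdf")` should then give one tuple per destination, in the order pdfinfo printed them.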

Question 2

The pyPdf library can list the anchors.

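#!/usr/bin/env python
import sys
from pyPdf import PdfFileReader

def pdf_list_anchors(fh):
    # getNamedDestinations() returns a dict keyed by destination name.
    reader = PdfFileReader(fh)
    destinations = reader.getNamedDestinations()
    for name in destinations:
        print(name)

# PDFs are binary files, so open in "rb" mode and close the handle afterwards.
with open(sys.argv[1], "rb") as fh:
    pdf_list_anchors(fh)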

This is enough for my use case, but the anchors are listed in random order. Using only the stable interfaces of pyPdf 1.13, I could not find a way to list the anchors in order. I have not tried pyPdf2 yet.
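
One possible way to get a deterministic order is to sort the destinations by page and then by vertical position. The sketch below assumes the newer pypdf package (not pyPdf 1.13) and its `named_destinations` and `get_destination_page_number` members; `order_anchors` and `list_anchors_in_order` are hypothetical helper names, and this has not been run against the PDFs discussed here.

```python
def order_anchors(entries):
    """Sort (name, page_index, top) tuples by page, then top-to-bottom."""
    def sort_key(entry):
        name, page, top = entry
        unknown_page = page is None or page < 0
        # PDF y-coordinates grow upward, so a larger `top` is higher on the page.
        # A missing `top` (e.g. "XYZ null null null") is treated as top of page.
        height = float(top) if isinstance(top, (int, float)) else float("inf")
        # Destinations whose page could not be resolved sort last.
        return (unknown_page, 0 if unknown_page else page, -height, name)
    return sorted(entries, key=sort_key)


def list_anchors_in_order(pdf_path):
    """Collect named destinations with pypdf (assumed installed) and sort them."""
    from pypdf import PdfReader  # assumption: pypdf, not the old pyPdf

    reader = PdfReader(pdf_path)
    entries = []
    for name, dest in reader.named_destinations.items():
        page = reader.get_destination_page_number(dest)
        entries.append((name, page, dest.top))
    return order_anchors(entries)
```

The sorting is kept in its own function so it can be reused with any source of (name, page, top) data, for example parsed pdfinfo output.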

Question 3

(Also answered here: Viewing anchors in a PDF document)

I had the same problem and finally found a great answer via: How can I visually inspect the structure of a PDF to reverse-engineer it?

The answer is to use the Python package pdfminer.six. It's even one of the examples in the documentation! Cut and paste the following code into your terminal:

pip install pdfminer.six
cat >extract.py <<'EOF'
import sys
import pdfminer.pdfparser, pdfminer.pdfdocument
with open(sys.argv[1], "rb") as f:
  parser = pdfminer.pdfparser.PDFParser(f)
  document = pdfminer.pdfdocument.PDFDocument(parser)
  for (level, title, dest, a, se) in document.get_outlines():
    print('  ' * level, title, dest or a, se)
EOF
python extract.py myInputFile.pdf

On my particular PDF, the output looks like this:

$ python extract.py ~/Desktop/p2786r3.pdf | head
   Abstract {'S': /'GoTo', 'D': b'section.1'} None
   Revision History {'S': /'GoTo', 'D': b'section.2'} None
     R3: October 2023 (midterm mailing)r3-october-2023-midterm-mailing {'S': /'GoTo', 'D': b'section*.2'} None
     R2: June 2023 (Varna meeting)r2-june-2023-varna-meeting {'S': /'GoTo', 'D': b'section*.3'} None
     R1: May 2023 (pre-Varna mailing)r1-may-2023-pre-varna-mailing {'S': /'GoTo', 'D': b'section*.4'} None
     R0: Issaquah 2023r0-issaquah-2023 {'S': /'GoTo', 'D': b'section*.5'} None
   Introduction {'S': /'GoTo', 'D': b'section.3'} None
   Motivating Use Cases {'S': /'GoTo', 'D': b'section.4'} None
     Efficient vector growth {'S': /'GoTo', 'D': b'subsection.4.1'} None
     Moving types without empty states {'S': /'GoTo', 'D': b'subsection.4.2'} None

And indeed, opening p2786r3.pdf#subsection.4.2 in a browser opens the PDF at that particular section.
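
To turn outline entries into links of that form, something like the following could work. It is a minimal sketch, assuming the destination is either passed directly or stored under the 'D' key of an already-resolved action dict, as in the output above; `outline_fragment` is a hypothetical helper name, and explicit (array) destinations are simply skipped.

```python
def outline_fragment(pdf_name, dest, action):
    """Build 'file.pdf#name' from one outline entry, or None if it has no named target."""
    target = dest
    if target is None and isinstance(action, dict):
        # GoTo actions keep their destination under the 'D' key.
        target = action.get("D")
    if isinstance(target, bytes):
        # Named destinations come back as bytes in the output shown above.
        target = target.decode("utf-8", errors="replace")
    if not isinstance(target, str):
        return None  # explicit destinations (arrays) have no name to link to
    return f"{pdf_name}#{target}"
```

For the last entry in the listing, `outline_fragment("p2786r3.pdf", None, {"D": b"subsection.4.2"})` gives the same p2786r3.pdf#subsection.4.2 link. Names containing spaces or other special characters may additionally need percent-encoding.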

Answer