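Readwise Reader does not recognize the inline annotations in Chinese annotated-edition Epubs. To make these books easier to read in Readwise Reader, and because the annotations were hard to copy out cleanly, separated from the main text, when I read the books in Calibre, the annotation markup needs to be converted.
The conversion is done with regex find-and-replace, which rewrites the markup and keeps the annotation text itself.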
Find Regex | Replace Regex |
---|---|
`<span class="xiu"><span class="ord">绣<span class="ord0">旁<\/span><\/span>(.*?)<\/span>` | `「绣旁:\1」` |
`<span class="xiu"><span class="ord">绣<span class="ord0">眉<\/span><\/span>(.*?)<\/span>` | `「绣眉:\1」` |
`<span class="xiu"><span class="ord">绣<span class="ord0">夹<\/span><\/span>(.*?)<\/span>` | `「绣夹:\1」` |
`<span class="jia"><span class="ord">张<span class="ord0">旁<\/span><\/span>(.*?)<\/span>` | `「张旁:\1」` |
`<span class="jia"><span class="ord">张<span class="ord0">眉<\/span><\/span>(.*?)<\/span>` | `「张眉:\1」` |
`<span class="jia"><span class="ord">张<span class="ord0">夹<\/span><\/span>(.*?)<\/span>` | `「张夹:\1」` |
`<div class="quote">\s(<p>)<span class="ord">张<span class="ord0">回<\/span><\/span>` | `\1「张回:` |
`(<\/p>)\s<\/div>(\s<p>词曰)` | `」\1\2` |
`(<\/p>)\s<\/div>(\s<p>诗曰)` | `」\1\2` |
`<div class="wenlong">\s(<p>)<span class="ord">文<span class="ord0">回<\/span><\/span>` | `\1「文回:` |
`(<\/p>\s)<\/div>\s(<\/body>)` | `」\1\2` |
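The main technique here is the capture group: whatever is matched inside `()` can be referenced in the replacement as `\` followed by the group number, so that part of the text is kept.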
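As a rough illustration of how capture groups and `\1` / `\2` carry text into the replacement, here is a minimal Python sketch. The three rules are copied from the table above; the helper `apply_rules` and the two sample strings are hypothetical and made up for this example, and Python's `re` module is assumed to behave like the editor's regex engine for these simple patterns.

```python
import re

# (find, replace) pairs copied from the table above.
RULES = [
    (r'<span class="xiu"><span class="ord">绣<span class="ord0">旁<\/span><\/span>(.*?)<\/span>',
     r'「绣旁:\1」'),
    (r'<div class="quote">\s(<p>)<span class="ord">张<span class="ord0">回<\/span><\/span>',
     r'\1「张回:'),
    (r'(<\/p>)\s<\/div>(\s<p>词曰)',
     r'」\1\2'),
]


def apply_rules(html, rules):
    """Apply each (find, replace) pair in order and return the rewritten text."""
    for find, replace in rules:
        html = re.sub(find, replace, html)
    return html


# Hypothetical samples, not taken from a real book file.
sample_inline = ('<p>正文<span class="xiu"><span class="ord">绣'
                 '<span class="ord0">旁</span></span>批语</span>正文</p>')
sample_block = ('<div class="quote">\n<p><span class="ord">张'
                '<span class="ord0">回</span></span>总评</p>\n</div>\n<p>词曰</p>')

print(apply_rules(sample_inline, RULES))  # <p>正文「绣旁:批语」正文</p>
print(apply_rules(sample_block, RULES))
```

In the second sample the opening rule keeps the `<p>` tag through `\1`, and the closing rule keeps `</p>` and the following `<p>词曰` through `\1\2` while dropping the wrapping `<div>`.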
Find Regex | Replace Regex |
---|---|
`<span class="kt"><img alt="庚辰本" class="font_patch" src="../Images/image00844.gif"/>(.*?)</span>` | `「庚:\1」` |
`<span class="kt"><img alt="甲戌本" class="font_patch" src="../Images/image00842.gif"/>(.*?)</span>` | `「甲:\1」` |
`<span class="kt"><img alt="戚序本" class="font_patch" src="../Images/image00845.gif"/>(.*?)</span>` | `「戚:\1」` |
`<span class="kt"><img alt="己卯本" class="font_patch" src="../Images/image00843.gif"/>(.*?)</span>` | `「己:\1」` |
`<span class="kt"><span class="red"><img alt="庚辰本" class="font_patch" src="../Images/image00844.gif"/>(.*?)</span></span>` | `「庚:\1」` |
`<img alt="甲戌本" class="font_patch" src="../Images/image00842.gif"/><img alt="侧批" class="font_patch" src="../Images/image00851.gif"/>((?!img).*?)(</span>)` | `「甲侧:\1」\2` |
`<img alt="庚辰本" class="font_patch" src="../Images/image00844.gif"/><img alt="侧批" class="font_patch" src="../Images/image00851.gif"/>((?!img).*?)(</span>)` | `「庚侧:\1」\2` |
`<img alt="甲戌本" class="font_patch" src="../Images/image00842.gif"/><img alt="眉批" class="font_patch" src="../Images/image00850.gif"/>((?!img).*?)(</span>)` | `「甲眉:\1」\2` |
`<img alt="甲戌本" class="font_patch" src="../Images/image00842.gif"/><img alt="夹批" class="font_patch" src="../Images/image00852.gif"/>((?!img).*?)(</span>)` | `「甲夹:\1」\2` |
`<img alt="庚辰本" class="font_patch" src="../Images/image00844.gif"/><img alt="眉批" class="font_patch" src="../Images/image00850.gif"/>((?!img).*?)(</span>)` | `「庚眉:\1」\2` |
`<img alt="蒙府本" class="font_patch" src="../Images/image00846.gif"/><img alt="侧批" class="font_patch" src="../Images/image00851.gif"/>((?!img).*?)(</span>)` | `「蒙侧:\1」\2` |
`<img alt="庚辰本" class="font_patch" src="../Images/image00844.gif"/><img alt="夹批" class="font_patch" src="../Images/image00852.gif"/>((?!img).*?)(</span>)` | `「庚夹:\1」\2` |
`<img alt="戚序本" class="font_patch" src="../Images/image00845.gif"/><img alt="夹批" class="font_patch" src="../Images/image00852.gif"/>((?!img).*?)(</span>)` | `「戚夹:\1」\2` |
`<span class="small kt red">((?!span).*?)</span>` | `\1` |
`<span class="x-small">((?!span).*?)</span>` | `\1` |
`<span class="small kt">((?!span).*?)</span>` | `\1` |
`<span class="red">((?!span).*?)</span>` | `\1` |
`<img alt="戚序本" class="font_patch" src="../Images/image00845.gif"/><img alt="夹批" class="font_patch" src="../Images/image00852.gif"/>((?!img).*?)(</p>)` | `「戚夹:\1」\2` |
`<img alt="甲辰本" class="font_patch" src="../Images/image00849.gif"/><img alt="夹批" class="font_patch" src="../Images/image00852.gif"/>((?!img).*?)(</p>)` | `「甲夹:\1」\2` |
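The `(?!span)` here is meant to guard against nested `<span></span>` tags, where the matched text would not be a properly paired HTML element; in other words, no extra `<span>` tag should appear inside the `<span></span>` being matched. Note that a lookahead is only checked at the position where it is written, here immediately after the opening tag, so it is a partial safeguard and the results still need checking.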
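To see what the lookahead actually checks, here is a small Python sketch. `ORIGINAL` is copied from the table; `TEMPERED` is an assumed alternative that is not in the original notes and repeats the lookahead before every character. The helper `unwrap` and both sample strings are hypothetical.

```python
import re

# Pattern from the table: the lookahead is tested once, right after the opening tag.
ORIGINAL = r'<span class="red">((?!span).*?)</span>'
# Assumed alternative (not from the original notes): test the lookahead
# before every character, so the content can never contain "<span".
TEMPERED = r'<span class="red">((?:(?!<span).)*?)</span>'


def unwrap(pattern, html):
    """Remove the wrapping tag matched by `pattern`, keeping group 1."""
    return re.sub(pattern, r'\1', html)


# Hypothetical samples: one flat span, one with a nested span.
flat = '<span class="red">批语</span>'
nested = '<span class="red">甲<span class="x-small">乙</span>丙</span>'

print(unwrap(ORIGINAL, flat))    # 批语
print(unwrap(ORIGINAL, nested))  # 甲<span class="x-small">乙丙</span>
print(unwrap(TEMPERED, nested))  # unchanged: the nested case is skipped
```

With the nested sample, the original pattern stops at the first `</span>` and leaves the inner opening tag paired with the outer closing tag, while the tempered variant declines to match at all, which leaves that case for manual review.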
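The HTML structure of the annotations in 红楼梦 (Dream of the Red Chamber) is very messy, so whatever these rules did not cover was proofread by hand.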