Getting the audio sources
For training, I recommend using genuinely uncompressed audio (that is, audio files in pcm_f32le format). Buying a physical album such as a CD and ripping it yourself is the best option. For how to do that, see
https://www.bilibili.com/opus/925630344961458181
Just don't convert to flac; export straight to wav.
This article uses Applio, which is based on RVC. For good results, training needs roughly 10 to 60 minutes of clean vocal audio (dry vocals). Even at the long end, try not to go over 2 hours. Quality also matters more than quantity: "Garbage In, Garbage Out" (feed it garbage and garbage is all you get back).
YouTube
You can also download audio from YouTube with yt-dlp. Keep in mind that audio obtained this way has been converted from a lossy format, so it is not truly uncompressed source material.
GitHub: yt-dlp/yt-dlp
yt-dlp can download the audio of a YouTube video at the best available quality.
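A minimal sketch of such a download, written as a small Python helper that builds the yt-dlp command line. The function name, the output template, and the placeholder URL are assumptions made for illustration; the flags themselves (`-f bestaudio`, `-x`, `--audio-format`, `-o`) are standard yt-dlp options.

```python
import shlex

def build_ytdlp_audio_cmd(url: str, out_template: str = "%(title)s.%(ext)s") -> list[str]:
    """Build a yt-dlp command that grabs the best audio stream and converts it to WAV."""
    return [
        "yt-dlp",
        "-f", "bestaudio",         # pick the best-quality audio-only stream
        "-x",                      # extract audio (requires ffmpeg to be installed)
        "--audio-format", "wav",   # convert the extracted audio to WAV
        "-o", out_template,        # output file name template
        url,
    ]

# Hypothetical placeholder URL; replace it with a real video URL.
cmd = build_ytdlp_audio_cmd("https://www.youtube.com/watch?v=VIDEO_ID")
print(shlex.join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`. The WAV produced this way is still decoded from a lossy stream, as noted above.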
If the audio is on the long side (10 minutes or more), you can split it with ffmpeg.
This guide splits every 10 minutes (600 seconds); for a different length, change the value in -segment_time 600.
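As a rough sketch, the split can be expressed with ffmpeg's segment muxer. The helper name and the `part_%03d.wav` output pattern are hypothetical; only `-segment_time 600` comes from the text above.

```python
import shlex

def build_split_cmd(src: str, seconds: int = 600, pattern: str = "part_%03d.wav") -> list[str]:
    """Build an ffmpeg command that cuts `src` into chunks of `seconds` seconds."""
    return [
        "ffmpeg", "-i", src,
        "-f", "segment",                # use the segment muxer
        "-segment_time", str(seconds),  # length of each chunk in seconds
        "-c", "copy",                   # copy the audio as-is, no re-encoding
        pattern,                        # part_000.wav, part_001.wav, ...
    ]

cmd = build_split_cmd("input.wav", seconds=600)
print(shlex.join(cmd))
```

Calling `build_split_cmd("input.wav", seconds=300)` would give 5-minute chunks instead.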
To join the parts back into a single file, first create filelist.txt and list every part in it.
Then join them with ffmpeg, using that list as the input.
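The two steps above could look like the following sketch, which assumes ffmpeg's concat demuxer is what is meant. The helper names and the `joined.wav` output name are made up for illustration; `filelist.txt` is the name used in the text.

```python
import shlex
from pathlib import Path

def write_filelist(parts: list[str], list_path: str = "filelist.txt") -> str:
    """Write one "file '<name>'" line per part, the format the concat demuxer reads."""
    lines = []
    for part in parts:
        escaped = part.replace("'", "'\\''")  # escape single quotes inside names
        lines.append(f"file '{escaped}'")
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return list_path

def build_concat_cmd(list_path: str = "filelist.txt", out: str = "joined.wav") -> list[str]:
    """Build the ffmpeg command that joins every file named in the list."""
    return [
        "ffmpeg", "-f", "concat",
        "-safe", "0",     # accept paths the demuxer would otherwise reject
        "-i", list_path,
        "-c", "copy",     # join without re-encoding
        out,
    ]

print(shlex.join(build_concat_cmd()))
```

The parts should share the same sample rate, channel count, and sample format, since `-c copy` does not convert anything.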
If you want to cut out just one specific section cleanly, for example 15 minutes starting at the 10-minute mark, ffmpeg can do that too.
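One way to sketch that cut is with ffmpeg's `-ss` (start) and `-t` (duration) options. The helper name and the `clip.wav` output name are assumptions; the 10-minute start and 15-minute length are the example values from the text.

```python
import shlex

def build_trim_cmd(src: str, start_min: float, length_min: float, out: str = "clip.wav") -> list[str]:
    """Build an ffmpeg command that copies `length_min` minutes starting at `start_min`."""
    start_s = int(start_min * 60)    # ffmpeg accepts plain seconds for -ss
    length_s = int(length_min * 60)  # and for -t
    return [
        "ffmpeg",
        "-ss", str(start_s),   # seek to the start position
        "-t", str(length_s),   # read only this much audio
        "-i", src,
        "-c", "copy",          # no re-encoding
        out,
    ]

cmd = build_trim_cmd("input.wav", start_min=10, length_min=15)
print(shlex.join(cmd))
```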
Other sources
Music in flac, mp3, or similar formats from other sources can be converted with ffmpeg into the format used for inference.
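A minimal sketch of that conversion, assuming the target is the pcm_f32le WAV format mentioned at the start of this article. The helper name and the example file name are hypothetical.

```python
import shlex
from pathlib import Path

def build_convert_cmd(src: str, codec: str = "pcm_f32le") -> list[str]:
    """Build an ffmpeg command that converts `src` to a WAV file next to it."""
    if Path(src).suffix.lower() == ".wav":
        raise ValueError("source is already a .wav file; output would overwrite it")
    out = str(Path(src).with_suffix(".wav"))  # song.flac -> song.wav
    return ["ffmpeg", "-i", src, "-c:a", codec, out]

cmd = build_convert_cmd("song.flac")
print(shlex.join(cmd))
```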
Keep in mind that converting a lossy format to an uncompressed one does not make the song itself lossless. Using wav audio ripped directly from a physical album is still the best choice.
Extracting the vocals (dry vocals)
Setting up the environment
First install Miniconda, then create an environment and activate it.
Install PyTorch.
Install openmirlab/bs-roformer-infer.
Install nomadkaraoke/python-audio-separator.
Downloading the models
You can check which models are currently available with bs-roformer-download.
As of May 21, 2026, its output listed the models below. Their main uses are:
- BS Roformer SW by jarredou: separates a song into 7 stems (multitrack): bass, drums, guitar, instrumental, other, piano, and vocals.
- BS Roformer | Chorus Male-Female by Sucial: separates male vocals, female vocals, and chorus.
- BS Roformer | Instrumental Resurrection by unwa: a very high-quality reconstruction model for instrumentals; restores the backing track of old songs.
- BS Roformer | Male-Female by aufr33: separates male and female vocals.
- BS Roformer | Vocals Resurrection by unwa: a very high-quality reconstruction model for vocals; a more aggressive version of Revive that restores the vocals of old songs.
- BS Roformer | Vocals Revive Series: vocal restoration models, a bit like "super-resolution" for images.
- BS Roformer | Vocals by Gabox: a standard vocal extraction model. Where the SW model at the top splits a song into 7 tracks, this one separates only two: instrumental and vocals.
- BS-Roformer-De-Reverb: de-reverb (reverb removal).
Now download the models you need. This guide downloads two, but some download links are dead, so one of them has to be fetched separately.
The de-reverb model can be found on huggingface.co: anvuew/dereverb_bs_roformer
Download the model file and its yaml config file and place them in the corresponding folder. Renaming the yaml config to match the model's file name is a good idea.
Processing the audio
The processing runs in three steps. First, BS Roformer SW by jarredou separates the vocals; next, Roformer Model: MelBand Roformer | Karaoke V2 by Gabox separates the chorus; finally, Roformer Model: BS-Roformer-De-Reverb removes the reverb, leaving the final clean (dry) vocals.
You can of course try swapping the order of steps 2 and 3, since the best combination depends on the song.
Start by creating the input and output folders. There are three steps, so prepare a folder for each one.
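The folder layout could be created with a few lines like these. The `stepN_input` names match the ones used in this guide; the `stepN_output` names and the helper itself are assumptions for illustration.

```python
from pathlib import Path

def make_step_dirs(root: str, steps: int = 3) -> list[Path]:
    """Create stepN_input / stepN_output folders under `root` and return them."""
    created = []
    for n in range(1, steps + 1):
        for kind in ("input", "output"):
            folder = Path(root) / f"step{n}_{kind}"
            folder.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
            created.append(folder)
    return created
```

Calling `make_step_dirs(".")` from the project directory creates all six folders in place.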
Put the music file in step1_input (an English file name is recommended). Always use an uncompressed wav file.
<1> Multitrack separation
Use BS Roformer SW by jarredou to split the song into multiple tracks. Edit the model's yaml file and add the relevant settings (the path is models/roformer-model-bs-roformer-sw-by-jarredou).
Adjust these settings to match your machine:
- batch_size: choose according to your VRAM (video memory). The more memory you have, the larger the value can be; for example, set it to 16 with 16 GB of VRAM.
- dim_t: leave it unchanged. This is the number of time-domain dimensions used when the model was trained.
- chunk_size: the downloaded config sometimes lacks this item, so add it with the value 352768.
- num_overlap: choose a value between 2 and 10 according to your VRAM.
- normalize: leave it set to false.
Then run inference.
When inference finishes, move the file whose name ends in _vocals.wav to step2_input, the input folder for the second step.
(optional) Blending with MSST for even higher quality
Normally any one of these is enough on its own, but if you are after the best possible quality, you can also run MSST-BSRNN once to produce a vocal track, vocals_msst.wav, and blend it in stereo with vocals_roformer.wav, the Roformer output. The blended file is then used for the next step. The blend is done directly with ffmpeg.
A quick explanation of the parameters:
- normalize=0: turns off dynamic automatic volume adjustment, so the channel volume does not change abruptly.
- -c:a pcm_s16le: lossless (uncompressed) output.
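A sketch of the blend, assuming ffmpeg's `amix` filter is the one carrying the `normalize=0` option described above. The helper name and the `vocals_blend.wav` output name are hypothetical; the two input names are the ones used in the text.

```python
import shlex

def build_blend_cmd(first: str, second: str, out: str = "vocals_blend.wav") -> list[str]:
    """Build an ffmpeg command that mixes two vocal takes into one file."""
    return [
        "ffmpeg", "-i", first, "-i", second,
        "-filter_complex", "amix=inputs=2:normalize=0",  # mix without automatic level scaling
        "-c:a", "pcm_s16le",                             # 16-bit PCM WAV output
        out,
    ]

cmd = build_blend_cmd("vocals_msst.wav", "vocals_roformer.wav")
print(shlex.join(cmd))
```

With `normalize=0` the two signals are simply summed, so it is worth checking the result for clipping.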
<2> Extracting and purifying the vocals
The chorus separation model uses a different architecture, so this step uses audio-separator. The model is downloaded automatically when you run the tool.
When inference finishes, Vocals is the lead vocal and Instrumental is the chorus (harmonies). Put the Vocals file in step3_input, the input folder for the third step.
<3> Cleaning up the space (de-reverb)
Use BS-Roformer-De-Reverb to remove the reverb from the vocals. The yaml settings can be adjusted here as well.
Leave dim_t unchanged and adjust the other two items according to your VRAM.
Then run inference.
When inference finishes, you get a _noreverb.wav file. This is the final clean (dry) vocal track.
(optional) Separating male and female vocals
For a male-female duet, it helps to separate the voices first with BS Roformer | Chorus Male-Female by Sucial and then extract them (the model can be downloaded from Sucial/Chorus_Male_Female_BS_Roformer).
Voice conversion
Applio (RVC architecture) handles the voice conversion.
Project URL: IAHispano/Applio
Installation
Installation is very simple. Clone the source code first.
Go to the root directory and run run-install.bat to start the installation.
When the installation finishes, run run-applio.bat to launch Applio.
Training a model
Open the "Training" tab and enter a new "Model Name" under Model Settings.
Next, in the "Preprocess" module, create a new dataset and upload some clean (dry) vocal tracks. When the upload finishes, click Preprocess Dataset.
The "Extract" module can stay at its defaults. Click Extract Features.
In the "Training" module, adjust Batch Size to your VRAM:
- 8 GB: 4 or 8
- 12-16 GB: 12-16
- 24 GB or more: 24-32
"Save Every Epoch" can stay at the default of 10.
For "Total Epoch", 200 to 300 is recommended. With a typical dataset this range gives good results, and the best model often lands at around 220 to 250 epochs, so use that as a baseline and adjust up or down.
When you are done adjusting, agree to the terms and click Start Training. After that, click Generate Index.
Inference (applying the voice conversion)
Once the model is trained, do the voice conversion in the "Inference" tab. Select a model; it is a good idea to test a model from 200 to 250 epochs first, then go up or down in epochs as needed.
The options under "Advanced Settings":
- Split Audio: splits the audio file. For long audio, check it to avoid running out of VRAM. For a short song of about 3 minutes, unchecking it gives higher quality.
- Autotune: automatic pitch correction. Check it for singing; uncheck it for narration or monologue.
- Clean Audio: noise removal. Check it depending on the situation.
The sliders below them:
- Pitch: key adjustment. Use +12 to turn a male voice into a female one, -12 for female to male, and 0 if the range is the same.
- Search Feature Ratio: feature search ratio (index ratio). Around 0.7 to 0.8 for singing; around 0.6 to 0.7 for podcasts or long narration.
- Protect Voiceless Consonants: protects voiceless consonants and breath sounds. Try around 0.33 to 0.5 for singing, and 0.5 for everything else.
When you are done adjusting, agree to the terms and click Convert.
(optional) Inference on stereo audio
Applio can only output mono, so feeding it stereo audio gives a very unnatural result. The workaround is to split the channels first, run inference on each, and then join them again.
Split the channels with ffmpeg.
After running inference on each channel separately, join them back into stereo.
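The split and re-join could be sketched with ffmpeg's `channelsplit` and `join` filters. This is one possible approach under that assumption; all helper and file names are hypothetical.

```python
import shlex

def build_split_channels_cmd(src: str, left: str = "left.wav", right: str = "right.wav") -> list[str]:
    """Build an ffmpeg command that writes the left and right channels to two mono files."""
    return [
        "ffmpeg", "-i", src,
        "-filter_complex", "channelsplit=channel_layout=stereo[L][R]",
        "-map", "[L]", left,    # left channel -> first output
        "-map", "[R]", right,   # right channel -> second output
    ]

def build_join_channels_cmd(left: str, right: str, out: str = "stereo_out.wav") -> list[str]:
    """Build an ffmpeg command that joins two mono files back into one stereo file."""
    return [
        "ffmpeg", "-i", left, "-i", right,
        "-filter_complex", "[0:a][1:a]join=inputs=2:channel_layout=stereo[a]",
        "-map", "[a]", out,
    ]

print(shlex.join(build_split_channels_cmd("vocals.wav")))
print(shlex.join(build_join_channels_cmd("left_converted.wav", "right_converted.wav")))
```

Use the same model and the same inference settings for both channels so the two halves still match when they are joined.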
(FAQ) Port conflicts
If you are told the port is already in use, open app.py in the program's root directory and change DEFAULT_PORT = 6969 to a different number.
Make sure to avoid ports reserved by Windows. You can check them from PowerShell.
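On Windows, the reserved ranges can be listed with `netsh interface ipv4 show excludedportrange protocol=tcp`; that this is the intended command is an assumption. The sketch below parses such output and checks a candidate port against it. The sample text is illustrative, not real output from any machine, and the helper names are hypothetical.

```python
import re

def parse_excluded_ranges(netsh_output: str) -> list[tuple[int, int]]:
    """Extract (start, end) port pairs from the netsh exclusion table."""
    ranges = []
    for line in netsh_output.splitlines():
        match = re.match(r"\s*(\d+)\s+(\d+)", line)  # rows start with two numbers
        if match:
            ranges.append((int(match.group(1)), int(match.group(2))))
    return ranges

def is_port_reserved(port: int, ranges: list[tuple[int, int]]) -> bool:
    """Return True if `port` falls inside any excluded range."""
    return any(start <= port <= end for start, end in ranges)

# Illustrative sample only; real output differs per machine.
SAMPLE = """
Protocol tcp Port Exclusion Ranges

Start Port    End Port
----------    --------
      5357        5357
      6950        7049
     50000       50059     *
"""

ranges = parse_excluded_ranges(SAMPLE)
print(is_port_reserved(6969, ranges), is_port_reserved(8080, ranges))
```

In practice the text would come from `subprocess.run([...], capture_output=True, text=True).stdout` with the netsh command above.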
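Mixing
Combine the converted vocals with the accompaniment (instrumental) separated in the first step, Multitrack separation, and you are done.
If you also separated the chorus, you can first mix the chorus from the second step with the accompaniment from the first step to make a "new accompaniment", and then mix the vocals with that new accompaniment in ffmpeg. The ffmpeg command is the same in both cases.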
Completely clean (dry) vocals can sound a little flat on their own, so adding a touch of reverb (echo) improves the result.
The parameters in aecho=0.8:0.88:40:0.4 mean:
- 0.8: In Gain, the input volume, that is, the vocal level before it enters the effect.
- 0.88: Out Gain, the output volume, that is, the overall level after the reverb is applied.
- 40: Delays, the delay time in milliseconds: how long the sound takes to bounce off a wall and come back.
- 0.4: Decays, the decay factor, which controls how the echo fades and gives the sound a pleasant tail.
This setting sounds like a "simple home studio". For a wider, more stage-like feel, try aecho=0.8:0.88:80:0.5; for just a subtle touch, try aecho=0.8:0.88:35:0.25.
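To make the three presets easy to switch between, the filter string can be generated from its four parameters. This is a small sketch; the helper names, preset labels, and output file name are made up, while the numbers are the ones given above.

```python
import shlex

def aecho_filter(in_gain: float = 0.8, out_gain: float = 0.88,
                 delay_ms: int = 40, decay: float = 0.4) -> str:
    """Format an ffmpeg aecho filter string: in_gain:out_gain:delays:decays."""
    return f"aecho={in_gain}:{out_gain}:{delay_ms}:{decay}"

# Preset labels are hypothetical; the values come from the text.
PRESETS = {
    "studio": aecho_filter(),
    "stage": aecho_filter(delay_ms=80, decay=0.5),
    "subtle": aecho_filter(delay_ms=35, decay=0.25),
}

def build_echo_cmd(src: str, preset: str = "studio", out: str = "vocals_echo.wav") -> list[str]:
    """Build an ffmpeg command that applies the chosen echo preset to `src`."""
    return ["ffmpeg", "-i", src, "-af", PRESETS[preset], out]

print(shlex.join(build_echo_cmd("vocals.wav", "stage")))
```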
The mix above is an even one, but you can also set the volume of each track separately before mixing.
For example, [0:a]volume=1.0[v] sets the first input (the vocals) to 100% volume, and [1:a]volume=0.4[b] sets the second input (the accompaniment) to 40%.
The same filter graph can also apply reverb to the vocals while adjusting the accompaniment volume.
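A sketch of how such a filter graph might be assembled. The `[v]` and `[b]` labels and the volume values follow the text; the final `amix` stage with `normalize=0`, the `[out]` label, and all helper and file names are assumptions.

```python
import shlex
from typing import Optional

def build_mix_filter(vocal_vol: float = 1.0, backing_vol: float = 0.4,
                     echo: Optional[str] = None) -> str:
    """Build a filter graph: per-track volume, optional echo on the vocals, then a mix."""
    vocal_chain = f"volume={vocal_vol}"
    if echo:
        vocal_chain = f"{echo},{vocal_chain}"  # apply the echo before the volume stage
    return (
        f"[0:a]{vocal_chain}[v];"
        f"[1:a]volume={backing_vol}[b];"
        "[v][b]amix=inputs=2:normalize=0[out]"
    )

def build_mix_cmd(vocals: str, backing: str, out: str = "final_mix.wav",
                  echo: Optional[str] = None) -> list[str]:
    """Build the ffmpeg command that mixes vocals (input 0) with accompaniment (input 1)."""
    return [
        "ffmpeg", "-i", vocals, "-i", backing,
        "-filter_complex", build_mix_filter(echo=echo),
        "-map", "[out]", out,
    ]

print(shlex.join(build_mix_cmd("vocals.wav", "backing.wav", echo="aecho=0.8:0.88:40:0.4")))
```

Leaving `echo` as `None` gives the plain volume-adjusted mix; passing an aecho string adds the reverb to the vocals only.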
Closing thoughts
I tried this kind of "fully automatic" processing (no manual adjustment) with about three models, and honestly the results were not all that perfect. For some tracks, a bit of manual touch-up still gives better quality.
Also, bs-roformer-infer, the tool introduced here, is a little old, so it has few models and downloads often fail because of dead links. When I get the chance, I'll try a newer tool and write that up in another article.
This article is far from perfect, but I'm keeping it as a record of my own learning. After all, this is a place for jotting down all sorts of odds and ends!